From ksi2443 at gmail.com Wed Feb 1 04:26:29 2023
From: ksi2443 at gmail.com (Hyung Kim)
Date: Wed, 1 Feb 2023 19:26:29 +0900
Subject: [petsc-users] Question about handling matrix
Message-ID:

Hello,

I want to put a small matrix into a large matrix. The schematic of the operation is as below.
[image: image.png]
Is there any function to put a small matrix into a large matrix at once?

Thanks,
Hyung Kim
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 19007 bytes Desc: not available
URL:

From mfadams at lbl.gov Wed Feb 1 06:51:45 2023
From: mfadams at lbl.gov (Mark Adams)
Date: Wed, 1 Feb 2023 07:51:45 -0500
Subject: [petsc-users] Question about handling matrix
In-Reply-To: References: Message-ID:

Maybe create the large matrix and use
https://www.mcs.anl.gov/petsc/petsc-3.7/docs/manualpages/Mat/MatGetSubMatrix.html
with MAT_REUSE_MATRIX. Or pad the IS arguments to MatGetSubMatrix with -1, so the size is correct and PETSc should ignore -1, and use MAT_INITIAL_MATRIX. Others may know what will work.

Good luck,
Mark

On Wed, Feb 1, 2023 at 5:26 AM Hyung Kim wrote:

> Hello,
>
> I want to put a small matrix into a large matrix.
> The schematic of the operation is as below.
> [image: image.png]
> Is there any function to put a small matrix into a large matrix at once?
>
> Thanks,
> Hyung Kim
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 19007 bytes Desc: not available
URL:

From jed at jedbrown.org Wed Feb 1 08:25:28 2023
From: jed at jedbrown.org (Jed Brown)
Date: Wed, 01 Feb 2023 07:25:28 -0700
Subject: [petsc-users] Question about handling matrix
In-Reply-To: References: Message-ID: <87zg9x4g4n.fsf@jedbrown.org>

Is the small matrix dense? Then you can use MatSetValues. If the small matrix is sparse, you can assemble it with larger dimension (empty rows and columns) and use MatAXPY.

Hyung Kim writes:

> Hello,
>
> I want to put a small matrix into a large matrix.
> The schematic of the operation is as below.
> [image: image.png]
> Is there any function to put a small matrix into a large matrix at once?
>
> Thanks,
> Hyung Kim

From mi.mike1021 at gmail.com Wed Feb 1 13:58:57 2023
From: mi.mike1021 at gmail.com (Mike Michell)
Date: Wed, 1 Feb 2023 13:58:57 -0600
Subject: [petsc-users] PETSc Fortran 64-bit
Message-ID:

Hi all,

I want to use PETSc with 64-bit indices with a Fortran90 code. It seems some PETSc functions have no problem, but some of the others do not accept a 32-bit error code integer (e.g., "ierr" declared as PetscErrorCode type).

For example,

call DMPlexGetChart(dm, pst, pend, ierr);CHKERRA(ierr) works okay,

but

call DMPlexGetDepthStratum(dm, 0, vst, vend, ierr);CHKERRA(ierr) gives an error regarding the datatype of ierr. The error basically reads:
Error: Type mismatch in argument 'b' at (1); passed INTEGER(4) to INTEGER(8)

I tried to declare ierr as integer(kind=8) type, but there are still some problems. If PETSc is configured with 32-bit indices, the Fortran code works without problem.

What is surprising to me is that, as mentioned above, DMPlexGetChart() works okay, but DMPlexGetDepthStratum() does not work with the "ierr (PetscErrorCode type)" variable with 64-bit indices.

Can I get any comments on it?

Thanks,
Mike
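As the replies below work out, the INTEGER(4)/INTEGER(8) mismatch reported here comes from the integer literal 0, not from ierr: with --with-64-bit-indices every argument that PETSc declares as PetscInt, including constants such as the stratum value, must be passed with PetscInt kind. A minimal sketch of the corrected calls (dm, pst, pend, vst, vend are the variables from the question):

      PetscErrorCode ierr
      PetscInt       pst, pend, vst, vend, zero

      zero = 0
      ! a plain literal 0 is default INTEGER(4); a PetscInt variable (or a
      ! PetscInt-kind literal) matches the 64-bit interface definition
      call DMPlexGetChart(dm, pst, pend, ierr);CHKERRA(ierr)
      call DMPlexGetDepthStratum(dm, zero, vst, vend, ierr);CHKERRA(ierr)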
URL: From balay at mcs.anl.gov Wed Feb 1 14:11:14 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 1 Feb 2023 14:11:14 -0600 (CST) Subject: [petsc-users] PETSc Fortran 64-bit In-Reply-To: References: Message-ID: <032174f5-051c-4d0c-0142-fba52eaab0b9@mcs.anl.gov> Check petsc examples You would use 'PetscErrorCode ierr' - not 'INTEGER(4) or INTEGER(8) Some routines probably don't have Interface definitions - hence compiler can't cat this issue with all functions. [but they all consistently need PetscErrorCode] Satish On Wed, 1 Feb 2023, Mike Michell wrote: > Hi all, > > I want to use PETSc with 64-bit indices with a Fortran90 code. It seems > some PETSc functions have no problem, but some of the others do not accept > 32-bit error code integer (e.g., "ierr" declared as PetscErrorCode type). > > For example, > > call DMPlexGetChart(dm, pst, pend, ierr);CHKERRA(ierr) works okay, > > but > > call DMPlexGetDepthStratum(dm, 0, vst, vend, ierr);CHKERRA(ierr) gives an > error regarding the datatype of ierr. The error basically leads: > Error: Type mismatch in argument ?b? at (1); passed INTEGER(4) to INTEGER(8) > > I tried to declare ierr as integer(kind=8) type, but there are still some > problems. If PETSc is configured with 32-bit indices, the Fortran code > works without problem. > > What surprising to me is that as mentioned above, DMPlexGetChart() works > okay, but DMPlexGetDepthStratum() does not work with "ierr (PetscErrorCode > type)" variable with 64-bit indices. > > Can I get any comments on it? > > Thanks, > Mike > From bourdin at mcmaster.ca Wed Feb 1 14:14:30 2023 From: bourdin at mcmaster.ca (Blaise Bourdin) Date: Wed, 1 Feb 2023 20:14:30 +0000 Subject: [petsc-users] PETSc Fortran 64-bit In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From mi.mike1021 at gmail.com Wed Feb 1 14:25:36 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Wed, 1 Feb 2023 14:25:36 -0600 Subject: [petsc-users] PETSc Fortran 64-bit In-Reply-To: References: Message-ID: @Satish and Blaise - Thank you for the notes. @Satish - When you say: "Some routines probably don't have Interface definitions - hence compiler can't cat this issue with all functions." Does it mean that I cannot use 64-bit option for those routines, which do not have the interface definitions? Thanks, Mike > I use the following I all my fortran codes (inspired by a post from > Michael Metcalf on comp.lang.fortran many many moons ago): > > PetscReal,Parameter :: PReal = 1.0 > Integer,Parameter,Public :: Kr = Selected_Real_Kind(Precision(PReal)) > PetscInt,Parameter :: PInt = 1 > Integer,Parameter,Public :: Ki = kind(PInt) > > You will need to pass constant with their kind (i.e. 1_Ki instead of 1) > > > The advantage of this approach over explicitly trying to figure out the > proper type of integer ois that the same code will automatically work with > 32 and 64 bit indices. > > I?ve been wondering if petsc should include these definitions (perhaps > under another name). > > Blaise > > > On Feb 1, 2023, at 2:58 PM, Mike Michell wrote: > > Hi all, > > I want to use PETSc with 64-bit indices with a Fortran90 code. It seems > some PETSc functions have no problem, but some of the others do not accept > 32-bit error code integer (e.g., "ierr" declared as PetscErrorCode type). > > For example, > > call DMPlexGetChart(dm, pst, pend, ierr);CHKERRA(ierr) works okay, > > but > > call DMPlexGetDepthStratum(dm, 0, vst, vend, ierr);CHKERRA(ierr) gives an > error regarding the datatype of ierr. 
The error basically leads: > Error: Type mismatch in argument ?b? at (1); passed INTEGER(4) to > INTEGER(8) > > I tried to declare ierr as integer(kind=8) type, but there are still some > problems. If PETSc is configured with 32-bit indices, the Fortran code > works without problem. > > What surprising to me is that as mentioned above, DMPlexGetChart() works > okay, but DMPlexGetDepthStratum() does not work with "ierr (PetscErrorCode > type)" variable with 64-bit indices. > > Can I get any comments on it? > > Thanks, > Mike > > > ? > Canada Research Chair in Mathematical and Computational Aspects of Solid > Mechanics (Tier 1) > Professor, Department of Mathematics & Statistics > Hamilton Hall room 409A, McMaster University > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Feb 1 14:40:23 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 1 Feb 2023 14:40:23 -0600 (CST) Subject: [petsc-users] PETSc Fortran 64-bit In-Reply-To: References: Message-ID: <704e28f8-38ff-a0c1-40c6-1efd81ddaba9@mcs.anl.gov> On Wed, 1 Feb 2023, Mike Michell wrote: > @Satish and Blaise - Thank you for the notes. > > @Satish - When you say: "Some routines probably don't have Interface > definitions - hence compiler can't cat this issue with all functions." sorry - meant to say 'catch' [not cat] > Does it mean that I cannot use 64-bit option for those routines, which do > not have the interface definitions? All routines are usable. Its just that compiler treats routines without interface definitions as F77 - and it won't verify the data-types of arguments passed in. [i.e F77 mode..] But I do see interface defs for both DMPlexGetDepthStratum() and DMPlexGetChart() so don't know why they behave differently for you. src/dm/f90-mod/ftn-auto-interfaces/petscdmplex.h90 >>>>> subroutine DMPlexGetDepthStratum(a,b,c,d,z) import tDM DM a ! DM PetscInt b ! PetscInt PetscInt c ! PetscInt PetscInt d ! PetscInt PetscErrorCode z end subroutine DMPlexGetDepthStratum subroutine DMPlexGetChart(a,b,c,z) import tDM DM a ! DM PetscInt b ! PetscInt PetscInt c ! PetscInt PetscErrorCode z end subroutine DMPlexGetChart Satish > Thanks, > Mike > > > > I use the following I all my fortran codes (inspired by a post from > > Michael Metcalf on comp.lang.fortran many many moons ago): > > > > PetscReal,Parameter :: PReal = 1.0 > > Integer,Parameter,Public :: Kr = Selected_Real_Kind(Precision(PReal)) > > PetscInt,Parameter :: PInt = 1 > > Integer,Parameter,Public :: Ki = kind(PInt) > > > > You will need to pass constant with their kind (i.e. 1_Ki instead of 1) > > > > > > The advantage of this approach over explicitly trying to figure out the > > proper type of integer ois that the same code will automatically work with > > 32 and 64 bit indices. > > > > I?ve been wondering if petsc should include these definitions (perhaps > > under another name). > > > > Blaise > > > > > > On Feb 1, 2023, at 2:58 PM, Mike Michell wrote: > > > > Hi all, > > > > I want to use PETSc with 64-bit indices with a Fortran90 code. It seems > > some PETSc functions have no problem, but some of the others do not accept > > 32-bit error code integer (e.g., "ierr" declared as PetscErrorCode type). 
> > > > For example, > > > > call DMPlexGetChart(dm, pst, pend, ierr);CHKERRA(ierr) works okay, > > > > but > > > > call DMPlexGetDepthStratum(dm, 0, vst, vend, ierr);CHKERRA(ierr) gives an > > error regarding the datatype of ierr. The error basically leads: > > Error: Type mismatch in argument ?b? at (1); passed INTEGER(4) to > > INTEGER(8) > > > > I tried to declare ierr as integer(kind=8) type, but there are still some > > problems. If PETSc is configured with 32-bit indices, the Fortran code > > works without problem. > > > > What surprising to me is that as mentioned above, DMPlexGetChart() works > > okay, but DMPlexGetDepthStratum() does not work with "ierr (PetscErrorCode > > type)" variable with 64-bit indices. > > > > Can I get any comments on it? > > > > Thanks, > > Mike > > > > > > ? > > Canada Research Chair in Mathematical and Computational Aspects of Solid > > Mechanics (Tier 1) > > Professor, Department of Mathematics & Statistics > > Hamilton Hall room 409A, McMaster University > > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 > > > > > From balay at mcs.anl.gov Wed Feb 1 14:50:06 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 1 Feb 2023 14:50:06 -0600 (CST) Subject: [petsc-users] PETSc Fortran 64-bit In-Reply-To: <704e28f8-38ff-a0c1-40c6-1efd81ddaba9@mcs.anl.gov> References: <704e28f8-38ff-a0c1-40c6-1efd81ddaba9@mcs.anl.gov> Message-ID: <5e80e0bf-ecf2-4e21-de57-ab3aae5e8f3d@mcs.anl.gov> > > call DMPlexGetDepthStratum(dm, 0, vst, vend, ierr);CHKERRA(ierr) gives an > > error regarding the datatype of ierr. The error basically leads: > > Error: Type mismatch in argument ?b? at (1); passed INTEGER(4) to > > INTEGER(8) Ok - I think the error here is with '0' - not ierr. Use: >>> PetscInt zero zero = 0 call DMPlexGetDepthStratum(dm, zero, vst, vend, ierr);CHKERRA(ierr) <<< Satish On Wed, 1 Feb 2023, Satish Balay via petsc-users wrote: > On Wed, 1 Feb 2023, Mike Michell wrote: > > > @Satish and Blaise - Thank you for the notes. > > > > @Satish - When you say: "Some routines probably don't have Interface > > definitions - hence compiler can't cat this issue with all functions." > > sorry - meant to say 'catch' [not cat] > > > Does it mean that I cannot use 64-bit option for those routines, which do > > not have the interface definitions? > > All routines are usable. Its just that compiler treats routines > without interface definitions as F77 - and it won't verify the > data-types of arguments passed in. [i.e F77 mode..] > > But I do see interface defs for both DMPlexGetDepthStratum() and > DMPlexGetChart() so don't know why they behave differently for you. > > src/dm/f90-mod/ftn-auto-interfaces/petscdmplex.h90 > > >>>>> > subroutine DMPlexGetDepthStratum(a,b,c,d,z) > import tDM > DM a ! DM > PetscInt b ! PetscInt > PetscInt c ! PetscInt > PetscInt d ! PetscInt > PetscErrorCode z > end subroutine DMPlexGetDepthStratum > > subroutine DMPlexGetChart(a,b,c,z) > import tDM > DM a ! DM > PetscInt b ! PetscInt > PetscInt c ! 
PetscInt > PetscErrorCode z > end subroutine DMPlexGetChart > > Satish > > > > Thanks, > > Mike > > > > > > > I use the following I all my fortran codes (inspired by a post from > > > Michael Metcalf on comp.lang.fortran many many moons ago): > > > > > > PetscReal,Parameter :: PReal = 1.0 > > > Integer,Parameter,Public :: Kr = Selected_Real_Kind(Precision(PReal)) > > > PetscInt,Parameter :: PInt = 1 > > > Integer,Parameter,Public :: Ki = kind(PInt) > > > > > > You will need to pass constant with their kind (i.e. 1_Ki instead of 1) > > > > > > > > > The advantage of this approach over explicitly trying to figure out the > > > proper type of integer ois that the same code will automatically work with > > > 32 and 64 bit indices. > > > > > > I?ve been wondering if petsc should include these definitions (perhaps > > > under another name). > > > > > > Blaise > > > > > > > > > On Feb 1, 2023, at 2:58 PM, Mike Michell wrote: > > > > > > Hi all, > > > > > > I want to use PETSc with 64-bit indices with a Fortran90 code. It seems > > > some PETSc functions have no problem, but some of the others do not accept > > > 32-bit error code integer (e.g., "ierr" declared as PetscErrorCode type). > > > > > > For example, > > > > > > call DMPlexGetChart(dm, pst, pend, ierr);CHKERRA(ierr) works okay, > > > > > > but > > > > > > call DMPlexGetDepthStratum(dm, 0, vst, vend, ierr);CHKERRA(ierr) gives an > > > error regarding the datatype of ierr. The error basically leads: > > > Error: Type mismatch in argument ?b? at (1); passed INTEGER(4) to > > > INTEGER(8) > > > > > > I tried to declare ierr as integer(kind=8) type, but there are still some > > > problems. If PETSc is configured with 32-bit indices, the Fortran code > > > works without problem. > > > > > > What surprising to me is that as mentioned above, DMPlexGetChart() works > > > okay, but DMPlexGetDepthStratum() does not work with "ierr (PetscErrorCode > > > type)" variable with 64-bit indices. > > > > > > Can I get any comments on it? > > > > > > Thanks, > > > Mike > > > > > > > > > ? > > > Canada Research Chair in Mathematical and Computational Aspects of Solid > > > Mechanics (Tier 1) > > > Professor, Department of Mathematics & Statistics > > > Hamilton Hall room 409A, McMaster University > > > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > > > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 > > > > > > > > > From bourdin at mcmaster.ca Wed Feb 1 15:08:55 2023 From: bourdin at mcmaster.ca (Blaise Bourdin) Date: Wed, 1 Feb 2023 21:08:55 +0000 Subject: [petsc-users] PETSc Fortran 64-bit In-Reply-To: <5e80e0bf-ecf2-4e21-de57-ab3aae5e8f3d@mcs.anl.gov> References: <704e28f8-38ff-a0c1-40c6-1efd81ddaba9@mcs.anl.gov> <5e80e0bf-ecf2-4e21-de57-ab3aae5e8f3d@mcs.anl.gov> Message-ID: An HTML attachment was scrubbed... URL: From ksi2443 at gmail.com Wed Feb 1 22:24:15 2023 From: ksi2443 at gmail.com (=?UTF-8?B?6rmA7ISx7J21?=) Date: Thu, 2 Feb 2023 13:24:15 +0900 Subject: [petsc-users] Question about handling matrix In-Reply-To: <87zg9x4g4n.fsf@jedbrown.org> References: <87zg9x4g4n.fsf@jedbrown.org> Message-ID: Both small and large matrix are sparse matrix. I will try following comments. Thanks, Hyung Kim 2023? 2? 1? (?) ?? 11:30, Jed Brown ?? ??: > Is the small matrix dense? Then you can use MatSetValues. If the small > matrix is sparse, you can assemble it with larger dimension (empty rows and > columns) and use MatAXPY. > > ??? writes: > > > Hello, > > > > > > I want to put small matrix to large matrix. 
> > The schematic of operation is as below. > > [image: image.png] > > Is there any function for put small matrix to large matrix at once? > > > > Thanks, > > Hyung Kim > -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.grosse-bley at ziti.uni-heidelberg.de Thu Feb 2 08:26:42 2023 From: paul.grosse-bley at ziti.uni-heidelberg.de (=?utf-8?q?Grosse-Bley=2C_Paul?=) Date: Thu, 02 Feb 2023 15:26:42 +0100 Subject: [petsc-users] PCMG Interpolation/Restriction Defaults Message-ID: <3afef7-63dbc800-5f-49f82e80@238760553> The documentation says that > The multigrid routines, which determine the solvers and interpolation/restriction operators that are used, are mandatory. But using code that doesn't contain any multigrid-specific routines e.g. ex45 and running it with -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_type multiplicative -pc_mg_cycle_type v -pc_mg_levels 5 ... seems to work fine and also not fall-back to algebraic multigrid/PCGAMG from what I see with -log_view. What interpolation/restriction operators are used by default with PCMG? Is it cleverly using the geometric information from DA? Thanks, Paul ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Feb 2 09:38:40 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 2 Feb 2023 10:38:40 -0500 Subject: [petsc-users] PCMG Interpolation/Restriction Defaults In-Reply-To: <3afef7-63dbc800-5f-49f82e80@238760553> References: <3afef7-63dbc800-5f-49f82e80@238760553> Message-ID: <0A5914C3-B4F3-4541-B05F-0EDD4E002B28@petsc.dev> Yes, it pulls the information from the DM. See src/ksp/pc/impls/mg/mg.c the function PCSetUp_MG. Note this function has become horribly complex with people adding all kinds of crazy ways to set things up, and the code should be refactored to remove the complexity but anyways you'll see things like DMCoarsen(), DMCreateInterpolation() etc used to create what the MG algorithm needs. Barry > On Feb 2, 2023, at 9:26 AM, Grosse-Bley, Paul wrote: > > The documentation says that > > > The multigrid routines, which determine the solvers and interpolation/restriction operators that are used, are mandatory. > > But using code that doesn't contain any multigrid-specific routines e.g. ex45 and running it with > > -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 > -pc_type mg > -pc_mg_type multiplicative > -pc_mg_cycle_type v > -pc_mg_levels 5 > ... > > seems to work fine and also not fall-back to algebraic multigrid/PCGAMG from what I see with -log_view. > > What interpolation/restriction operators are used by default with PCMG? > Is it cleverly using the geometric information from DA? > > Thanks, > Paul > -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.grosse-bley at ziti.uni-heidelberg.de Thu Feb 2 10:09:55 2023 From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley) Date: Thu, 02 Feb 2023 17:09:55 +0100 Subject: [petsc-users] =?utf-8?q?PCMG_Interpolation/Restriction_Defaults?= In-Reply-To: <0A5914C3-B4F3-4541-B05F-0EDD4E002B28@petsc.dev> Message-ID: <3aff6e-63dbe080-97-1ce8b800@44098745> That is great to know! When I read that part of the docs, I thought it would be a big hassle to use geometric multigrid, having to manually define all those matrices. Thanks again! On Thursday, February 02, 2023 16:38 CET, Barry Smith wrote: ???? Yes, it pulls the information from the DM. See src/ksp/pc/impls/mg/mg.c the function?PCSetUp_MG. 
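To make the quoted explanation concrete, the sketch below does by hand what PCSetUp_MG does internally when a DMDA is attached to the KSP: coarsen the DM and ask it for the interpolation between the two levels (restriction defaults to its transpose). Grid sizes mirror the ex45-style options above; PetscCall() assumes a recent PETSc release, otherwise the ierr/CHKERRQ pattern applies.

  #include <petscdmda.h>

  int main(int argc, char **argv)
  {
    DM  fine, coarse;
    Mat interp;
    Vec scale;

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    /* 129^3 DMDA, like -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 in ex45 */
    PetscCall(DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                           DM_BOUNDARY_NONE, DMDA_STENCIL_STAR, 129, 129, 129,
                           PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, 1,
                           NULL, NULL, NULL, &fine));
    PetscCall(DMSetUp(fine));
    /* what PCMG does for each of the -pc_mg_levels: coarsen the DM ... */
    PetscCall(DMCoarsen(fine, MPI_COMM_NULL, &coarse));
    /* ... and let the DM build the interpolation between the two grids */
    PetscCall(DMCreateInterpolation(coarse, fine, &interp, &scale));
    PetscCall(VecDestroy(&scale));
    PetscCall(MatDestroy(&interp));
    PetscCall(DMDestroy(&coarse));
    PetscCall(DMDestroy(&fine));
    PetscCall(PetscFinalize());
    return 0;
  }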
?Note this function has become horribly complex with people adding all kinds of crazy ways to set things up, and the code should be refactored to remove the complexity but anyways you'll see things like?DMCoarsen(),?DMCreateInterpolation() etc used to create what the MG algorithm needs.?? Barry??On Feb 2, 2023, at 9:26 AM, Grosse-Bley, Paul wrote:?The documentation says that > The multigrid routines, which determine the solvers and interpolation/restriction operators that are used, are mandatory. But using code that doesn't contain any multigrid-specific routines e.g. ex45 and running it with -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_type multiplicative -pc_mg_cycle_type v -pc_mg_levels 5 ... seems to work fine and also not fall-back to algebraic multigrid/PCGAMG from what I see with -log_view. What interpolation/restriction operators are used by default with PCMG? Is it cleverly using the geometric information from DA? Thanks, Paul ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksi2443 at gmail.com Fri Feb 3 03:05:35 2023 From: ksi2443 at gmail.com (=?UTF-8?B?6rmA7ISx7J21?=) Date: Fri, 3 Feb 2023 18:05:35 +0900 Subject: [petsc-users] Question about MatGetRow Message-ID: Hello, By using MatGetRow, user can get vectors from local matrix (at each process). However, I need other process's row values. So I have 2 questions. 1. Is there any function for getting arrays from other process's?? 2. Or is there any function like matrix version of vecscattertoall?? Thanks, Hyung Kim -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Feb 3 06:03:19 2023 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 3 Feb 2023 07:03:19 -0500 Subject: [petsc-users] Question about MatGetRow In-Reply-To: References: Message-ID: https://petsc.org/main/docs/manualpages/Mat/MatCreateSubMatrix/ Note, PETSc lets you give NULL arguments if there is a reasonable default. In this case give NULL for the column IS and you will get the whole columns. Mark On Fri, Feb 3, 2023 at 4:05 AM ??? wrote: > Hello, > > > By using MatGetRow, user can get vectors from local matrix (at each > process). > > However, I need other process's row values. > So I have 2 questions. > > 1. Is there any function for getting arrays from other process's?? > > 2. Or is there any function like matrix version of vecscattertoall?? > > Thanks, > Hyung Kim > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksi2443 at gmail.com Fri Feb 3 07:06:22 2023 From: ksi2443 at gmail.com (=?UTF-8?B?6rmA7ISx7J21?=) Date: Fri, 3 Feb 2023 22:06:22 +0900 Subject: [petsc-users] Question about MatGetRow In-Reply-To: References: Message-ID: Following your comments, I want to check below things. For example, the global dense matrix are as below. [image: image.png] If I want to get first row ('1 2 0 0 3 0 0 4') in Proc 1. Then I should put 'MatCreateSubMatrix (mat, isrow , NULL, MAT_INITIAL_MATRIX, *&*newmat)' and isrow will be [0 1 2 3 4 5 6 7]. In this case, How can I make isrow? Actually I can't understand the procedure of handling isrow. Hyung Kim 2023? 2? 3? (?) ?? 9:03, Mark Adams ?? ??: > https://petsc.org/main/docs/manualpages/Mat/MatCreateSubMatrix/ > > Note, PETSc lets you give NULL arguments if there is a reasonable default. > In this case give NULL for the column IS and you will get the whole > columns. > > Mark > > On Fri, Feb 3, 2023 at 4:05 AM ??? 
wrote: > >> Hello, >> >> >> By using MatGetRow, user can get vectors from local matrix (at each >> process). >> >> However, I need other process's row values. >> So I have 2 questions. >> >> 1. Is there any function for getting arrays from other process's?? >> >> 2. Or is there any function like matrix version of vecscattertoall?? >> >> Thanks, >> Hyung Kim >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 6931 bytes Desc: not available URL: From knepley at gmail.com Fri Feb 3 07:26:36 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Feb 2023 08:26:36 -0500 Subject: [petsc-users] Question about MatGetRow In-Reply-To: References: Message-ID: On Fri, Feb 3, 2023 at 8:06 AM ??? wrote: > Following your comments, > I want to check below things. > For example, the global dense matrix are as below. > [image: image.png] > If I want to get first row ('1 2 0 0 3 0 0 4') in Proc 1. > Then I should put 'MatCreateSubMatrix > (mat, > isrow, NULL, MAT_INITIAL_MATRIX, *&*newmat)' > and isrow will be [0 1 2 3 4 5 6 7]. > > In this case, How can I make isrow? > Actually I can't understand the procedure of handling isrow. > You create an IS object of type ISGENERAL and give it the array of global indices that you want to extract. Thanks, Matt > Hyung Kim > > 2023? 2? 3? (?) ?? 9:03, Mark Adams ?? ??: > >> https://petsc.org/main/docs/manualpages/Mat/MatCreateSubMatrix/ >> >> Note, PETSc lets you give NULL arguments if there is a reasonable default. >> In this case give NULL for the column IS and you will get the whole >> columns. >> >> Mark >> >> On Fri, Feb 3, 2023 at 4:05 AM ??? wrote: >> >>> Hello, >>> >>> >>> By using MatGetRow, user can get vectors from local matrix (at each >>> process). >>> >>> However, I need other process's row values. >>> So I have 2 questions. >>> >>> 1. Is there any function for getting arrays from other process's?? >>> >>> 2. Or is there any function like matrix version of vecscattertoall?? >>> >>> Thanks, >>> Hyung Kim >>> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 6931 bytes Desc: not available URL: From ksi2443 at gmail.com Fri Feb 3 07:45:09 2023 From: ksi2443 at gmail.com (=?UTF-8?B?6rmA7ISx7J21?=) Date: Fri, 3 Feb 2023 22:45:09 +0900 Subject: [petsc-users] Question about MatGetRow In-Reply-To: References: Message-ID: Following your comments, If I extract first row of below matrix. [image: image.png] IS isrow; PetscInt *indices; PetscMalloc1(1, *indices); Indices[0] = 0; ISCreateGenreral(PETSC_COMM_WORLD, 1, indices, PETSC_COPY_VALUES, &isrow); MatCreateSubMatrix(mat,isrow,NULL,MAT_INITIAL_MATRIX,&newmat); Then can I get the array about first row of global matrix in all processes? Hyung Kim 2023? 2? 3? (?) ?? 10:26, Matthew Knepley ?? ??: > On Fri, Feb 3, 2023 at 8:06 AM ??? wrote: > >> Following your comments, >> I want to check below things. >> For example, the global dense matrix are as below. >> [image: image.png] >> If I want to get first row ('1 2 0 0 3 0 0 4') in Proc 1. 
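For reference, a version of the snippet shown above that compiles, incorporating the fixes Matt gives in his replies (&indices rather than *indices, PETSC_OWN_POINTER since the array comes from PetscMalloc1) plus the ISCreateGeneral spelling. Here mat is assumed to be an assembled parallel matrix and every rank asks for global row 0:

  IS        isrow;
  Mat       newmat;
  PetscInt *indices;

  PetscCall(PetscMalloc1(1, &indices));   /* address of the pointer, not *indices */
  indices[0] = 0;                         /* global index of the row to extract   */
  PetscCall(ISCreateGeneral(PETSC_COMM_WORLD, 1, indices, PETSC_OWN_POINTER, &isrow));
  /* NULL column IS keeps all columns, as Mark suggested */
  PetscCall(MatCreateSubMatrix(mat, isrow, NULL, MAT_INITIAL_MATRIX, &newmat));
  PetscCall(ISDestroy(&isrow));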
>> Then I should put 'MatCreateSubMatrix >> (mat, >> isrow, NULL, MAT_INITIAL_MATRIX, *&*newmat)' >> and isrow will be [0 1 2 3 4 5 6 7]. >> >> In this case, How can I make isrow? >> Actually I can't understand the procedure of handling isrow. >> > > You create an IS object of type ISGENERAL and give it the array of global > indices that you want to extract. > > Thanks, > > Matt > > >> Hyung Kim >> >> 2023? 2? 3? (?) ?? 9:03, Mark Adams ?? ??: >> >>> https://petsc.org/main/docs/manualpages/Mat/MatCreateSubMatrix/ >>> >>> Note, PETSc lets you give NULL arguments if there is a reasonable >>> default. >>> In this case give NULL for the column IS and you will get the whole >>> columns. >>> >>> Mark >>> >>> On Fri, Feb 3, 2023 at 4:05 AM ??? wrote: >>> >>>> Hello, >>>> >>>> >>>> By using MatGetRow, user can get vectors from local matrix (at each >>>> process). >>>> >>>> However, I need other process's row values. >>>> So I have 2 questions. >>>> >>>> 1. Is there any function for getting arrays from other process's?? >>>> >>>> 2. Or is there any function like matrix version of vecscattertoall?? >>>> >>>> Thanks, >>>> Hyung Kim >>>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 6931 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 7950 bytes Desc: not available URL: From ksi2443 at gmail.com Fri Feb 3 07:52:25 2023 From: ksi2443 at gmail.com (=?UTF-8?B?6rmA7ISx7J21?=) Date: Fri, 3 Feb 2023 22:52:25 +0900 Subject: [petsc-users] Question about MatGetRow In-Reply-To: References: Message-ID: I want to extract same row values of global matrix in all processes. Then how can I do this?? The case of same problem of vector, I just use vecscattertoall. However, I can't find same function for matrix. Hyung Kim 2023? 2? 3? (?) ?? 10:47, Matthew Knepley ?? ??: > On Fri, Feb 3, 2023 at 8:45 AM ??? wrote: > >> Following your comments, >> If I extract first row of below matrix. >> [image: image.png] >> IS isrow; >> PetscInt *indices; >> PetscMalloc1(1, *indices); >> > > That should be &indices. > > >> Indices[0] = 0; >> ISCreateGenreral(PETSC_COMM_WORLD, 1, indices, PETSC_COPY_VALUES, &isrow); >> > > You should use PETSC_OWN_POINTER. > > >> MatCreateSubMatrix(mat,isrow,NULL,MAT_INITIAL_MATRIX,&newmat); >> >> Then can I get the array about first row of global matrix in all >> processes? >> > > No, just on the process which gives 0. If you do that on every process, > every rank with get row 0. > > Thanks, > > Matt > > >> Hyung Kim >> >> 2023? 2? 3? (?) ?? 10:26, Matthew Knepley ?? ??: >> >>> On Fri, Feb 3, 2023 at 8:06 AM ??? wrote: >>> >>>> Following your comments, >>>> I want to check below things. >>>> For example, the global dense matrix are as below. >>>> [image: image.png] >>>> If I want to get first row ('1 2 0 0 3 0 0 4') in Proc 1. >>>> Then I should put 'MatCreateSubMatrix >>>> (mat, >>>> isrow, NULL, MAT_INITIAL_MATRIX, *&*newmat)' >>>> and isrow will be [0 1 2 3 4 5 6 7]. >>>> >>>> In this case, How can I make isrow? >>>> Actually I can't understand the procedure of handling isrow. 
>>>> >>> >>> You create an IS object of type ISGENERAL and give it the array of >>> global indices that you want to extract. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Hyung Kim >>>> >>>> 2023? 2? 3? (?) ?? 9:03, Mark Adams ?? ??: >>>> >>>>> https://petsc.org/main/docs/manualpages/Mat/MatCreateSubMatrix/ >>>>> >>>>> Note, PETSc lets you give NULL arguments if there is a reasonable >>>>> default. >>>>> In this case give NULL for the column IS and you will get the whole >>>>> columns. >>>>> >>>>> Mark >>>>> >>>>> On Fri, Feb 3, 2023 at 4:05 AM ??? wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> >>>>>> By using MatGetRow, user can get vectors from local matrix (at each >>>>>> process). >>>>>> >>>>>> However, I need other process's row values. >>>>>> So I have 2 questions. >>>>>> >>>>>> 1. Is there any function for getting arrays from other process's?? >>>>>> >>>>>> 2. Or is there any function like matrix version of vecscattertoall?? >>>>>> >>>>>> Thanks, >>>>>> Hyung Kim >>>>>> >>>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 6931 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 7950 bytes Desc: not available URL: From knepley at gmail.com Fri Feb 3 07:54:17 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Feb 2023 08:54:17 -0500 Subject: [petsc-users] Question about MatGetRow In-Reply-To: References: Message-ID: On Fri, Feb 3, 2023 at 8:52 AM ??? wrote: > I want to extract same row values of global matrix in all processes. > Then how can I do this?? > Create the same IS on each process. THanks, Matt > The case of same problem of vector, I just use vecscattertoall. > However, I can't find same function for matrix. > > Hyung Kim > > 2023? 2? 3? (?) ?? 10:47, Matthew Knepley ?? ??: > >> On Fri, Feb 3, 2023 at 8:45 AM ??? wrote: >> >>> Following your comments, >>> If I extract first row of below matrix. >>> [image: image.png] >>> IS isrow; >>> PetscInt *indices; >>> PetscMalloc1(1, *indices); >>> >> >> That should be &indices. >> >> >>> Indices[0] = 0; >>> ISCreateGenreral(PETSC_COMM_WORLD, 1, indices, PETSC_COPY_VALUES, >>> &isrow); >>> >> >> You should use PETSC_OWN_POINTER. >> >> >>> MatCreateSubMatrix(mat,isrow,NULL,MAT_INITIAL_MATRIX,&newmat); >>> >>> Then can I get the array about first row of global matrix in all >>> processes? >>> >> >> No, just on the process which gives 0. If you do that on every process, >> every rank with get row 0. >> >> Thanks, >> >> Matt >> >> >>> Hyung Kim >>> >>> 2023? 2? 3? (?) ?? 10:26, Matthew Knepley ?? ??: >>> >>>> On Fri, Feb 3, 2023 at 8:06 AM ??? wrote: >>>> >>>>> Following your comments, >>>>> I want to check below things. >>>>> For example, the global dense matrix are as below. >>>>> [image: image.png] >>>>> If I want to get first row ('1 2 0 0 3 0 0 4') in Proc 1. 
>>>>> Then I should put 'MatCreateSubMatrix >>>>> (mat, >>>>> isrow, NULL, MAT_INITIAL_MATRIX, *&*newmat)' >>>>> and isrow will be [0 1 2 3 4 5 6 7]. >>>>> >>>>> In this case, How can I make isrow? >>>>> Actually I can't understand the procedure of handling isrow. >>>>> >>>> >>>> You create an IS object of type ISGENERAL and give it the array of >>>> global indices that you want to extract. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Hyung Kim >>>>> >>>>> 2023? 2? 3? (?) ?? 9:03, Mark Adams ?? ??: >>>>> >>>>>> https://petsc.org/main/docs/manualpages/Mat/MatCreateSubMatrix/ >>>>>> >>>>>> Note, PETSc lets you give NULL arguments if there is a reasonable >>>>>> default. >>>>>> In this case give NULL for the column IS and you will get the whole >>>>>> columns. >>>>>> >>>>>> Mark >>>>>> >>>>>> On Fri, Feb 3, 2023 at 4:05 AM ??? wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> >>>>>>> By using MatGetRow, user can get vectors from local matrix (at each >>>>>>> process). >>>>>>> >>>>>>> However, I need other process's row values. >>>>>>> So I have 2 questions. >>>>>>> >>>>>>> 1. Is there any function for getting arrays from other process's?? >>>>>>> >>>>>>> 2. Or is there any function like matrix version of vecscattertoall?? >>>>>>> >>>>>>> Thanks, >>>>>>> Hyung Kim >>>>>>> >>>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 6931 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 7950 bytes Desc: not available URL: From ksi2443 at gmail.com Fri Feb 3 08:04:04 2023 From: ksi2443 at gmail.com (=?UTF-8?B?6rmA7ISx7J21?=) Date: Fri, 3 Feb 2023 23:04:04 +0900 Subject: [petsc-users] Question about MatGetRow In-Reply-To: References: Message-ID: Actually in the last mail, below scripts are running in all processes. IS isrow; PetscInt *indices; PetscMalloc1(1, &indices); Indices[0] = 0; ISCreateGenreral(PETSC_COMM_WORLD, 1, indices, PETSC_OWN_POINTER, &isrow); MatCreateSubMatrix(mat,isrow,NULL,MAT_INITIAL_MATRIX,&newmat); (extract from newmat) However, you said it cannot get the values of first row of global matrix. Please let me know how can I fix this scripts for getting the 1st row of global matrix in all processes. Hyung Kim 2023? 2? 3? (?) ?? 10:54, Matthew Knepley ?? ??: > On Fri, Feb 3, 2023 at 8:52 AM ??? wrote: > >> I want to extract same row values of global matrix in all processes. >> Then how can I do this?? >> > > Create the same IS on each process. > > THanks, > > Matt > > >> The case of same problem of vector, I just use vecscattertoall. >> However, I can't find same function for matrix. >> >> Hyung Kim >> >> 2023? 2? 3? (?) 
?? 10:47, Matthew Knepley ?? ??: >> >>> On Fri, Feb 3, 2023 at 8:45 AM ??? wrote: >>> >>>> Following your comments, >>>> If I extract first row of below matrix. >>>> [image: image.png] >>>> IS isrow; >>>> PetscInt *indices; >>>> PetscMalloc1(1, *indices); >>>> >>> >>> That should be &indices. >>> >>> >>>> Indices[0] = 0; >>>> ISCreateGenreral(PETSC_COMM_WORLD, 1, indices, PETSC_COPY_VALUES, >>>> &isrow); >>>> >>> >>> You should use PETSC_OWN_POINTER. >>> >>> >>>> MatCreateSubMatrix(mat,isrow,NULL,MAT_INITIAL_MATRIX,&newmat); >>>> >>>> Then can I get the array about first row of global matrix in all >>>> processes? >>>> >>> >>> No, just on the process which gives 0. If you do that on every process, >>> every rank with get row 0. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Hyung Kim >>>> >>>> 2023? 2? 3? (?) ?? 10:26, Matthew Knepley ?? ??: >>>> >>>>> On Fri, Feb 3, 2023 at 8:06 AM ??? wrote: >>>>> >>>>>> Following your comments, >>>>>> I want to check below things. >>>>>> For example, the global dense matrix are as below. >>>>>> [image: image.png] >>>>>> If I want to get first row ('1 2 0 0 3 0 0 4') in Proc 1. >>>>>> Then I should put 'MatCreateSubMatrix >>>>>> (mat >>>>>> , isrow, NULL, MAT_INITIAL_MATRIX, *&*newmat)' >>>>>> and isrow will be [0 1 2 3 4 5 6 7]. >>>>>> >>>>>> In this case, How can I make isrow? >>>>>> Actually I can't understand the procedure of handling isrow. >>>>>> >>>>> >>>>> You create an IS object of type ISGENERAL and give it the array of >>>>> global indices that you want to extract. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Hyung Kim >>>>>> >>>>>> 2023? 2? 3? (?) ?? 9:03, Mark Adams ?? ??: >>>>>> >>>>>>> https://petsc.org/main/docs/manualpages/Mat/MatCreateSubMatrix/ >>>>>>> >>>>>>> Note, PETSc lets you give NULL arguments if there is a reasonable >>>>>>> default. >>>>>>> In this case give NULL for the column IS and you will get the whole >>>>>>> columns. >>>>>>> >>>>>>> Mark >>>>>>> >>>>>>> On Fri, Feb 3, 2023 at 4:05 AM ??? wrote: >>>>>>> >>>>>>>> Hello, >>>>>>>> >>>>>>>> >>>>>>>> By using MatGetRow, user can get vectors from local matrix (at each >>>>>>>> process). >>>>>>>> >>>>>>>> However, I need other process's row values. >>>>>>>> So I have 2 questions. >>>>>>>> >>>>>>>> 1. Is there any function for getting arrays from other process's?? >>>>>>>> >>>>>>>> 2. Or is there any function like matrix version of vecscattertoall?? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Hyung Kim >>>>>>>> >>>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 6931 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 7950 bytes Desc: not available URL: From knepley at gmail.com Fri Feb 3 08:32:49 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Feb 2023 09:32:49 -0500 Subject: [petsc-users] Question about MatGetRow In-Reply-To: References: Message-ID: On Fri, Feb 3, 2023 at 9:04 AM ??? wrote: > Actually in the last mail, below scripts are running in all processes. > > IS isrow; > PetscInt *indices; > PetscMalloc1(1, &indices); > Indices[0] = 0; > ISCreateGenreral(PETSC_COMM_WORLD, 1, indices, PETSC_OWN_POINTER, &isrow); > MatCreateSubMatrix(mat,isrow,NULL,MAT_INITIAL_MATRIX,&newmat); > (extract from newmat) > > However, you said it cannot get the values of first row of global matrix. > Please let me know how can I fix this scripts for getting the 1st row of > global matrix in all processes. > Did you run and see what you get? If it is on all processes, it should work. Thanks, Matt > Hyung Kim > > > > > > > > 2023? 2? 3? (?) ?? 10:54, Matthew Knepley ?? ??: > >> On Fri, Feb 3, 2023 at 8:52 AM ??? wrote: >> >>> I want to extract same row values of global matrix in all processes. >>> Then how can I do this?? >>> >> >> Create the same IS on each process. >> >> THanks, >> >> Matt >> >> >>> The case of same problem of vector, I just use vecscattertoall. >>> However, I can't find same function for matrix. >>> >>> Hyung Kim >>> >>> 2023? 2? 3? (?) ?? 10:47, Matthew Knepley ?? ??: >>> >>>> On Fri, Feb 3, 2023 at 8:45 AM ??? wrote: >>>> >>>>> Following your comments, >>>>> If I extract first row of below matrix. >>>>> [image: image.png] >>>>> IS isrow; >>>>> PetscInt *indices; >>>>> PetscMalloc1(1, *indices); >>>>> >>>> >>>> That should be &indices. >>>> >>>> >>>>> Indices[0] = 0; >>>>> ISCreateGenreral(PETSC_COMM_WORLD, 1, indices, PETSC_COPY_VALUES, >>>>> &isrow); >>>>> >>>> >>>> You should use PETSC_OWN_POINTER. >>>> >>>> >>>>> MatCreateSubMatrix(mat,isrow,NULL,MAT_INITIAL_MATRIX,&newmat); >>>>> >>>>> Then can I get the array about first row of global matrix in all >>>>> processes? >>>>> >>>> >>>> No, just on the process which gives 0. If you do that on every process, >>>> every rank with get row 0. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Hyung Kim >>>>> >>>>> 2023? 2? 3? (?) ?? 10:26, Matthew Knepley ?? ??: >>>>> >>>>>> On Fri, Feb 3, 2023 at 8:06 AM ??? wrote: >>>>>> >>>>>>> Following your comments, >>>>>>> I want to check below things. >>>>>>> For example, the global dense matrix are as below. >>>>>>> [image: image.png] >>>>>>> If I want to get first row ('1 2 0 0 3 0 0 4') in Proc 1. >>>>>>> Then I should put 'MatCreateSubMatrix >>>>>>> ( >>>>>>> mat, isrow, NULL, MAT_INITIAL_MATRIX, *&*newmat)' >>>>>>> and isrow will be [0 1 2 3 4 5 6 7]. >>>>>>> >>>>>>> In this case, How can I make isrow? >>>>>>> Actually I can't understand the procedure of handling isrow. >>>>>>> >>>>>> >>>>>> You create an IS object of type ISGENERAL and give it the array of >>>>>> global indices that you want to extract. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Hyung Kim >>>>>>> >>>>>>> 2023? 2? 3? (?) ?? 9:03, Mark Adams ?? ??: >>>>>>> >>>>>>>> https://petsc.org/main/docs/manualpages/Mat/MatCreateSubMatrix/ >>>>>>>> >>>>>>>> Note, PETSc lets you give NULL arguments if there is a reasonable >>>>>>>> default. >>>>>>>> In this case give NULL for the column IS and you will get the whole >>>>>>>> columns. 
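A sketch, not taken from the thread, of how the extracted row could then be inspected on every rank: since each process contributed the single global index 0 to the IS, each process owns one local row of newmat holding a copy of that row, and MatGetRow() on the locally owned row reads it back.

  PetscInt           rstart, ncols, j;
  const PetscInt    *cols;
  const PetscScalar *vals;

  PetscCall(MatGetOwnershipRange(newmat, &rstart, NULL));
  PetscCall(MatGetRow(newmat, rstart, &ncols, &cols, &vals));
  for (j = 0; j < ncols; j++) {
    PetscCall(PetscSynchronizedPrintf(PETSC_COMM_WORLD, "col %" PetscInt_FMT " val %g\n",
                                      cols[j], (double)PetscRealPart(vals[j])));
  }
  PetscCall(PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT));
  PetscCall(MatRestoreRow(newmat, rstart, &ncols, &cols, &vals));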
>>>>>>>> >>>>>>>> Mark >>>>>>>> >>>>>>>> On Fri, Feb 3, 2023 at 4:05 AM ??? wrote: >>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> >>>>>>>>> By using MatGetRow, user can get vectors from local matrix (at >>>>>>>>> each process). >>>>>>>>> >>>>>>>>> However, I need other process's row values. >>>>>>>>> So I have 2 questions. >>>>>>>>> >>>>>>>>> 1. Is there any function for getting arrays from other process's?? >>>>>>>>> >>>>>>>>> 2. Or is there any function like matrix version of >>>>>>>>> vecscattertoall?? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Hyung Kim >>>>>>>>> >>>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 6931 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 7950 bytes Desc: not available URL: From mi.mike1021 at gmail.com Fri Feb 3 08:48:52 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Fri, 3 Feb 2023 08:48:52 -0600 Subject: [petsc-users] PETSc Fortran 64-bit In-Reply-To: <5e80e0bf-ecf2-4e21-de57-ab3aae5e8f3d@mcs.anl.gov> References: <704e28f8-38ff-a0c1-40c6-1efd81ddaba9@mcs.anl.gov> <5e80e0bf-ecf2-4e21-de57-ab3aae5e8f3d@mcs.anl.gov> Message-ID: That was the real problem indeed! Thanks a lot for the help. Mike > > > call DMPlexGetDepthStratum(dm, 0, vst, vend, ierr);CHKERRA(ierr) > gives an > > > error regarding the datatype of ierr. The error basically leads: > > > Error: Type mismatch in argument ?b? at (1); passed INTEGER(4) to > > > INTEGER(8) > > Ok - I think the error here is with '0' - not ierr. > > Use: > > >>> > PetscInt zero > zero = 0 > call DMPlexGetDepthStratum(dm, zero, vst, vend, ierr);CHKERRA(ierr) > <<< > > Satish > > On Wed, 1 Feb 2023, Satish Balay via petsc-users wrote: > > > On Wed, 1 Feb 2023, Mike Michell wrote: > > > > > @Satish and Blaise - Thank you for the notes. > > > > > > @Satish - When you say: "Some routines probably don't have Interface > > > definitions - hence compiler can't cat this issue with all functions." > > > > sorry - meant to say 'catch' [not cat] > > > > > Does it mean that I cannot use 64-bit option for those routines, which > do > > > not have the interface definitions? > > > > All routines are usable. Its just that compiler treats routines > > without interface definitions as F77 - and it won't verify the > > data-types of arguments passed in. [i.e F77 mode..] 
> > > > But I do see interface defs for both DMPlexGetDepthStratum() and > > DMPlexGetChart() so don't know why they behave differently for you. > > > > src/dm/f90-mod/ftn-auto-interfaces/petscdmplex.h90 > > > > >>>>> > > subroutine DMPlexGetDepthStratum(a,b,c,d,z) > > import tDM > > DM a ! DM > > PetscInt b ! PetscInt > > PetscInt c ! PetscInt > > PetscInt d ! PetscInt > > PetscErrorCode z > > end subroutine DMPlexGetDepthStratum > > > > subroutine DMPlexGetChart(a,b,c,z) > > import tDM > > DM a ! DM > > PetscInt b ! PetscInt > > PetscInt c ! PetscInt > > PetscErrorCode z > > end subroutine DMPlexGetChart > > > > Satish > > > > > > > Thanks, > > > Mike > > > > > > > > > > I use the following I all my fortran codes (inspired by a post from > > > > Michael Metcalf on comp.lang.fortran many many moons ago): > > > > > > > > PetscReal,Parameter :: PReal = 1.0 > > > > Integer,Parameter,Public :: Kr = Selected_Real_Kind(Precision(PReal)) > > > > PetscInt,Parameter :: PInt = 1 > > > > Integer,Parameter,Public :: Ki = kind(PInt) > > > > > > > > You will need to pass constant with their kind (i.e. 1_Ki instead of > 1) > > > > > > > > > > > > The advantage of this approach over explicitly trying to figure out > the > > > > proper type of integer ois that the same code will automatically > work with > > > > 32 and 64 bit indices. > > > > > > > > I?ve been wondering if petsc should include these definitions > (perhaps > > > > under another name). > > > > > > > > Blaise > > > > > > > > > > > > On Feb 1, 2023, at 2:58 PM, Mike Michell > wrote: > > > > > > > > Hi all, > > > > > > > > I want to use PETSc with 64-bit indices with a Fortran90 code. It > seems > > > > some PETSc functions have no problem, but some of the others do not > accept > > > > 32-bit error code integer (e.g., "ierr" declared as PetscErrorCode > type). > > > > > > > > For example, > > > > > > > > call DMPlexGetChart(dm, pst, pend, ierr);CHKERRA(ierr) works okay, > > > > > > > > but > > > > > > > > call DMPlexGetDepthStratum(dm, 0, vst, vend, ierr);CHKERRA(ierr) > gives an > > > > error regarding the datatype of ierr. The error basically leads: > > > > Error: Type mismatch in argument ?b? at (1); passed INTEGER(4) to > > > > INTEGER(8) > > > > > > > > I tried to declare ierr as integer(kind=8) type, but there are still > some > > > > problems. If PETSc is configured with 32-bit indices, the Fortran > code > > > > works without problem. > > > > > > > > What surprising to me is that as mentioned above, DMPlexGetChart() > works > > > > okay, but DMPlexGetDepthStratum() does not work with "ierr > (PetscErrorCode > > > > type)" variable with 64-bit indices. > > > > > > > > Can I get any comments on it? > > > > > > > > Thanks, > > > > Mike > > > > > > > > > > > > ? > > > > Canada Research Chair in Mathematical and Computational Aspects of > Solid > > > > Mechanics (Tier 1) > > > > Professor, Department of Mathematics & Statistics > > > > Hamilton Hall room 409A, McMaster University > > > > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada > > > > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
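To close out this thread, the kind-parameter idiom Blaise describes above can be packaged once so the same source compiles with both 32-bit and 64-bit PetscInt; the module name and the 0_Ki usage are illustrative, the parameter definitions are Blaise's:

#include <petsc/finclude/petscsys.h>
      module petsc_kinds
        use petscsys
        implicit none
        PetscReal, parameter       :: PReal = 1.0
        integer, parameter, public :: Kr = selected_real_kind(precision(PReal))
        PetscInt, parameter        :: PInt = 1
        integer, parameter, public :: Ki = kind(PInt)
      end module petsc_kinds

      ! literals then carry the PetscInt kind, e.g.
      !   call DMPlexGetDepthStratum(dm, 0_Ki, vst, vend, ierr);CHKERRA(ierr)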
From karthikeyan.chockalingam at stfc.ac.uk Fri Feb 3 10:41:05 2023
From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI)
Date: Fri, 3 Feb 2023 16:41:05 +0000
Subject: [petsc-users] Eliminating rows and columns which are zeros
In-Reply-To: References: Message-ID:

Hello Barry,

I would like to better understand pc_type redistribute usage.

I plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and the refined mesh (level 1) is indexed from 0 to 4(N-1) elements. When I construct the system matrix A on (level 1), I probably use only 20% of the 4(N-1) elements; however, the indices are scattered in the range 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for pc_type redistribute.

Question (1)

If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because I did end up allocating memory for a large system where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage?

Question (2)

I tried using pc_type redistribute for a two-level system. I have attached the output only for (level 1). The solution converges to the right solution, but PETSc still outputs some error messages.

[0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc!
[0]PETSC ERROR: Option left: name:-options_left (no value)

But there were no unused options:

#PETSc Option Table entries:
-ksp_type preonly
-options_left
-pc_type redistribute
-redistribute_ksp_converged_reason
-redistribute_ksp_monitor_true_residual
-redistribute_ksp_type cg
-redistribute_ksp_view
-redistribute_pc_type jacobi
#End of PETSc Option Table entries
There are no unused options.
Program ended with exit code: 0

Question (2)

[0]PETSC ERROR: Object is in wrong state
[0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792)

What does this error message imply? Given I only use 20% of the 4(N-1) indices, I can imagine most of the diagonal entries are zero. Is my understanding correct?

Question (3)

[0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124

I am using MatZeroRowsColumnsIS to set the homogeneous Dirichlet boundary. I don't follow why I get this error message, as the linear system converges to the right solution.

Thank you for your help.

Kind regards,
Karthik.
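One possible explanation for the "missing diagonal entry" error, offered here as an assumption rather than a confirmed diagnosis: rows of the level-1 matrix that the refinement never touches end up with no stored diagonal, and both MatZeroRowsColumnsIS() and the Jacobi preconditioner inside the redistribute solve expect that entry to exist. A minimal sketch that inserts an explicit (zero) diagonal for every locally owned row during assembly; A, isout, x, b are the objects named in this thread, and the diagonal locations are assumed to be preallocated:

  PetscInt rstart, rend, row;

  PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
  for (row = rstart; row < rend; row++) {
    /* use the same insert mode as the rest of the assembly loop */
    PetscCall(MatSetValue(A, row, row, 0.0, ADD_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  /* Dirichlet rows as before, then solve with
     -ksp_type preonly -pc_type redistribute
     -redistribute_ksp_type cg -redistribute_pc_type jacobi */
  PetscCall(MatZeroRowsColumnsIS(A, isout, 1.0, x, b));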
From: Barry Smith > Date: Tuesday, 10 January 2023 at 16:04 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. Barry On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: Hello, I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. For instance say N=7 0 0 0 0 0 0 0 0 1 -1 0 0 0 0 0 -1 2 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 1 I would like to reduce it to a 3x3 1 -1 0 -1 2 -1 0 -1 1 I do know the size N. Q1) How do I do it? Q2) Is it better to eliminate them as it would save a lot of memory? Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? Thank you. Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: petsc_redistribute.txt URL: From bsmith at petsc.dev Fri Feb 3 11:40:44 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 3 Feb 2023 12:40:44 -0500 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: Message-ID: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> We need all the error output for the errors you got below to understand why the errors are happening. > On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI wrote: > > Hello Barry, > > I would like to better understand pc_type redistribute usage. > > I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. Indeed the linear solve will be more efficient if you use the redistribute solver. But I don't understand your plan. 
With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). > > > Question (1) > > > If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? See above > > > Question (2) > > > I tried using pc_type redistribute for a two level system. > I have attached the output only for (level 1) > The solution converges to right solution but still petsc outputs some error messages. > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-options_left (no value) > > But the there were no unused options > > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > I cannot explain this > > Question (2) > > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > > What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? > > > Question (3) > > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > > I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. > > Thank you for your help. > > Kind regards, > Karthik. > > > > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 18:50 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. > > Barry > > > > On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you Barry. This is great! > > I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using > MatZeroRowsColumnsIS(A, isout, 1, x, b); > > While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? > > Kind regards, > Karthik. > > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 16:04 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute > > > It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. 
> > Barry > > > > > On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: > > Hello, > > I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. > > For instance say N=7 > > 0 0 0 0 0 0 0 > 0 1 -1 0 0 0 0 > 0 -1 2 0 0 0 -1 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 -1 0 0 0 1 > > I would like to reduce it to a 3x3 > > 1 -1 0 > -1 2 -1 > 0 -1 1 > > I do know the size N. > > Q1) How do I do it? > Q2) Is it better to eliminate them as it would save a lot of memory? > Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? > > Thank you. > > Karthik. > This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthikeyan.chockalingam at stfc.ac.uk Fri Feb 3 12:04:44 2023 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Fri, 3 Feb 2023 18:04:44 +0000 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> Message-ID: Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 Residual norms for redistribute_ solve. 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 KSP Object: (redistribute_) 1 MPI process type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (redistribute_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: mpiaij rows=48896, cols=48896 total: nonzeros=307976, allocated nonzeros=307976 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines End of program solve time 0.564714744 seconds Starting max value is: 0 Min value of level 0 is: 0 Interpolated min value is: 741.978761 Unused ParmParse Variables: [TOP]::model.type(nvals = 1) :: [3] [TOP]::ref_ratio(nvals = 1) :: [2] AMReX (22.10-20-g3082028e4287) finalized #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. Program ended with exit code: 0 Best, Karthik. From: Barry Smith Date: Friday, 3 February 2023 at 17:41 To: Chockalingam, Karthikeyan (STFC,DL,HC) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Eliminating rows and columns which are zeros We need all the error output for the errors you got below to understand why the errors are happening. On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI wrote: Hello Barry, I would like to better understand pc_type redistribute usage. I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. Indeed the linear solve will be more efficient if you use the redistribute solver. But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). Question (1) If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? See above Question (2) I tried using pc_type redistribute for a two level system. I have attached the output only for (level 1) The solution converges to right solution but still petsc outputs some error messages. [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) But the there were no unused options #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. 
Program ended with exit code: 0 I cannot explain this Question (2) [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? Question (3) [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. Thank you for your help. Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 18:50 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. Barry On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Barry. This is great! I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using MatZeroRowsColumnsIS(A, isout, 1, x, b); While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 16:04 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. Barry On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: Hello, I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. For instance say N=7 0 0 0 0 0 0 0 0 1 -1 0 0 0 0 0 -1 2 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 1 I would like to reduce it to a 3x3 1 -1 0 -1 2 -1 0 -1 1 I do know the size N. Q1) How do I do it? Q2) Is it better to eliminate them as it would save a lot of memory? Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? Thank you. Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri Feb 3 12:35:28 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Feb 2023 13:35:28 -0500 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> Message-ID: On Fri, Feb 3, 2023 at 1:05 PM Karthikeyan Chockalingam - STFC UKRI via petsc-users wrote: > Thank you. The entire error output was an attachment in my previous email. > I am pasting here for your reference. > The options "-options_left" does not take effect until PetscFinalize(), but your program crashed before that ran, so it comes up as not used in the error message, but then it runs and it is used. The main problem is that you have not preallocated the matrix to match your input. Thanks, Matt > > > [1;31m[0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could > be the program crashed before they were used or a spelling mistake, etc! > > [0]PETSC ERROR: Option left: name:-options_left (no value) > > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda > GIT Date: 2022-10-30 11:03:25 -0500 > > [0]PETSC ERROR: > /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named > HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 > > [0]PETSC ERROR: Configure options --with-debugging=0 > --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc > --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes > --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre > > [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 > > [0]PETSC ERROR: #2 MatZeroRowsColumns() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > > [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 > > [0]PETSC ERROR: #4 MatZeroRowsColumns() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > > [0]PETSC ERROR: #6 localAssembly() at > /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 > > Residual norms for redistribute_ solve. 
> > 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm > 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm > 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 > > 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm > 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 > > 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm > 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 > > 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm > 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 > > 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm > 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 > > 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm > 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 > > 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm > 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 > > 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm > 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 > > 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm > 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 > > 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm > 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 > > 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm > 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 > > Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 > > KSP Object: (redistribute_) 1 MPI process > > type: cg > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: (redistribute_) 1 MPI process > > type: jacobi > > type DIAGONAL > > linear system matrix = precond matrix: > > Mat Object: 1 MPI process > > type: mpiaij > > rows=48896, cols=48896 > > total: nonzeros=307976, allocated nonzeros=307976 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node (on process 0) routines > > End of program > > solve time 0.564714744 seconds > > Starting max value is: 0 > > Min value of level 0 is: 0 > > Interpolated min value is: 741.978761 > > Unused ParmParse Variables: > > [TOP]::model.type(nvals = 1) :: [3] > > [TOP]::ref_ratio(nvals = 1) :: [2] > > > > AMReX (22.10-20-g3082028e4287) finalized > > #PETSc Option Table entries: > > -ksp_type preonly > > -options_left > > -pc_type redistribute > > -redistribute_ksp_converged_reason > > -redistribute_ksp_monitor_true_residual > > -redistribute_ksp_type cg > > -redistribute_ksp_view > > -redistribute_pc_type jacobi > > #End of PETSc Option Table entries > > There are no unused options. > > Program ended with exit code: 0 > > > > > > Best, > > Karthik. > > > > *From: *Barry Smith > *Date: *Friday, 3 February 2023 at 17:41 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > We need all the error output for the errors you got below to understand > why the errors are happening. > > > > On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Hello Barry, > > > > I would like to better understand pc_type redistribute usage. 
> > > > I am plan to use pc_type *redistribute* in the context of adaptive mesh > refinement on a structured grid in 2D. My base mesh (level 0) is indexed > from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to > 4(N-1) elements. When I construct system matrix A on (level 1); I probably > only use 20% of 4(N-1) elements, however the indexes are scattered in the > range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system > matrix A on (level 1) to be zero. From your earlier response, I believe > this would be a use case for petsc_type redistribute. > > > > Indeed the linear solve will be more efficient if you use the > redistribute solver. > > > > But I don't understand your plan. With adaptive refinement I would just > create the two matrices, one for the initial grid on which you solve the > system, this will be a smaller matrix and then create a new larger matrix > for the refined grid (and discard the previous matrix). > > > > > > Question (1) > > > > > > If N is really large, I would have to allocate memory of size 4(N-1) for > the system matrix A on (level 1). How does pc_type redistribute help? > Because, I did end up allocating memory for a large system, where most of > the rows and columns are zeros. Is most of the allotted memory not wasted? > *Is this the correct usage?* > > > > See above > > > > > > Question (2) > > > > > > I tried using pc_type redistribute for a two level system. > > I have *attached* the output only for (level 1) > > The solution converges to right solution but still petsc outputs some > error messages. > > > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could > be the program crashed before they were used or a spelling mistake, etc! > > [0]PETSC ERROR: Option left: name:-options_left (no value) > > > > But the there were no unused options > > > > #PETSc Option Table entries: > > -ksp_type preonly > > -options_left > > -pc_type redistribute > > -redistribute_ksp_converged_reason > > -redistribute_ksp_monitor_true_residual > > -redistribute_ksp_type cg > > -redistribute_ksp_view > > -redistribute_pc_type jacobi > > #End of PETSc Option Table entries > > *There are no unused options.* > > Program ended with exit code: 0 > > > > I cannot explain this > > > Question (2) > > > > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > > > > What does this error message imply? Given I only use 20% of 4(N-1) > indexes, I can imagine most of the diagonal entrees are zero. *Is my > understanding correct?* > > > > > > Question (3) > > > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > > > > I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. > I don?t follow why I get this error message as the linear system converges > to the right solution. > > > > Thank you for your help. > > > > Kind regards, > > Karthik. > > > > > > > > *From: *Barry Smith > *Date: *Tuesday, 10 January 2023 at 18:50 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > Yes, after the solve the x will contain correct values for ALL the > locations including the (zeroed out rows). You use case is exactly what > redistribute it for. 
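To make that point concrete, a small sketch of reading the solution back with the original global indices after a redistribute solve; ReadBackSolution, rows and vals are placeholders, and VecGetValues() only returns entries owned by the calling process.

    /* Sketch only: x is the solution from KSPSolve(); rows[] holds original
       global row numbers owned by this process.                            */
    static PetscErrorCode ReadBackSolution(Vec x, PetscInt n, const PetscInt rows[], PetscScalar vals[])
    {
      PetscFunctionBeginUser;
      PetscCall(VecGetValues(x, n, rows, vals)); /* original numbering; zeroed rows included */
      PetscFunctionReturn(0);
    }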
> > > > Barry > > > > > > > On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Thank you Barry. This is great! > > > > I plan to solve using ?-pc_type redistribute? after applying the Dirichlet > bc using > > MatZeroRowsColumnsIS(A, isout, 1, x, b); > > > > While I retrieve the solution data from x (after the solve) ? can I index > them using the original ordering (if I may say that)? > > > > Kind regards, > > Karthik. > > > > *From: *Barry Smith > *Date: *Tuesday, 10 January 2023 at 16:04 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > > https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute > -pc_type redistribute > > > > > > It does everything for you. Note that if the right hand side for any of > the "zero" rows is nonzero then the system is inconsistent and the system > does not have a solution. > > > > Barry > > > > > > On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via > petsc-users wrote: > > > > Hello, > > > > I am assembling a MATIJ of size N, where a very large number of rows (and > corresponding columns), are zeros. I would like to potentially eliminate > them before the solve. > > > > For instance say N=7 > > > > 0 0 0 0 0 0 0 > > 0 1 -1 0 0 0 0 > > 0 -1 2 0 0 0 -1 > > 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 > > 0 0 -1 0 0 0 1 > > > > I would like to reduce it to a 3x3 > > > > 1 -1 0 > > -1 2 -1 > > 0 -1 1 > > > > I do know the size N. > > > > Q1) How do I do it? > > Q2) Is it better to eliminate them as it would save a lot of memory? > > Q3) At the moment, I don?t know which rows (and columns) have the zero > entries but with some effort I probably can find them. Should I know which > rows (and columns) I am eliminating? > > > > Thank you. > > > > Karthik. > > This email and any attachments are intended solely for the use of the > named recipients. If you are not the intended recipient you must not use, > disclose, copy or distribute this email or any of its attachments and > should notify the sender immediately and delete this email from your > system. UK Research and Innovation (UKRI) has taken every reasonable > precaution to minimise risk of this email or any attachments containing > viruses or malware but the recipient should carry out its own virus and > malware checks before opening the attachments. UKRI does not accept any > liability for any losses or damages which the recipient may sustain due to > presence of any viruses. > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Feb 3 12:51:12 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 3 Feb 2023 13:51:12 -0500 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> Message-ID: <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it but now we allow the row to initially have no entries and still be zeroed. Barry > On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI wrote: > > Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. > > > > [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-options_left (no value) > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 > [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 > [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre > [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 > [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 > [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 > Residual norms for redistribute_ solve. 
> 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 > 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 > 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 > 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 > 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 > 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 > 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 > 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 > 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 > 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 > 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 > Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 > KSP Object: (redistribute_) 1 MPI process > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (redistribute_) 1 MPI process > type: jacobi > type DIAGONAL > linear system matrix = precond matrix: > Mat Object: 1 MPI process > type: mpiaij > rows=48896, cols=48896 > total: nonzeros=307976, allocated nonzeros=307976 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > End of program > solve time 0.564714744 seconds > Starting max value is: 0 > Min value of level 0 is: 0 > Interpolated min value is: 741.978761 > Unused ParmParse Variables: > [TOP]::model.type(nvals = 1) :: [3] > [TOP]::ref_ratio(nvals = 1) :: [2] > > AMReX (22.10-20-g3082028e4287) finalized > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > > > Best, > Karthik. > > From: Barry Smith > > Date: Friday, 3 February 2023 at 17:41 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > We need all the error output for the errors you got below to understand why the errors are happening. > > > On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Hello Barry, > > I would like to better understand pc_type redistribute usage. > > I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. 
When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. > > Indeed the linear solve will be more efficient if you use the redistribute solver. > > But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). > > > > Question (1) > > > If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? > > See above > > > > Question (2) > > > I tried using pc_type redistribute for a two level system. > I have attached the output only for (level 1) > The solution converges to right solution but still petsc outputs some error messages. > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-options_left (no value) > > But the there were no unused options > > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > > I cannot explain this > > > Question (2) > > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > > What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? > > > Question (3) > > > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > > I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. > > Thank you for your help. > > Kind regards, > Karthik. > > > > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 18:50 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. > > Barry > > > > > On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you Barry. This is great! > > I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using > MatZeroRowsColumnsIS(A, isout, 1, x, b); > > While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? > > Kind regards, > Karthik. 
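Regarding the "Matrix is missing diagonal entry" error in this thread: one workaround on a release build, sketched below with placeholder sizes, is to preallocate the matrix and add an explicit 0.0 at every locally owned diagonal before the element assembly, so that MatZeroRowsColumns() finds the diagonal entries it expects. This is only an assumption-laden sketch, not the fix recommended here; switching to the main branch, as noted above, makes it unnecessary.

    #include <petscmat.h>

    /* Sketch only: nLocal and the per-row nonzero estimates (9 and 8) are
       placeholders that should be tightened for a real problem.            */
    static PetscErrorCode CreateLevelMatrix(MPI_Comm comm, PetscInt nLocal, Mat *A)
    {
      PetscInt rstart, rend;

      PetscFunctionBeginUser;
      PetscCall(MatCreate(comm, A));
      PetscCall(MatSetSizes(*A, nLocal, nLocal, PETSC_DETERMINE, PETSC_DETERMINE));
      PetscCall(MatSetType(*A, MATAIJ));
      PetscCall(MatSeqAIJSetPreallocation(*A, 9, NULL));          /* used on one rank       */
      PetscCall(MatMPIAIJSetPreallocation(*A, 9, NULL, 8, NULL)); /* used on multiple ranks */

      PetscCall(MatGetOwnershipRange(*A, &rstart, &rend));
      for (PetscInt i = rstart; i < rend; i++) {
        /* Adding 0.0 still creates the (i,i) slot and mixes safely with the
           ADD_VALUES element assembly that follows.                        */
        PetscCall(MatSetValue(*A, i, i, 0.0, ADD_VALUES));
      }
      /* ... element contributions via MatSetValues(..., ADD_VALUES) go here ... */
      PetscCall(MatAssemblyBegin(*A, MAT_FINAL_ASSEMBLY));
      PetscCall(MatAssemblyEnd(*A, MAT_FINAL_ASSEMBLY));
      PetscFunctionReturn(0);
    }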
> > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 16:04 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute > > > It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. > > Barry > > > > > On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: > > Hello, > > I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. > > For instance say N=7 > > 0 0 0 0 0 0 0 > 0 1 -1 0 0 0 0 > 0 -1 2 0 0 0 -1 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 -1 0 0 0 1 > > I would like to reduce it to a 3x3 > > 1 -1 0 > -1 2 -1 > 0 -1 1 > > I do know the size N. > > Q1) How do I do it? > Q2) Is it better to eliminate them as it would save a lot of memory? > Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? > > Thank you. > > Karthik. > This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbarnafi at cmm.uchile.cl Fri Feb 3 13:45:58 2023 From: nbarnafi at cmm.uchile.cl (Nicolas Barnafi) Date: Fri, 3 Feb 2023 16:45:58 -0300 Subject: [petsc-users] Problem setting Fieldsplit fields Message-ID: Dear community, I am using a fieldsplit preconditioner, but for some reason I cannot group fields as in other libraries (i.e. I do this in Firedrake and it works). Some context: I have set four fields to the preconditioner, which I want to regroup with -pc_fieldsplit_0_fields value: 0,1 -pc_fieldsplit_1_fields value: 2,3 But apparently this doesn't get read for some reason. In fact, from -ksp_view I still see all 4 fields, and it fails as one of the blocks has a null diagonal (coming from a saddle point problem), so it gives the following error. Interestingly, it shows that the groupings where not considered: [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry 0 [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-pc_fieldsplit_0_fields value: 0,1 [0]PETSC ERROR: Option left: name:-pc_fieldsplit_1_fields value: 2,3 Any clues to why this can happen? 
Best regards, Nicol?s Barnafi From knepley at gmail.com Fri Feb 3 13:50:31 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Feb 2023 14:50:31 -0500 Subject: [petsc-users] Problem setting Fieldsplit fields In-Reply-To: References: Message-ID: On Fri, Feb 3, 2023 at 2:46 PM Nicolas Barnafi via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear community, > > I am using a fieldsplit preconditioner, but for some reason I cannot > group fields as in other libraries (i.e. I do this in Firedrake and it > works). Some context: > > I have set four fields to the preconditioner, which I want to regroup with > -pc_fieldsplit_0_fields value: 0,1 > -pc_fieldsplit_1_fields value: 2,3 > You should not have a colon, ":" Thanks, Matt > But apparently this doesn't get read for some reason. In fact, from > -ksp_view I still see all 4 fields, and it fails as one of the blocks > has a null diagonal (coming from a saddle point problem), so it gives > the following error. Interestingly, it shows that the groupings where > not considered: > > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry 0 > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! > Could be the program crashed before they were used or a spelling > mistake, etc! > [0]PETSC ERROR: Option left: name:-pc_fieldsplit_0_fields value: 0,1 > [0]PETSC ERROR: Option left: name:-pc_fieldsplit_1_fields value: 2,3 > > Any clues to why this can happen? Best regards, > Nicol?s Barnafi > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbarnafi at cmm.uchile.cl Fri Feb 3 13:52:34 2023 From: nbarnafi at cmm.uchile.cl (Nicolas Barnafi) Date: Fri, 3 Feb 2023 16:52:34 -0300 Subject: [petsc-users] Problem setting Fieldsplit fields In-Reply-To: References: Message-ID: Thanks Matt, Sorry, I copied the output from the error, but in the options file I do it as expected: -pc_fieldsplit_0_fields 0,1 -pc_fieldsplit_1_fields 2,3 Best On 03-02-23 16:50, Matthew Knepley wrote: > On Fri, Feb 3, 2023 at 2:46 PM Nicolas Barnafi via petsc-users > > wrote: > > Dear community, > > I am using a fieldsplit preconditioner, but for some reason I cannot > group fields as in other libraries (i.e. I do this in Firedrake and it > works). Some context: > > I have set four fields to the preconditioner, which I want to > regroup with > -pc_fieldsplit_0_fields value: 0,1 > -pc_fieldsplit_1_fields value: 2,3 > > > You should not have a colon, ":" > > ? Thanks, > > ? ? ?Matt > > But apparently this doesn't get read for some reason. In fact, from > -ksp_view I still see all 4 fields, and it fails as one of the blocks > has a null diagonal (coming from a saddle point problem), so it gives > the following error. Interestingly, it shows that the groupings where > not considered: > > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry 0 > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! > Could be the program crashed before they were used or a spelling > mistake, etc! > [0]PETSC ERROR: Option left: name:-pc_fieldsplit_0_fields value: 0,1 > [0]PETSC ERROR: Option left: name:-pc_fieldsplit_1_fields value: 2,3 > > Any clues to why this can happen? 
Best regards, > Nicol?s Barnafi > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Fri Feb 3 13:55:02 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Feb 2023 14:55:02 -0500 Subject: [petsc-users] Problem setting Fieldsplit fields In-Reply-To: References: Message-ID: On Fri, Feb 3, 2023 at 2:52 PM Nicolas Barnafi wrote: > Thanks Matt, > > Sorry, I copied the output from the error, but in the options file I do > it as expected: > > -pc_fieldsplit_0_fields 0,1 > -pc_fieldsplit_1_fields 2,3 > There are a number of common errors: 1) Your PC has a prefix 2) You have not called KSPSetFromOptions() here Can you send the -ksp_view output? Thanks, Matt > Best > > On 03-02-23 16:50, Matthew Knepley wrote: > > On Fri, Feb 3, 2023 at 2:46 PM Nicolas Barnafi via petsc-users > > > wrote: > > > > Dear community, > > > > I am using a fieldsplit preconditioner, but for some reason I cannot > > group fields as in other libraries (i.e. I do this in Firedrake and > it > > works). Some context: > > > > I have set four fields to the preconditioner, which I want to > > regroup with > > -pc_fieldsplit_0_fields value: 0,1 > > -pc_fieldsplit_1_fields value: 2,3 > > > > > > You should not have a colon, ":" > > > > Thanks, > > > > Matt > > > > But apparently this doesn't get read for some reason. In fact, from > > -ksp_view I still see all 4 fields, and it fails as one of the blocks > > has a null diagonal (coming from a saddle point problem), so it gives > > the following error. Interestingly, it shows that the groupings where > > not considered: > > > > [0]PETSC ERROR: Object is in wrong state > > [0]PETSC ERROR: Matrix is missing diagonal entry 0 > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! > > Could be the program crashed before they were used or a spelling > > mistake, etc! > > [0]PETSC ERROR: Option left: name:-pc_fieldsplit_0_fields value: 0,1 > > [0]PETSC ERROR: Option left: name:-pc_fieldsplit_1_fields value: 2,3 > > > > Any clues to why this can happen? Best regards, > > Nicol?s Barnafi > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which > > their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ < > http://www.cse.buffalo.edu/~knepley/> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbarnafi at cmm.uchile.cl Fri Feb 3 14:03:21 2023 From: nbarnafi at cmm.uchile.cl (Nicolas Barnafi) Date: Fri, 3 Feb 2023 17:03:21 -0300 Subject: [petsc-users] Problem setting Fieldsplit fields In-Reply-To: References: Message-ID: > There are a number of common errors: > > ? 1) Your PC has a prefix > > ? 2) You have not called KSPSetFromOptions() here > > Can you send the -ksp_view output? The PC at least has no prefix. I had to set ksp_rtol to 1 to get through the solution process, you will find both the petsc_rc and the ksp_view at the bottom of this message. Options are indeed being set from the options file, but there must be something missing at a certain level. 
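If the four fields were registered with PCFieldSplitSetIS(), one fallback worth trying (a sketch under that assumption, not the resolution of this thread) is to build the 0,1 and 2,3 groups explicitly by concatenating the index sets and registering only two splits, so no -pc_fieldsplit_N_fields option is needed at all; is0..is3 and the function name are placeholders for the application's own index sets.

    #include <petscksp.h>

    /* Sketch only: group four IS-defined fields into two splits by hand. */
    static PetscErrorCode SetTwoByTwoSplits(KSP ksp, IS is0, IS is1, IS is2, IS is3)
    {
      IS islist01[2] = {is0, is1}, islist23[2] = {is2, is3};
      IS is01, is23;
      PC pc;

      PetscFunctionBeginUser;
      PetscCall(KSPGetPC(ksp, &pc));
      PetscCall(PCSetType(pc, PCFIELDSPLIT));
      PetscCall(ISConcatenate(PetscObjectComm((PetscObject)is0), 2, islist01, &is01));
      PetscCall(ISConcatenate(PetscObjectComm((PetscObject)is2), 2, islist23, &is23));
      PetscCall(PCFieldSplitSetIS(pc, "0", is01));
      PetscCall(PCFieldSplitSetIS(pc, "1", is23));
      PetscCall(ISDestroy(&is01)); /* PCFieldSplitSetIS keeps its own reference */
      PetscCall(ISDestroy(&is23));
      PetscCall(KSPSetFromOptions(ksp)); /* -pc_fieldsplit_type etc. from the rc file still apply */
      PetscFunctionReturn(0);
    }

With only two splits defined this way, the per-block solver options become -fieldsplit_0_* and -fieldsplit_1_*, which can then be pointed at something other than the default ILU for the block that contains the zero-diagonal field.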
Thanks for looking into this. Best ---- petsc_rc file -ksp_monitor -ksp_type gmres -ksp_view -mat_type aij -ksp_norm_type unpreconditioned -ksp_atol 1e-14 -ksp_rtol 1 -pc_type fieldsplit -pc_fieldsplit_type multiplicative ---- ksp_view KSP Object: 1 MPI process type: gmres restart=500, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, nonzero initial guess tolerances: relative=1., absolute=1e-14, divergence=10000. right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: 1 MPI process type: fieldsplit FieldSplit with MULTIPLICATIVE composition: total splits = 4 Solver info for each split is in the following KSP objects: Split number 0 Defined by IS KSP Object: (fieldsplit_0_) 1 MPI process type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using DEFAULT norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI process type: ilu PC has not been set up so information may be incomplete out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural matrix solver type: petsc matrix not yet factored; no additional information available linear system matrix = precond matrix: Mat Object: (fieldsplit_0_) 1 MPI process type: seqaij rows=615, cols=615 total: nonzeros=9213, allocated nonzeros=9213 total number of mallocs used during MatSetValues calls=0 not using I-node routines Split number 1 Defined by IS KSP Object: (fieldsplit_1_) 1 MPI process type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using DEFAULT norm type for convergence test PC Object: (fieldsplit_1_) 1 MPI process type: ilu PC has not been set up so information may be incomplete out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural matrix solver type: petsc matrix not yet factored; no additional information available linear system matrix = precond matrix: Mat Object: (fieldsplit_1_) 1 MPI process type: seqaij rows=64, cols=64 total: nonzeros=0, allocated nonzeros=0 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 13 nodes, limit used is 5 Split number 2 Defined by IS KSP Object: (fieldsplit_2_) 1 MPI process type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using DEFAULT norm type for convergence test PC Object: (fieldsplit_2_) 1 MPI process type: ilu PC has not been set up so information may be incomplete out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural matrix solver type: petsc matrix not yet factored; no additional information available linear system matrix = precond matrix: Mat Object: (fieldsplit_2_) 1 MPI process type: seqaij rows=240, cols=240 total: nonzeros=2140, allocated nonzeros=2140 total number of mallocs used during MatSetValues calls=0 not using I-node routines Split number 3 Defined by IS KSP Object: (fieldsplit_3_) 1 MPI process type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using DEFAULT norm type for convergence test PC Object: (fieldsplit_3_) 1 MPI process type: ilu PC has not been set up so information may be incomplete out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural matrix solver type: petsc matrix not yet factored; no additional information available linear system matrix = precond matrix: Mat Object: (fieldsplit_3_) 1 MPI process type: seqaij rows=300, cols=300 total: nonzeros=2292, allocated nonzeros=2292 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI process type: seqaij rows=1219, cols=1219 total: nonzeros=26443, allocated nonzeros=26443 total number of mallocs used during MatSetValues calls=0 not using I-node routines solving time: 0.00449609 iterations: 0 estimated error: 25.4142 From knepley at gmail.com Fri Feb 3 14:11:51 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Feb 2023 15:11:51 -0500 Subject: [petsc-users] Problem setting Fieldsplit fields In-Reply-To: References: Message-ID: On Fri, Feb 3, 2023 at 3:03 PM Nicolas Barnafi wrote: > > There are a number of common errors: > > > > 1) Your PC has a prefix > > > > 2) You have not called KSPSetFromOptions() here > > > > Can you send the -ksp_view output? > > The PC at least has no prefix. I had to set ksp_rtol to 1 to get through > the solution process, you will find both the petsc_rc and the ksp_view > at the bottom of this message. > > Options are indeed being set from the options file, but there must be > something missing at a certain level. Thanks for looking into this. > Okay, the next step is to pass -info and send the output. This will tell us how the default splits were done. If that is not conclusive, we will have to use the debugger. Thanks, Matt > Best > > ---- petsc_rc file > > -ksp_monitor > -ksp_type gmres > -ksp_view > -mat_type aij > -ksp_norm_type unpreconditioned > -ksp_atol 1e-14 > -ksp_rtol 1 > -pc_type fieldsplit > -pc_fieldsplit_type multiplicative > > ---- ksp_view > > KSP Object: 1 MPI process > type: gmres > restart=500, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, nonzero initial guess > tolerances: relative=1., absolute=1e-14, divergence=10000. > right preconditioning > using UNPRECONDITIONED norm type for convergence test > PC Object: 1 MPI process > type: fieldsplit > FieldSplit with MULTIPLICATIVE composition: total splits = 4 > Solver info for each split is in the following KSP objects: > Split number 0 Defined by IS > KSP Object: (fieldsplit_0_) 1 MPI process > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using DEFAULT norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI process > type: ilu > PC has not been set up so information may be incomplete > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > matrix solver type: petsc > matrix not yet factored; no additional information available > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_) 1 MPI process > type: seqaij > rows=615, cols=615 > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > Split number 1 Defined by IS > KSP Object: (fieldsplit_1_) 1 MPI process > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using DEFAULT norm type for convergence test > PC Object: (fieldsplit_1_) 1 MPI process > type: ilu > PC has not been set up so information may be incomplete > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > matrix solver type: petsc > matrix not yet factored; no additional information available > linear system matrix = precond matrix: > Mat Object: (fieldsplit_1_) 1 MPI process > type: seqaij > rows=64, cols=64 > total: nonzeros=0, allocated nonzeros=0 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 13 nodes, limit used is 5 > Split number 2 Defined by IS > KSP Object: (fieldsplit_2_) 1 MPI process > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using DEFAULT norm type for convergence test > PC Object: (fieldsplit_2_) 1 MPI process > type: ilu > PC has not been set up so information may be incomplete > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > matrix solver type: petsc > matrix not yet factored; no additional information available > linear system matrix = precond matrix: > Mat Object: (fieldsplit_2_) 1 MPI process > type: seqaij > rows=240, cols=240 > total: nonzeros=2140, allocated nonzeros=2140 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > Split number 3 Defined by IS > KSP Object: (fieldsplit_3_) 1 MPI process > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using DEFAULT norm type for convergence test > PC Object: (fieldsplit_3_) 1 MPI process > type: ilu > PC has not been set up so information may be incomplete > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > matrix solver type: petsc > matrix not yet factored; no additional information available > linear system matrix = precond matrix: > Mat Object: (fieldsplit_3_) 1 MPI process > type: seqaij > rows=300, cols=300 > total: nonzeros=2292, allocated nonzeros=2292 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI process > type: seqaij > rows=1219, cols=1219 > total: nonzeros=26443, allocated nonzeros=26443 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > solving time: 0.00449609 > iterations: 0 > estimated error: 25.4142 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthikeyan.chockalingam at stfc.ac.uk Fri Feb 3 17:21:44 2023 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Fri, 3 Feb 2023 23:21:44 +0000 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> Message-ID: Thanks Matt for your response. I can confirm, I have the right size allocation for matrix. The linear system converges to the right solution. But I do know, for a fact many rows in my matrix diagonal entrees are zeros (because I intentionally don?t assign anything). Is that what is causing the problem? In other words do all diagonals have to be assigned a value (at least zero)? Back to my first thread, I looking to do something like the below (and I assumed that all entrees in a matrix has an initial value of zero by creation) For instance say N=7 0 0 0 0 0 0 0 0 1 -1 0 0 0 0 0 -1 2 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 1 I would like to reduce it to a 3x3 1 -1 0 -1 2 -1 0 -1 1 Best, Karthik. From: Matthew Knepley Date: Friday, 3 February 2023 at 18:35 To: Chockalingam, Karthikeyan (STFC,DL,HC) Cc: Barry Smith , petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Eliminating rows and columns which are zeros On Fri, Feb 3, 2023 at 1:05 PM Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. The options "-options_left" does not take effect until PetscFinalize(), but your program crashed before that ran, so it comes up as not used in the error message, but then it runs and it is used. The main problem is that you have not preallocated the matrix to match your input. Thanks, Matt [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! 
[0]PETSC ERROR: Option left: name:-options_left (no value) [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 Residual norms for redistribute_ solve. 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 KSP Object: (redistribute_) 1 MPI process type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (redistribute_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: mpiaij rows=48896, cols=48896 total: nonzeros=307976, allocated nonzeros=307976 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines End of program solve time 0.564714744 seconds Starting max value is: 0 Min value of level 0 is: 0 Interpolated min value is: 741.978761 Unused ParmParse Variables: [TOP]::model.type(nvals = 1) :: [3] [TOP]::ref_ratio(nvals = 1) :: [2] AMReX (22.10-20-g3082028e4287) finalized #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. Program ended with exit code: 0 Best, Karthik. From: Barry Smith > Date: Friday, 3 February 2023 at 17:41 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros We need all the error output for the errors you got below to understand why the errors are happening. On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Hello Barry, I would like to better understand pc_type redistribute usage. I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. Indeed the linear solve will be more efficient if you use the redistribute solver. But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). Question (1) If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? See above Question (2) I tried using pc_type redistribute for a two level system. I have attached the output only for (level 1) The solution converges to right solution but still petsc outputs some error messages. [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) But the there were no unused options #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. 
Program ended with exit code: 0 I cannot explain this Question (2) [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? Question (3) [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. Thank you for your help. Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 18:50 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. Barry On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Barry. This is great! I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using MatZeroRowsColumnsIS(A, isout, 1, x, b); While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 16:04 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. Barry On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: Hello, I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. For instance say N=7 0 0 0 0 0 0 0 0 1 -1 0 0 0 0 0 -1 2 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 1 I would like to reduce it to a 3x3 1 -1 0 -1 2 -1 0 -1 1 I do know the size N. Q1) How do I do it? Q2) Is it better to eliminate them as it would save a lot of memory? Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? Thank you. Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthikeyan.chockalingam at stfc.ac.uk Fri Feb 3 17:42:28 2023 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Fri, 3 Feb 2023 23:42:28 +0000 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> Message-ID: I updated the main branch to the below commit but the same problem persists. [0]PETSC ERROR: Petsc Development GIT revision: v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000 From: Barry Smith Date: Friday, 3 February 2023 at 18:51 To: Chockalingam, Karthikeyan (STFC,DL,HC) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it but now we allow the row to initially have no entries and still be zeroed. Barry On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI wrote: Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 Residual norms for redistribute_ solve. 
0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 KSP Object: (redistribute_) 1 MPI process type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (redistribute_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: mpiaij rows=48896, cols=48896 total: nonzeros=307976, allocated nonzeros=307976 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines End of program solve time 0.564714744 seconds Starting max value is: 0 Min value of level 0 is: 0 Interpolated min value is: 741.978761 Unused ParmParse Variables: [TOP]::model.type(nvals = 1) :: [3] [TOP]::ref_ratio(nvals = 1) :: [2] AMReX (22.10-20-g3082028e4287) finalized #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. Program ended with exit code: 0 Best, Karthik. From: Barry Smith > Date: Friday, 3 February 2023 at 17:41 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros We need all the error output for the errors you got below to understand why the errors are happening. On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Hello Barry, I would like to better understand pc_type redistribute usage. I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). 
That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. Indeed the linear solve will be more efficient if you use the redistribute solver. But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). Question (1) If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? See above Question (2) I tried using pc_type redistribute for a two level system. I have attached the output only for (level 1) The solution converges to right solution but still petsc outputs some error messages. [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) But the there were no unused options #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. Program ended with exit code: 0 I cannot explain this Question (2) [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? Question (3) [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. Thank you for your help. Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 18:50 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. Barry On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Barry. This is great! I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using MatZeroRowsColumnsIS(A, isout, 1, x, b); While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 16:04 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute It does everything for you. 
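In practice that workflow looks roughly like the minimal sketch below (illustrative names only: it assumes an already assembled parallel AIJ matrix A, solution vector x, right-hand side b, and an IS isbc holding the global Dirichlet row indices; it is a sketch of the intended usage, not code from this thread):

    /* A, x, b assumed assembled; isbc lists the global Dirichlet rows (names illustrative) */
    KSP ksp;
    PC  pc;

    /* Note: on releases before the current 'main' branch every row to be zeroed must already
       carry a diagonal entry, so put an explicit 0.0 on the diagonal of otherwise empty rows
       during assembly to avoid the "missing diagonal entry" error discussed above. */
    MatZeroRowsColumnsIS(A, isbc, 1.0, x, b);   /* zero Dirichlet rows/columns, diagonal set to 1 */

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetType(ksp, KSPPREONLY);                /* equivalent to -ksp_type preonly        */
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCREDISTRIBUTE);              /* equivalent to -pc_type redistribute    */
    KSPSetFromOptions(ksp);                     /* picks up -redistribute_ksp_type cg etc. */
    KSPSolve(ksp, b, x);                        /* x is filled for every row, zeroed ones included */
    KSPDestroy(&ksp);
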
Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. Barry On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: Hello, I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. For instance say N=7 0 0 0 0 0 0 0 0 1 -1 0 0 0 0 0 -1 2 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 1 I would like to reduce it to a 3x3 1 -1 0 -1 2 -1 0 -1 1 I do know the size N. Q1) How do I do it? Q2) Is it better to eliminate them as it would save a lot of memory? Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? Thank you. Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Feb 3 18:21:39 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 3 Feb 2023 19:21:39 -0500 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> Message-ID: If you can help me reproduce the problem with a simple code I can debug the problem and fix it. Barry > On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI wrote: > > I updated the main branch to the below commit but the same problem persists. > > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000 > > > From: Barry Smith > > Date: Friday, 3 February 2023 at 18:51 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it but now we allow the row to initially have no entries and still be zeroed. > > Barry > > > On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. > > > > [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! 
> [0]PETSC ERROR: Option left: name:-options_left (no value) > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 > [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 > [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre > [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 > [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 > [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 > Residual norms for redistribute_ solve. > 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 > 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 > 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 > 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 > 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 > 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 > 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 > 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 > 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 > 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 > 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 > Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 > KSP Object: (redistribute_) 1 MPI process > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (redistribute_) 1 MPI process > type: jacobi > type DIAGONAL > linear system matrix = precond matrix: > Mat Object: 1 MPI process > type: mpiaij > rows=48896, cols=48896 > total: nonzeros=307976, allocated nonzeros=307976 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > End of program > solve time 0.564714744 seconds > Starting max value is: 0 > Min value of level 0 is: 0 > Interpolated min value is: 741.978761 > Unused ParmParse Variables: > [TOP]::model.type(nvals = 1) :: [3] > [TOP]::ref_ratio(nvals = 1) :: [2] > > AMReX (22.10-20-g3082028e4287) finalized > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > > > Best, > Karthik. > > From: Barry Smith > > Date: Friday, 3 February 2023 at 17:41 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > We need all the error output for the errors you got below to understand why the errors are happening. > > > > On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Hello Barry, > > I would like to better understand pc_type redistribute usage. > > I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. > > Indeed the linear solve will be more efficient if you use the redistribute solver. > > But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). > > > > > Question (1) > > > If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? > > See above > > > > > Question (2) > > > I tried using pc_type redistribute for a two level system. > I have attached the output only for (level 1) > The solution converges to right solution but still petsc outputs some error messages. > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! 
> [0]PETSC ERROR: Option left: name:-options_left (no value) > > But the there were no unused options > > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > > I cannot explain this > > > > Question (2) > > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > > What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? > > > Question (3) > > > > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > > I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. > > Thank you for your help. > > Kind regards, > Karthik. > > > > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 18:50 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. > > Barry > > > > > > On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you Barry. This is great! > > I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using > MatZeroRowsColumnsIS(A, isout, 1, x, b); > > While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? > > Kind regards, > Karthik. > > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 16:04 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute > > > It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. > > Barry > > > > > > On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: > > Hello, > > I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. > > For instance say N=7 > > 0 0 0 0 0 0 0 > 0 1 -1 0 0 0 0 > 0 -1 2 0 0 0 -1 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 -1 0 0 0 1 > > I would like to reduce it to a 3x3 > > 1 -1 0 > -1 2 -1 > 0 -1 1 > > I do know the size N. > > Q1) How do I do it? > Q2) Is it better to eliminate them as it would save a lot of memory? > Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? > > Thank you. > > Karthik. > This email and any attachments are intended solely for the use of the named recipients. 
If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksi2443 at gmail.com Sat Feb 4 02:24:36 2023 From: ksi2443 at gmail.com (=?UTF-8?B?6rmA7ISx7J21?=) Date: Sat, 4 Feb 2023 17:24:36 +0900 Subject: [petsc-users] Question about MatGetRow In-Reply-To: References: Message-ID: Following your comments, that works in multi processes. After extracting procedure from 'newmat' which is submatrix. PetscInt *indices; PetscMalloc1(1, &indices); Indices[0] = 0; ISCreateGenreral(PETSC_COMM_WORLD, 1, indices, PETSC_OWN_POINTER, &isrow); MatCreateSubMatrix(mat,isrow,NULL,MAT_INITIAL_MATRIX,&newmat); (extract from newmat) I did 'ISDestroy(&isrow); PetscFree(indices);' However I got an error 'double free'. So I deleted PetscFree. Is this correct response of that error? If not, how should I deal with that error?? Thanks, Hyung Kim 2023? 2? 3? (?) ?? 11:33, Matthew Knepley ?? ??: > On Fri, Feb 3, 2023 at 9:04 AM ??? wrote: > >> Actually in the last mail, below scripts are running in all processes. >> >> IS isrow; >> PetscInt *indices; >> PetscMalloc1(1, &indices); >> Indices[0] = 0; >> ISCreateGenreral(PETSC_COMM_WORLD, 1, indices, PETSC_OWN_POINTER, &isrow); >> MatCreateSubMatrix(mat,isrow,NULL,MAT_INITIAL_MATRIX,&newmat); >> (extract from newmat) >> >> However, you said it cannot get the values of first row of global matrix. >> Please let me know how can I fix this scripts for getting the 1st row of >> global matrix in all processes. >> > > Did you run and see what you get? If it is on all processes, it should > work. > > Thanks, > > Matt > > >> Hyung Kim >> >> >> >> >> >> >> >> 2023? 2? 3? (?) ?? 10:54, Matthew Knepley ?? ??: >> >>> On Fri, Feb 3, 2023 at 8:52 AM ??? wrote: >>> >>>> I want to extract same row values of global matrix in all processes. >>>> Then how can I do this?? >>>> >>> >>> Create the same IS on each process. >>> >>> THanks, >>> >>> Matt >>> >>> >>>> The case of same problem of vector, I just use vecscattertoall. >>>> However, I can't find same function for matrix. >>>> >>>> Hyung Kim >>>> >>>> 2023? 2? 3? (?) ?? 10:47, Matthew Knepley ?? ??: >>>> >>>>> On Fri, Feb 3, 2023 at 8:45 AM ??? wrote: >>>>> >>>>>> Following your comments, >>>>>> If I extract first row of below matrix. >>>>>> [image: image.png] >>>>>> IS isrow; >>>>>> PetscInt *indices; >>>>>> PetscMalloc1(1, *indices); >>>>>> >>>>> >>>>> That should be &indices. >>>>> >>>>> >>>>>> Indices[0] = 0; >>>>>> ISCreateGenreral(PETSC_COMM_WORLD, 1, indices, PETSC_COPY_VALUES, >>>>>> &isrow); >>>>>> >>>>> >>>>> You should use PETSC_OWN_POINTER. >>>>> >>>>> >>>>>> MatCreateSubMatrix(mat,isrow,NULL,MAT_INITIAL_MATRIX,&newmat); >>>>>> >>>>>> Then can I get the array about first row of global matrix in all >>>>>> processes? >>>>>> >>>>> >>>>> No, just on the process which gives 0. If you do that on every >>>>> process, every rank with get row 0. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Hyung Kim >>>>>> >>>>>> 2023? 2? 
3? (?) ?? 10:26, Matthew Knepley ?? ??: >>>>>> >>>>>>> On Fri, Feb 3, 2023 at 8:06 AM ??? wrote: >>>>>>> >>>>>>>> Following your comments, >>>>>>>> I want to check below things. >>>>>>>> For example, the global dense matrix are as below. >>>>>>>> [image: image.png] >>>>>>>> If I want to get first row ('1 2 0 0 3 0 0 4') in Proc 1. >>>>>>>> Then I should put 'MatCreateSubMatrix >>>>>>>> ( >>>>>>>> mat, isrow, NULL, MAT_INITIAL_MATRIX, *&*newmat)' >>>>>>>> and isrow will be [0 1 2 3 4 5 6 7]. >>>>>>>> >>>>>>>> In this case, How can I make isrow? >>>>>>>> Actually I can't understand the procedure of handling isrow. >>>>>>>> >>>>>>> >>>>>>> You create an IS object of type ISGENERAL and give it the array of >>>>>>> global indices that you want to extract. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> Hyung Kim >>>>>>>> >>>>>>>> 2023? 2? 3? (?) ?? 9:03, Mark Adams ?? ??: >>>>>>>> >>>>>>>>> https://petsc.org/main/docs/manualpages/Mat/MatCreateSubMatrix/ >>>>>>>>> >>>>>>>>> Note, PETSc lets you give NULL arguments if there is a reasonable >>>>>>>>> default. >>>>>>>>> In this case give NULL for the column IS and you will get the >>>>>>>>> whole columns. >>>>>>>>> >>>>>>>>> Mark >>>>>>>>> >>>>>>>>> On Fri, Feb 3, 2023 at 4:05 AM ??? wrote: >>>>>>>>> >>>>>>>>>> Hello, >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> By using MatGetRow, user can get vectors from local matrix (at >>>>>>>>>> each process). >>>>>>>>>> >>>>>>>>>> However, I need other process's row values. >>>>>>>>>> So I have 2 questions. >>>>>>>>>> >>>>>>>>>> 1. Is there any function for getting arrays from other process's?? >>>>>>>>>> >>>>>>>>>> 2. Or is there any function like matrix version of >>>>>>>>>> vecscattertoall?? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Hyung Kim >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 6931 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 7950 bytes Desc: not available URL: From karthikeyan.chockalingam at stfc.ac.uk Sat Feb 4 11:06:03 2023 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Sat, 4 Feb 2023 17:06:03 +0000 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> Message-ID: Thank you very much for offering to debug. I built PETSc along with AMReX, so I tried to extract the PETSc code alone which would reproduce the same error on the smallest sized problem possible. I have attached three files: petsc_amrex_error_redistribute.txt ? The error message from amrex/petsc interface, but THE linear system solves and converges to a solution. problem.c ? A simple stand-alone petsc code, which produces almost the same error message. petsc_ error_redistribute.txt ? The error message from problem.c but strangely it does NOT solve ? I am not sure why? Please use problem.c to debug the issue. Kind regards, Karthik. From: Barry Smith Date: Saturday, 4 February 2023 at 00:22 To: Chockalingam, Karthikeyan (STFC,DL,HC) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you can help me reproduce the problem with a simple code I can debug the problem and fix it. Barry On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI wrote: I updated the main branch to the below commit but the same problem persists. [0]PETSC ERROR: Petsc Development GIT revision: v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000 From: Barry Smith > Date: Friday, 3 February 2023 at 18:51 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it but now we allow the row to initially have no entries and still be zeroed. Barry On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 Residual norms for redistribute_ solve. 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 KSP Object: (redistribute_) 1 MPI process type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (redistribute_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: mpiaij rows=48896, cols=48896 total: nonzeros=307976, allocated nonzeros=307976 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines End of program solve time 0.564714744 seconds Starting max value is: 0 Min value of level 0 is: 0 Interpolated min value is: 741.978761 Unused ParmParse Variables: [TOP]::model.type(nvals = 1) :: [3] [TOP]::ref_ratio(nvals = 1) :: [2] AMReX (22.10-20-g3082028e4287) finalized #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. Program ended with exit code: 0 Best, Karthik. From: Barry Smith > Date: Friday, 3 February 2023 at 17:41 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros We need all the error output for the errors you got below to understand why the errors are happening. On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Hello Barry, I would like to better understand pc_type redistribute usage. I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. Indeed the linear solve will be more efficient if you use the redistribute solver. But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). Question (1) If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? See above Question (2) I tried using pc_type redistribute for a two level system. I have attached the output only for (level 1) The solution converges to right solution but still petsc outputs some error messages. [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) But the there were no unused options #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. 
Program ended with exit code: 0 I cannot explain this Question (2) [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? Question (3) [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. Thank you for your help. Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 18:50 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. Barry On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Barry. This is great! I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using MatZeroRowsColumnsIS(A, isout, 1, x, b); While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 16:04 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. Barry On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: Hello, I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. For instance say N=7 0 0 0 0 0 0 0 0 1 -1 0 0 0 0 0 -1 2 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 1 I would like to reduce it to a 3x3 1 -1 0 -1 2 -1 0 -1 1 I do know the size N. Q1) How do I do it? Q2) Is it better to eliminate them as it would save a lot of memory? Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? Thank you. Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: petsc_error_redistribute.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: petsc_amrex_error_redistribute.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: problem.c URL: From bsmith at petsc.dev Sat Feb 4 12:01:04 2023 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 4 Feb 2023 13:01:04 -0500 Subject: [petsc-users] Question about MatGetRow In-Reply-To: References: Message-ID: <5C37CF61-9F36-443C-B2B8-7486B1502466@petsc.dev> > PETSC_OWN_POINTER, indicates that the IS is to take ownership of the memory in indices, hence you are no longer reasonable for that memory and should not call PetscFree() on it. So the code is running correctly. > On Feb 4, 2023, at 3:24 AM, ??? wrote: > > Following your comments, that works in multi processes. > After extracting procedure from 'newmat' which is submatrix. > > PetscInt *indices; > PetscMalloc1(1, &indices); > Indices[0] = 0; > ISCreateGenreral(PETSC_COMM_WORLD, 1, indices, PETSC_OWN_POINTER, &isrow); > MatCreateSubMatrix(mat,isrow,NULL,MAT_INITIAL_MATRIX,&newmat); > (extract from newmat) > > I did 'ISDestroy(&isrow); PetscFree(indices);' > However I got an error 'double free'. > So I deleted PetscFree. > Is this correct response of that error? > If not, how should I deal with that error?? > > Thanks, > Hyung Kim > > > > > > > > 2023? 2? 3? (?) ?? 11:33, Matthew Knepley >?? ??: >> On Fri, Feb 3, 2023 at 9:04 AM ??? > wrote: >>> Actually in the last mail, below scripts are running in all processes. >>> >>> IS isrow; >>> PetscInt *indices; >>> PetscMalloc1(1, &indices); >>> Indices[0] = 0; >>> ISCreateGenreral(PETSC_COMM_WORLD, 1, indices, PETSC_OWN_POINTER, &isrow); >>> MatCreateSubMatrix(mat,isrow,NULL,MAT_INITIAL_MATRIX,&newmat); >>> (extract from newmat) >>> >>> However, you said it cannot get the values of first row of global matrix. >>> Please let me know how can I fix this scripts for getting the 1st row of global matrix in all processes. >> >> Did you run and see what you get? If it is on all processes, it should work. >> >> Thanks, >> >> Matt >> >>> Hyung Kim >>> >>> >>> >>> >>> >>> >>> >>> 2023? 2? 3? (?) ?? 10:54, Matthew Knepley >?? ??: >>>> On Fri, Feb 3, 2023 at 8:52 AM ??? > wrote: >>>>> I want to extract same row values of global matrix in all processes. >>>>> Then how can I do this?? >>>> >>>> Create the same IS on each process. >>>> >>>> THanks, >>>> >>>> Matt >>>> >>>>> The case of same problem of vector, I just use vecscattertoall. >>>>> However, I can't find same function for matrix. >>>>> >>>>> Hyung Kim >>>>> >>>>> 2023? 2? 3? (?) ?? 10:47, Matthew Knepley >?? ??: >>>>>> On Fri, Feb 3, 2023 at 8:45 AM ??? > wrote: >>>>>>> Following your comments, >>>>>>> If I extract first row of below matrix. >>>>>>> >>>>>>> IS isrow; >>>>>>> PetscInt *indices; >>>>>>> PetscMalloc1(1, *indices); >>>>>> >>>>>> That should be &indices. >>>>>> >>>>>>> Indices[0] = 0; >>>>>>> ISCreateGenreral(PETSC_COMM_WORLD, 1, indices, PETSC_COPY_VALUES, &isrow); >>>>>> >>>>>> You should use PETSC_OWN_POINTER. >>>>>> >>>>>>> MatCreateSubMatrix(mat,isrow,NULL,MAT_INITIAL_MATRIX,&newmat); >>>>>>> >>>>>>> Then can I get the array about first row of global matrix in all processes? >>>>>> >>>>>> No, just on the process which gives 0. If you do that on every process, every rank with get row 0. 
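To make the ownership rule above concrete, here is a minimal sketch (illustrative names; it assumes an assembled parallel matrix mat and shows only the IS handling, nothing specific to this code):

    IS       isrow;
    Mat      newmat;
    PetscInt *indices;

    PetscMalloc1(1, &indices);
    indices[0] = 0;                 /* global row to extract */
    /* PETSC_OWN_POINTER: the IS takes ownership of "indices";
       do NOT call PetscFree(indices) yourself afterwards       */
    ISCreateGeneral(PETSC_COMM_WORLD, 1, indices, PETSC_OWN_POINTER, &isrow);
    MatCreateSubMatrix(mat, isrow, NULL, MAT_INITIAL_MATRIX, &newmat);
    /* ... use newmat ... */
    MatDestroy(&newmat);
    ISDestroy(&isrow);              /* this also frees "indices" */

    /* With PETSC_COPY_VALUES the array stays yours and you must free it:
       ISCreateGeneral(..., PETSC_COPY_VALUES, &isrow);  ...  PetscFree(indices); */
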
>>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>>> Hyung Kim >>>>>>> >>>>>>> 2023? 2? 3? (?) ?? 10:26, Matthew Knepley >?? ??: >>>>>>>> On Fri, Feb 3, 2023 at 8:06 AM ??? > wrote: >>>>>>>>> Following your comments, >>>>>>>>> I want to check below things. >>>>>>>>> For example, the global dense matrix are as below. >>>>>>>>> >>>>>>>>> If I want to get first row ('1 2 0 0 3 0 0 4') in Proc 1. >>>>>>>>> Then I should put 'MatCreateSubMatrix (mat, isrow, NULL, MAT_INITIAL_MATRIX, &newmat)' >>>>>>>>> and isrow will be [0 1 2 3 4 5 6 7]. >>>>>>>>> >>>>>>>>> In this case, How can I make isrow? >>>>>>>>> Actually I can't understand the procedure of handling isrow. >>>>>>>> >>>>>>>> You create an IS object of type ISGENERAL and give it the array of global indices that you want to extract. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>>> Hyung Kim >>>>>>>>> >>>>>>>>> 2023? 2? 3? (?) ?? 9:03, Mark Adams >?? ??: >>>>>>>>>> https://petsc.org/main/docs/manualpages/Mat/MatCreateSubMatrix/ >>>>>>>>>> >>>>>>>>>> Note, PETSc lets you give NULL arguments if there is a reasonable default. >>>>>>>>>> In this case give NULL for the column IS and you will get the whole columns. >>>>>>>>>> >>>>>>>>>> Mark >>>>>>>>>> >>>>>>>>>> On Fri, Feb 3, 2023 at 4:05 AM ??? > wrote: >>>>>>>>>>> Hello, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> By using MatGetRow, user can get vectors from local matrix (at each process). >>>>>>>>>>> >>>>>>>>>>> However, I need other process's row values. >>>>>>>>>>> So I have 2 questions. >>>>>>>>>>> >>>>>>>>>>> 1. Is there any function for getting arrays from other process's?? >>>>>>>>>>> >>>>>>>>>>> 2. Or is there any function like matrix version of vecscattertoall?? >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Hyung Kim >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From edoardo.alinovi at gmail.com Sun Feb 5 12:26:17 2023 From: edoardo.alinovi at gmail.com (Edoardo alinovi) Date: Sun, 5 Feb 2023 19:26:17 +0100 Subject: [petsc-users] Help with fieldsplit performance Message-ID: Hello Petsc's crew, I would like to ask for some support in setting up the fieldsplit preconditioner in order to obtain better performance. I have already found some posts on the topic and keep experimenting, but I would like to hear your opinion as experts :) I have my fancy CFD pressure based coupled solver already validated on some basic problems, so I am confident the matrix is OK. However, I am struggling a bit in finding performance. 
In my experiments, I have found out that *schur *is the best in terms of overall iteration count, but it takes ages to converge! Using additive or multiplicative looks a better call, but in some cases I get a very high number of iterations to converge (500+). I attach here the logs (ksp_view and log_view) for an example case of the flow past a 90deg T-junction, 285k cells on 4 procs. GMRES + fieldsplit and schur take 90s to converge with 4 iters. Do you see anything strange in the way ksp is set up? Thank you for the support as always! -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- KSP Object: (UPeqn_) 4 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, nonzero initial guess tolerances: relative=0., absolute=1e-06, divergence=10000. right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: (UPeqn_) 4 MPI processes type: fieldsplit FieldSplit with Schur preconditioner, blocksize = 3, factorization FULL Preconditioner for the Schur complement formed from A11 Split info: Split number 0 Fields 0, 1 Split number 1 Fields 2 KSP solver for A00 block KSP Object: (UPeqn_fieldsplit_u_) 4 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (UPeqn_fieldsplit_u_) 4 MPI processes type: bjacobi number of blocks = 4 Local solver information for first block is in the following KSP and PC objects on rank 0: Use -UPeqn_fieldsplit_u_ksp_view ::ascii_info_detail to display information for all blocks KSP Object: (UPeqn_fieldsplit_u_sub_) 1 MPI process type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (UPeqn_fieldsplit_u_sub_) 1 MPI process type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1., needed 1. 
Factored matrix follows: Mat Object: (UPeqn_fieldsplit_u_sub_) 1 MPI process type: seqaij rows=48586, cols=48586, bs=2 package used to perform factorization: petsc total: nonzeros=483068, allocated nonzeros=483068 using I-node routines: found 24293 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: (UPeqn_fieldsplit_u_sub_) 1 MPI process type: seqaij rows=48586, cols=48586, bs=2 total: nonzeros=483068, allocated nonzeros=483068 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 24293 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: (UPeqn_fieldsplit_u_) 4 MPI processes type: mpiaij rows=190000, cols=190000, bs=2 total: nonzeros=1891600, allocated nonzeros=1891600 total number of mallocs used during MatSetValues calls=0 using I-node (on process 0) routines: found 24293 nodes, limit used is 5 KSP solver for S = A11 - A10 inv(A00) A01 KSP Object: (UPeqn_fieldsplit_p_) 4 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (UPeqn_fieldsplit_p_) 4 MPI processes type: hypre HYPRE BoomerAMG preconditioning Cycle type V Maximum number of levels 25 Maximum number of iterations PER hypre call 1 Convergence tolerance PER hypre call 0. Threshold for strong coupling 0.25 Interpolation truncation factor 0. Interpolation: max elements per row 0 Number of levels of aggressive coarsening 0 Number of paths for aggressive coarsening 1 Maximum row sums 0.9 Sweeps down 1 Sweeps up 1 Sweeps on coarse 1 Relax down symmetric-SOR/Jacobi Relax up symmetric-SOR/Jacobi Relax on coarse Gaussian-elimination Relax weight (all) 1. Outer relax weight (all) 1. Using CF-relaxation Not using more complex smoothers. Measure type local Coarsen type Falgout Interpolation type classical SpGEMM type cusparse linear system matrix followed by preconditioner matrix: Mat Object: (UPeqn_fieldsplit_p_) 4 MPI processes type: schurcomplement rows=95000, cols=95000 Schur complement A11 - A10 inv(A00) A01 A11 Mat Object: (UPeqn_fieldsplit_p_) 4 MPI processes type: mpiaij rows=95000, cols=95000 total: nonzeros=472900, allocated nonzeros=472900 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines A10 Mat Object: 4 MPI processes type: mpiaij rows=95000, cols=190000 total: nonzeros=945800, allocated nonzeros=945800 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines KSP of A00 KSP Object: (UPeqn_fieldsplit_u_) 4 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (UPeqn_fieldsplit_u_) 4 MPI processes type: bjacobi number of blocks = 4 Local solver information for first block is in the following KSP and PC objects on rank 0: Use -UPeqn_fieldsplit_u_ksp_view ::ascii_info_detail to display information for all blocks KSP Object: (UPeqn_fieldsplit_u_sub_) 1 MPI process type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (UPeqn_fieldsplit_u_sub_) 1 MPI process type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1., needed 1. Factored matrix follows: Mat Object: (UPeqn_fieldsplit_u_sub_) 1 MPI process type: seqaij rows=48586, cols=48586, bs=2 package used to perform factorization: petsc total: nonzeros=483068, allocated nonzeros=483068 using I-node routines: found 24293 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: (UPeqn_fieldsplit_u_sub_) 1 MPI process type: seqaij rows=48586, cols=48586, bs=2 total: nonzeros=483068, allocated nonzeros=483068 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 24293 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: (UPeqn_fieldsplit_u_) 4 MPI processes type: mpiaij rows=190000, cols=190000, bs=2 total: nonzeros=1891600, allocated nonzeros=1891600 total number of mallocs used during MatSetValues calls=0 using I-node (on process 0) routines: found 24293 nodes, limit used is 5 A01 Mat Object: 4 MPI processes type: mpiaij rows=190000, cols=95000, rbs=2, cbs=1 total: nonzeros=945800, allocated nonzeros=945800 total number of mallocs used during MatSetValues calls=0 using I-node (on process 0) routines: found 24293 nodes, limit used is 5 Mat Object: (UPeqn_fieldsplit_p_) 4 MPI processes type: mpiaij rows=95000, cols=95000 total: nonzeros=472900, allocated nonzeros=472900 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=285000, cols=285000, bs=3 total: nonzeros=4256100, allocated nonzeros=4256100 total number of mallocs used during MatSetValues calls=0 using I-node (on process 0) routines: found 24293 nodes, limit used is 5 -------------- next part -------------- flubio_coupled on a linux_x86_64 named localhost.localdomain with 4 processors, by edo Sun Feb 5 19:21:59 2023 Using Petsc Release Version 3.18.3, unknown Max Max/Min Avg Total Time (sec): 9.234e+01 1.000 9.234e+01 Objects: 2.190e+02 1.000 2.190e+02 Flops: 8.885e+10 1.083 8.686e+10 3.475e+11 Flops/sec: 9.623e+08 1.083 9.407e+08 3.763e+09 MPI Msg Count: 5.765e+04 2.999 3.844e+04 1.537e+05 MPI Msg Len (bytes): 9.189e+07 2.972 1.548e+03 2.380e+08 MPI Reductions: 3.723e+04 1.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 9.2336e+01 100.0% 3.4746e+11 100.0% 1.537e+05 100.0% 1.548e+03 100.0% 3.721e+04 99.9% 
------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSided 13 1.0 1.6536e-01 2.7 0.00e+00 0.0 4.0e+01 4.0e+00 1.3e+01 0 0 0 0 0 0 0 0 0 0 0 BuildTwoSidedF 6 1.0 1.4933e-01 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 906 1.0 8.5608e+01 1.0 8.60e+10 1.1 1.5e+05 1.5e+03 3.5e+04 93 97 100 99 95 93 97 100 99 95 3928 MatMultAdd 254 1.0 8.2637e-02 1.1 6.15e+07 1.1 2.0e+03 7.7e+02 1.0e+00 0 0 1 1 0 0 0 1 1 0 2907 MatSolve 18680 1.0 2.7633e+01 1.1 1.71e+10 1.1 0.0e+00 0.0e+00 0.0e+00 29 19 0 0 0 29 19 0 0 0 2424 MatLUFactorNum 1 1.0 7.5099e-03 1.6 1.97e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1026 MatILUFactorSym 1 1.0 6.3318e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatConvert 1 1.0 2.8405e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 19 1.0 9.5880e-02 1154.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 19 1.0 9.6739e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.2e+01 0 0 0 0 0 0 0 0 0 0 0 MatGetRowIJ 3 1.0 1.4920e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatCreateSubMat 4 1.0 2.0991e-01 1.1 0.00e+00 0.0 6.4e+01 2.3e+03 4.4e+01 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.2688e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 1 1.0 4.9500e-03 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 4 1.0 2.9473e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 9.0082e+01 1.0 8.88e+10 1.1 1.5e+05 1.5e+03 3.7e+04 98 100 100 99 100 98 100 100 99 100 3857 KSPGMRESOrthog 18157 1.0 3.4719e+01 1.1 4.99e+10 1.1 0.0e+00 0.0e+00 1.8e+04 37 56 0 0 49 37 56 0 0 49 5616 PCSetUp 4 1.0 3.7452e-01 1.0 1.97e+06 1.1 8.8e+01 2.8e+04 7.6e+01 0 0 0 1 0 0 0 0 1 0 21 PCSetUpOnBlocks 264 1.0 1.5411e-02 1.3 1.97e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 500 PCApply 5 1.0 9.0060e+01 1.0 8.88e+10 1.1 1.5e+05 1.5e+03 3.7e+04 98 100 100 99 100 98 100 100 99 100 3857 KSPSolve_FS_0 5 1.0 1.6113e+00 1.0 1.68e+09 1.1 2.8e+03 1.6e+03 7.0e+02 2 2 2 2 2 2 2 2 2 2 4077 KSPSolve_FS_Schu 5 1.0 8.7013e+01 1.0 8.58e+10 1.1 1.5e+05 1.5e+03 3.6e+04 94 97 97 96 96 94 97 97 96 96 3854 KSPSolve_FS_Low 5 1.0 1.4239e+00 1.0 1.38e+09 1.1 2.3e+03 1.5e+03 5.7e+02 2 2 1 1 2 2 2 1 1 2 3793 VecMDot 
18157 1.0 2.0707e+01 1.1 2.49e+10 1.1 0.0e+00 0.0e+00 1.8e+04 21 28 0 0 49 21 28 0 0 49 4708 VecNorm 18949 1.0 3.8294e+00 1.2 1.83e+09 1.1 0.0e+00 0.0e+00 1.9e+04 4 2 0 0 51 4 2 0 0 51 1868 VecScale 19203 1.0 4.4065e-01 1.1 9.21e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 8169 VecCopy 789 1.0 5.7539e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 20007 1.0 4.2773e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 1314 1.0 8.6258e-02 1.0 1.27e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5764 VecMAXPY 18944 1.0 1.5599e+01 1.0 2.67e+10 1.1 0.0e+00 0.0e+00 0.0e+00 17 30 0 0 0 17 30 0 0 0 6689 VecAssemblyBegin 5 1.0 5.3558e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 5 1.0 3.8580e-05 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 19225 1.0 2.2305e-01 1.3 0.00e+00 0.0 1.5e+05 1.5e+03 7.0e+00 0 0 100 99 0 0 0 100 99 0 0 VecScatterEnd 19225 1.0 2.7400e+00 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecNormalize 18944 1.0 4.2351e+00 1.1 2.74e+09 1.1 0.0e+00 0.0e+00 1.9e+04 4 3 0 0 51 4 3 0 0 51 2533 SFSetGraph 7 1.0 2.2973e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 7 1.0 2.1191e-02 1.3 0.00e+00 0.0 8.0e+01 3.5e+02 7.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFPack 19225 1.0 4.0084e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 19225 1.0 5.2803e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 --- Event Stage 1: Unknown ------------------------------------------------------------------------------------------------------------------------ Object Type Creations Destructions. Reports information only for process 0. --- Event Stage 0: Main Stage Matrix 23 5 Krylov Solver 6 1 Preconditioner 6 1 Vector 126 28 Index Set 35 16 Star Forest Graph 13 0 Distributed Mesh 3 0 Discrete System 3 0 Weak Form 3 0 Viewer 1 0 --- Event Stage 1: Unknown ======================================================================================================================== Average time to get PetscTime(): 2.77e-08 Average time for MPI_Barrier(): 0.000138606 Average time for zero size MPI_Send(): 5.6264e-05 #PETSc Option Table entries: -log_view -UPeqn_fieldsplit_p_pc_type hypre -UPeqn_fieldsplit_u_pc_type bjacobi -UPeqn_pc_fieldsplit_type schur #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: PETSC_ARCH=linux_x86_64 FOPTFLAGS=-O3 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 -with-debugging=no -download-fblaslapack=1 -download-superlu_dist -download-mumps -download-hypre -download-metis -download-parmetis -download-scalapack -download-ml -download-slepc -download-hpddm -download-cmake -with-mpi-dir=/home/edo/software_repo/openmpi-4.1.4/build/ ----------------------------------------- Libraries compiled on 2023-01-08 17:23:02 on localhost.localdomain Machine characteristics: Linux-5.14.0-162.6.1.el9_1.x86_64-x86_64-with-glibc2.34 Using PETSc directory: /home/edo/software_repo/petsc Using PETSc arch: linux_x86_64 ----------------------------------------- Using C compiler: /home/edo/software_repo/openmpi-4.1.4/build/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -O3 Using Fortran compiler: /home/edo/software_repo/openmpi-4.1.4/build/bin/mpif90 
-fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -O3 ----------------------------------------- Using include paths: -I/home/edo/software_repo/petsc/include -I/home/edo/software_repo/petsc/linux_x86_64/include -I/home/edo/software_repo/openmpi-4.1.4/build/include ----------------------------------------- Using C linker: /home/edo/software_repo/openmpi-4.1.4/build/bin/mpicc Using Fortran linker: /home/edo/software_repo/openmpi-4.1.4/build/bin/mpif90 Using libraries: -Wl,-rpath,/home/edo/software_repo/petsc/linux_x86_64/lib -L/home/edo/software_repo/petsc/linux_x86_64/lib -lpetsc -Wl,-rpath,/home/edo/software_repo/petsc/linux_x86_64/lib -L/home/edo/software_repo/petsc/linux_x86_64/lib -Wl,-rpath,/home/edo/software_repo/openmpi-4.1.4/build/lib -L/home/edo/software_repo/openmpi-4.1.4/build/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/11 -L/usr/lib/gcc/x86_64-redhat-linux/11 -lHYPRE -ldmumps -lmumps_common -lpord -lpthread -lscalapack -lsuperlu_dist -lml -lflapack -lfblas -lparmetis -lmetis -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl ----------------------------------------- From knepley at gmail.com Sun Feb 5 13:13:55 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 5 Feb 2023 14:13:55 -0500 Subject: [petsc-users] Help with fieldsplit performance In-Reply-To: References: Message-ID: On Sun, Feb 5, 2023 at 1:26 PM Edoardo alinovi wrote: > Hello Petsc's crew, > > I would like to ask for some support in setting up the fieldsplit > preconditioner in order to obtain better performance. I have already found > some posts on the topic and keep experimenting, but I would like to hear > your opinion as experts :) > > I have my fancy CFD pressure based coupled solver already validated on > some basic problems, so I am confident the matrix is OK. However, I am > struggling a bit in finding performance. In my experiments, I have found > out that *schur *is the best in terms of overall iteration count, but it > takes ages to converge! Using additive or multiplicative looks a better > call, but in some cases I get a very high number of iterations to converge > (500+). > > I attach here the logs (ksp_view and log_view) for an example case of the > flow past a 90deg T-junction, 285k cells on 4 procs. > > GMRES + fieldsplit and schur take 90s to converge with 4 iters. Do you see > anything strange in the way ksp is set up? > 1. You are using A11 as the preconditioning matrix for the Schur complement. Do you expect this to be a good idea? 2. You are using BJacobi/ILU(0) for A00. Is this a good idea? 3. Do you know how many iterates you are using for the A00 solve and for the Schur complement? Note that A00 gets solved for each iteration of the Schur complement, so you would multiply those two together to get an idea of the time. Once we see the number of subiterates, I think we can say something more. Thanks, Matt > Thank you for the support as always! > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
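For reference, with the solver prefixes shown in the attached ksp_view, the inner iteration counts Matt is asking about can be printed with options along these lines (a sketch; adjust the UPeqn_ prefix to whatever your setup uses):

    -UPeqn_fieldsplit_u_ksp_converged_reason
    -UPeqn_fieldsplit_p_ksp_converged_reason
    -UPeqn_fieldsplit_p_ksp_monitor_true_residual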
URL: From edoardo.alinovi at gmail.com Sun Feb 5 13:17:17 2023 From: edoardo.alinovi at gmail.com (Edoardo alinovi) Date: Sun, 5 Feb 2023 20:17:17 +0100 Subject: [petsc-users] Help with fieldsplit performance In-Reply-To: References: Message-ID: Hi Matt, Thanks for your feedback! Would you suggest me the command line to get the number of sub iterations? Il Dom 5 Feb 2023, 20:14 Matthew Knepley ha scritto: > On Sun, Feb 5, 2023 at 1:26 PM Edoardo alinovi > wrote: > >> Hello Petsc's crew, >> >> I would like to ask for some support in setting up the fieldsplit >> preconditioner in order to obtain better performance. I have already found >> some posts on the topic and keep experimenting, but I would like to hear >> your opinion as experts :) >> >> I have my fancy CFD pressure based coupled solver already validated on >> some basic problems, so I am confident the matrix is OK. However, I am >> struggling a bit in finding performance. In my experiments, I have found >> out that *schur *is the best in terms of overall iteration count, but it >> takes ages to converge! Using additive or multiplicative looks a better >> call, but in some cases I get a very high number of iterations to converge >> (500+). >> >> I attach here the logs (ksp_view and log_view) for an example case of >> the flow past a 90deg T-junction, 285k cells on 4 procs. >> >> GMRES + fieldsplit and schur take 90s to converge with 4 iters. Do you >> see anything strange in the way ksp is set up? >> > > 1. You are using A11 as the preconditioning matrix for the Schur > complement. Do you expect this to be a good idea? > > 2. You are using BJacobi/ILU(0) for A00. Is this a good idea? > > 3. Do you know how many iterates you are using for the A00 solve and for > the Schur complement? Note that A00 gets solved for each iteration of the > Schur complement, so you would multiply those two together to get an idea > of the time. > > Once we see the number of subiterates, I think we can say something more. > > Thanks, > > Matt > > >> Thank you for the support as always! >> > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From edoardo.alinovi at gmail.com Sun Feb 5 14:56:28 2023 From: edoardo.alinovi at gmail.com (Edoardo alinovi) Date: Sun, 5 Feb 2023 21:56:28 +0100 Subject: [petsc-users] Help with fieldsplit performance In-Reply-To: References: Message-ID: Maybe I managed to find out the number of iters of the field split by adding: fieldsplit_u_ksp_converged_reason. This gets printed 268 times: Linear UPeqn_fieldsplit_u_ solve converged due to CONVERGED_ITS iterations 1 1. You are using A11 as the preconditioning matrix for the Schur complement. Do you expect this to be a good idea? Good question, I have also tried self and selfp but not much difference! Any suggestions? 2. You are using BJacobi/ILU(0) for A00. Is this a good idea? I have tried other options like hypre and ML but this one looks the best for now, do you have other suggestions? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Sun Feb 5 15:13:46 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 5 Feb 2023 16:13:46 -0500 Subject: [petsc-users] Help with fieldsplit performance In-Reply-To: References: Message-ID: On Sun, Feb 5, 2023 at 3:56 PM Edoardo alinovi wrote: > Maybe I managed to find out the number of iters of the field split by > adding: fieldsplit_u_ksp_converged_reason. > You also want -fieldsplit_p_ksp_converged_reason so we can see the iterates for the Schur complement. > This gets printed 268 times: > Linear UPeqn_fieldsplit_u_ solve converged due to CONVERGED_ITS > iterations 1 > This is a little strange for BJacobi/ILU(0). Wait, you are using preonly, so it can only take 1 iterate no matter how bad the solve is. Is that what you want? Usually for good convergence, you need accurate solves of the A00 system. 1. You are using A11 as the preconditioning matrix for the Schur > complement. Do you expect this to be a good idea? > Good question, I have also tried self and selfp but not much difference! > Any suggestions? > It depends on what your Schur complement operator looks like. Usually you are much better off assembling an approximation to it. For example, the Schur complement for the Stokes equation is spectrally equivalent to the mass matrix, and this makes a good preconditioning matrix. > 2. You are using BJacobi/ILU(0) for A00. Is this a good idea? > I have tried other options like hypre and ML but this one looks the best > for now, do you have other suggestions? > It could be, but using preonly seems dangerous. I would first use GMRES and see how many iterates are used. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From edoardo.alinovi at gmail.com Sun Feb 5 15:35:15 2023 From: edoardo.alinovi at gmail.com (Edoardo alinovi) Date: Sun, 5 Feb 2023 22:35:15 +0100 Subject: [petsc-users] Help with fieldsplit performance In-Reply-To: References: Message-ID: Matt I messed up a bit the things as I sent you the number of sub iters done with preonly. It looks to be faster but outer iters goes up from 4 (with gmres as sub ksp) to 399, while run time goes from 37s to 10s. I manage to drop a bit the run time by dropping the relative tolerance Looking at the iters done by sub gmres, they are around 20ish for U block, and around 5 for P block, so nothing too crazy, what do you think? -------------- next part -------------- An HTML attachment was scrubbed... URL: From edoardo.alinovi at gmail.com Sun Feb 5 15:47:13 2023 From: edoardo.alinovi at gmail.com (Edoardo alinovi) Date: Sun, 5 Feb 2023 22:47:13 +0100 Subject: [petsc-users] Help with fieldsplit performance In-Reply-To: References: Message-ID: PS: can you suggest any example to assemble an approximation for Schur? I am doing incompressible NS here, not really "sure" on how to deal with it and something to start would be useful! -------------- next part -------------- An HTML attachment was scrubbed... 
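Not to put words in Matt's mouth, but a common way to act on that suggestion for incompressible Navier-Stokes is to assemble a pressure mass matrix separately and hand it to the fieldsplit as the Schur-complement preconditioning matrix; a sketch, where Qp is assumed to be that assembled pressure mass matrix and pc is the fieldsplit PC:

    /* Build the Schur-complement preconditioner from a user-supplied matrix
       (here a pressure mass matrix) instead of A11. */
    PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_USER, Qp);
    /* Option-file alternative, reusing the prefix from the logs in this thread:
         -UPeqn_pc_fieldsplit_schur_precondition user
       or "selfp" to use A11 - A10 inv(diag(A00)) A01 assembled automatically. */

Whether the Stokes mass-matrix argument carries over to this pressure-based coupled finite-volume system is an assumption worth testing against the selfp variant.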
URL: From paul.grosse-bley at ziti.uni-heidelberg.de Mon Feb 6 09:57:51 2023 From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley) Date: Mon, 06 Feb 2023 16:57:51 +0100 Subject: [petsc-users] =?utf-8?q?MG_on_GPU=3A_Benchmarking_and_avoiding_v?= =?utf-8?q?ector_host-=3Edevice_copy?= Message-ID: <122ec8-63e12380-d7-49c1ef80@229948806> Hi, I want to compare different implementations of multigrid solvers for Nvidia GPUs using the poisson problem (starting from ksp tutorial example ex45.c). Therefore I am trying to get runtime results comparable to hpgmg-cuda (finite-volume), i.e. using multiple warmup and measurement solves and avoiding measuring setup time. For now I am using -log_view with added stages: PetscLogStageRegister("Solve Bench", &solve_bench_stage); ? for (int i = 0; i < BENCH_SOLVES; i++) { ??? PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); // reset x ??? PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve ??? PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that everything is done ??? PetscLogStagePush(solve_bench_stage); ??? PetscCall(KSPSolve(ksp, NULL, NULL)); ??? PetscLogStagePop(); ? } This snippet is preceded by a similar loop for warmup. When profiling this using Nsight Systems, I see that the very first solve is much slower which mostly correspods to H2D (host to device) copies and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs, but probably insignificant in this case). The following solves have some overhead at the start from a H2D copy of a vector (the RHS I guess, as the copy is preceeded by a matrix-vector product) in the first MatResidual call (callchain: KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My interpretation of the profiling results (i.e. cuBLAS calls) is that that vector is overwritten with the residual in Daxpy and therefore has to be copied again for the next iteration. Is there an elegant way of avoiding that H2D copy? I have seen some examples on constructing matrices directly on the GPU, but nothing about vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At the moment I am using jacobi as smoother, but I would like to have a CUDA implementation of SOR instead. Is there a good way of achieving that, e.g. using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother? as smoother in PCMG? Or is the overhead from constantly switching between PETSc and hypre too big? Thanks, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Feb 6 10:04:40 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 6 Feb 2023 11:04:40 -0500 Subject: [petsc-users] MG on GPU: Benchmarking and avoiding vector host->device copy In-Reply-To: <122ec8-63e12380-d7-49c1ef80@229948806> References: <122ec8-63e12380-d7-49c1ef80@229948806> Message-ID: Paul, I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to be used for simple benchmarking. You can use VecCudaGetArray() to access the GPU memory of the vector and then call a CUDA kernel to compute the right hand side vector directly on the GPU. Barry > On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley wrote: > > Hi, > > I want to compare different implementations of multigrid solvers for Nvidia GPUs using the poisson problem (starting from ksp tutorial example ex45.c). > Therefore I am trying to get runtime results comparable to hpgmg-cuda (finite-volume), i.e. 
using multiple warmup and measurement solves and avoiding measuring setup time. > For now I am using -log_view with added stages: > > PetscLogStageRegister("Solve Bench", &solve_bench_stage); > for (int i = 0; i < BENCH_SOLVES; i++) { > PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); // reset x > PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve > PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that everything is done > > PetscLogStagePush(solve_bench_stage); > PetscCall(KSPSolve(ksp, NULL, NULL)); > PetscLogStagePop(); > } > > This snippet is preceded by a similar loop for warmup. > > When profiling this using Nsight Systems, I see that the very first solve is much slower which mostly correspods to H2D (host to device) copies and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs , but probably insignificant in this case). The following solves have some overhead at the start from a H2D copy of a vector (the RHS I guess, as the copy is preceeded by a matrix-vector product) in the first MatResidual call (callchain: KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My interpretation of the profiling results (i.e. cuBLAS calls) is that that vector is overwritten with the residual in Daxpy and therefore has to be copied again for the next iteration. > > Is there an elegant way of avoiding that H2D copy? I have seen some examples on constructing matrices directly on the GPU, but nothing about vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At the moment I am using jacobi as smoother, but I would like to have a CUDA implementation of SOR instead. Is there a good way of achieving that, e.g. using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother as smoother in PCMG? Or is the overhead from constantly switching between PETSc and hypre too big? > > Thanks, > Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbarnafi at cmm.uchile.cl Mon Feb 6 10:45:08 2023 From: nbarnafi at cmm.uchile.cl (Nicolas Barnafi) Date: Mon, 6 Feb 2023 17:45:08 +0100 Subject: [petsc-users] Problem setting Fieldsplit fields In-Reply-To: References: Message-ID: Thank you Matt, Again, at the bottom of this message you will find the -info output. 
I don't see any output related to the fields, Best ------ -info [0] PetscDetermineInitialFPTrap(): Floating point trapping is on by default 13 [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType host available, initializing [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice host initialized, default device id 0, view FALSE, init type lazy [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType cuda not available [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType hip not available [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType sycl not available [0] PetscInitialize_Common(): PETSc successfully started: number of processors = 1 [0] PetscGetHostName(): Rejecting domainname, likely is NIS nico-santech.(none) [0] PetscInitialize_Common(): Running on machine: nico-santech [0] SlepcInitialize(): SLEPc successfully started [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 94770087780768 max tags = 2147483647 [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc communicator embedded in a user MPI_Comm 94770087780768 [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066936960 is being unlinked from inner PETSc comm 94770087780768 [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an MPI_Comm 94770087780768 [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 94770087780768 max tags = 2147483647 [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc communicator embedded in a user MPI_Comm 94770087780768 [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066936960 is being unlinked from inner PETSc comm 94770087780768 [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an MPI_Comm 94770087780768 [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 94770087780768 max tags = 2147483647 [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc communicator embedded in a user MPI_Comm 94770087780768 [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066936960 is being unlinked from inner PETSc comm 94770087780768 [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an MPI_Comm 94770087780768 [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 94770087780768 max tags = 2147483647 [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc communicator embedded in a user MPI_Comm 94770087780768 [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066936960 is being unlinked from inner PETSc comm 94770087780768 [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an MPI_Comm 94770087780768 [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 94770087780768 max tags = 2147483647 [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc communicator embedded in a user MPI_Comm 94770087780768 [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066936960 is being unlinked from inner PETSc comm 94770087780768 [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an MPI_Comm 94770087780768 [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 94770087780768 max tags = 2147483647 [0] PetscCommDuplicate(): Using internal 
PETSc communicator 94770066936960 94770087780768 [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1219 X 1219; storage space: 0 unneeded,26443 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 150 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 1219) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 1160 nodes out of 1219 rows. Not using Inode routines [0] PetscCommDuplicate(): Using internal PETSc communicator 94770066936960 94770087780768 [0] PetscCommDuplicate(): Using internal PETSc communicator 94770066936960 94770087780768 [0] PetscCommDuplicate(): Using internal PETSc communicator 94770066936960 94770087780768 [0] PetscCommDuplicate(): Using internal PETSc communicator 94770066936960 94770087780768 [0] PetscCommDuplicate(): Using internal PETSc communicator 94770066936960 94770087780768 [0] PetscGetHostName(): Rejecting domainname, likely is NIS nico-santech.(none) [0] PCSetUp(): Setting up PC for first time [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 615 X 615; storage space: 0 unneeded,9213 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 117 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 615) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 561 nodes out of 615 rows. Not using Inode routines [0] PetscCommDuplicate(): Duplicating a communicator 94770066934048 94770110251424 max tags = 2147483647 [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc communicator embedded in a user MPI_Comm 94770110251424 [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066934048 is being unlinked from inner PETSc comm 94770110251424 [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770110251424 [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an MPI_Comm 94770110251424 [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 64 X 64; storage space: 0 unneeded,0 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 64)/(num_localrows 64) > 0.6. Use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 13 nodes of 64. Limit used: 5. Using Inode routines [0] PetscCommDuplicate(): Duplicating a communicator 94770066934048 94770100861088 max tags = 2147483647 [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc communicator embedded in a user MPI_Comm 94770100861088 [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066934048 is being unlinked from inner PETSc comm 94770100861088 [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770100861088 [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an MPI_Comm 94770100861088 [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 240 X 240; storage space: 0 unneeded,2140 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 11 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 240) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 235 nodes out of 240 rows. 
Not using Inode routines [0] PetscCommDuplicate(): Duplicating a communicator 94770066934048 94770100861088 max tags = 2147483647 [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc communicator embedded in a user MPI_Comm 94770100861088 [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066934048 is being unlinked from inner PETSc comm 94770100861088 [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770100861088 [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an MPI_Comm 94770100861088 [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 300 X 300; storage space: 0 unneeded,2292 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 33 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 300) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 300 nodes out of 300 rows. Not using Inode routines [0] PetscCommDuplicate(): Duplicating a communicator 94770066934048 94770100861088 max tags = 2147483647 [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc communicator embedded in a user MPI_Comm 94770100861088 [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066934048 is being unlinked from inner PETSc comm 94770100861088 [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770100861088 [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an MPI_Comm 94770100861088 [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 615 X 1219; storage space: 0 unneeded,11202 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 150 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 615) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 561 nodes out of 615 rows. Not using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 64 X 1219; storage space: 0 unneeded,288 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 64) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 64 nodes out of 64 rows. Not using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 240 X 1219; storage space: 0 unneeded,8800 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 78 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 240) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 235 nodes out of 240 rows. Not using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 300 X 1219; storage space: 0 unneeded,6153 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 89 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 300) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 300 nodes out of 300 rows. Not using Inode routines [0] PetscCommDuplicate(): Duplicating a communicator 94770066934048 94770100861088 max tags = 2147483647 [0] PetscCommDuplicate(): Using internal PETSc communicator 94770066934048 94770100861088 ? 
0 KSP Residual norm 2.541418258630e+01 [0] KSPConvergedDefault(): user has provided nonzero initial guess, computing 2-norm of RHS [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Setting up PC for first time [0] PetscCommDuplicate(): Using internal PETSc communicator 94770066934048 94770100861088 [0] PetscCommDuplicate(): Using internal PETSc communicator 94770066934048 94770100861088 [0] PetscCommDuplicate(): Using internal PETSc communicator 94770066934048 94770100861088 [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Setting up PC for first time [0] PetscCommDuplicate(): Using internal PETSc communicator 94770066934048 94770100861088 [0] PetscCommDuplicate(): Using internal PETSc communicator 94770066934048 94770100861088 [0] PetscCommDuplicate(): Using internal PETSc communicator 94770066934048 94770100861088 [0] PetscCommDuplicate(): Using internal PETSc communicator 94770066934048 947701008610882 On 03/02/23 21:11, Matthew Knepley wrote: > On Fri, Feb 3, 2023 at 3:03 PM Nicolas Barnafi > wrote: > > > There are a number of common errors: > > > >? ? 1) Your PC has a prefix > > > >? ? 2) You have not called KSPSetFromOptions() here > > > > Can you send the -ksp_view output? > > The PC at least has no prefix. I had to set ksp_rtol to 1 to get > through > the solution process, you will find both the petsc_rc and the > ksp_view > at the bottom of this message. > > Options are indeed being set from the options file, but there must be > something missing at a certain level. Thanks for looking into this. > > > Okay, the next step is to pass > > ? -info > > and send the output. This will tell us how the default splits were > done. If that > is not conclusive, we will have?to use the debugger. > > ? Thanks, > > ? ? ?Matt > > Best > > ---- petsc_rc file > > -ksp_monitor > -ksp_type gmres > -ksp_view > -mat_type aij > -ksp_norm_type unpreconditioned > -ksp_atol 1e-14 > -ksp_rtol 1 > -pc_type fieldsplit > -pc_fieldsplit_type multiplicative > > ---- ksp_view > > KSP Object: 1 MPI process > ? ?type: gmres > ? ? ?restart=500, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > ? ? ?happy breakdown tolerance 1e-30 > ? ?maximum iterations=10000, nonzero initial guess > ? ?tolerances:? relative=1., absolute=1e-14, divergence=10000. > ? ?right preconditioning > ? ?using UNPRECONDITIONED norm type for convergence test > PC Object: 1 MPI process > ? ?type: fieldsplit > ? ? ?FieldSplit with MULTIPLICATIVE composition: total splits = 4 > ? ? ?Solver info for each split is in the following KSP objects: > ? ?Split number 0 Defined by IS > ? ?KSP Object: (fieldsplit_0_) 1 MPI process > ? ? ?type: preonly > ? ? ?maximum iterations=10000, initial guess is zero > ? ? ?tolerances:? relative=1e-05, absolute=1e-50, divergence=10000. > ? ? ?left preconditioning > ? ? ?using DEFAULT norm type for convergence test > ? ?PC Object: (fieldsplit_0_) 1 MPI process > ? ? ?type: ilu > ? ? ?PC has not been set up so information may be incomplete > ? ? ? ?out-of-place factorization > ? ? ? ?0 levels of fill > ? ? ? ?tolerance for zero pivot 2.22045e-14 > ? ? ? ?matrix ordering: natural > ? ? ? ?matrix solver type: petsc > ? ? ? ?matrix not yet factored; no additional information available > ? ? ?linear system matrix = precond matrix: > ? ? ?Mat Object: (fieldsplit_0_) 1 MPI process > ? ? ? 
?type: seqaij > ? ? ? ?rows=615, cols=615 > ? ? ? ?total: nonzeros=9213, allocated nonzeros=9213 > ? ? ? ?total number of mallocs used during MatSetValues calls=0 > ? ? ? ? ?not using I-node routines > ? ?Split number 1 Defined by IS > ? ?KSP Object: (fieldsplit_1_) 1 MPI process > ? ? ?type: preonly > ? ? ?maximum iterations=10000, initial guess is zero > ? ? ?tolerances:? relative=1e-05, absolute=1e-50, divergence=10000. > ? ? ?left preconditioning > ? ? ?using DEFAULT norm type for convergence test > ? ?PC Object: (fieldsplit_1_) 1 MPI process > ? ? ?type: ilu > ? ? ?PC has not been set up so information may be incomplete > ? ? ? ?out-of-place factorization > ? ? ? ?0 levels of fill > ? ? ? ?tolerance for zero pivot 2.22045e-14 > ? ? ? ?matrix ordering: natural > ? ? ? ?matrix solver type: petsc > ? ? ? ?matrix not yet factored; no additional information available > ? ? ?linear system matrix = precond matrix: > ? ? ?Mat Object: (fieldsplit_1_) 1 MPI process > ? ? ? ?type: seqaij > ? ? ? ?rows=64, cols=64 > ? ? ? ?total: nonzeros=0, allocated nonzeros=0 > ? ? ? ?total number of mallocs used during MatSetValues calls=0 > ? ? ? ? ?using I-node routines: found 13 nodes, limit used is 5 > ? ?Split number 2 Defined by IS > ? ?KSP Object: (fieldsplit_2_) 1 MPI process > ? ? ?type: preonly > ? ? ?maximum iterations=10000, initial guess is zero > ? ? ?tolerances:? relative=1e-05, absolute=1e-50, divergence=10000. > ? ? ?left preconditioning > ? ? ?using DEFAULT norm type for convergence test > ? ?PC Object: (fieldsplit_2_) 1 MPI process > ? ? ?type: ilu > ? ? ?PC has not been set up so information may be incomplete > ? ? ? ?out-of-place factorization > ? ? ? ?0 levels of fill > ? ? ? ?tolerance for zero pivot 2.22045e-14 > ? ? ? ?matrix ordering: natural > ? ? ? ?matrix solver type: petsc > ? ? ? ?matrix not yet factored; no additional information available > ? ? ?linear system matrix = precond matrix: > ? ? ?Mat Object: (fieldsplit_2_) 1 MPI process > ? ? ? ?type: seqaij > ? ? ? ?rows=240, cols=240 > ? ? ? ?total: nonzeros=2140, allocated nonzeros=2140 > ? ? ? ?total number of mallocs used during MatSetValues calls=0 > ? ? ? ? ?not using I-node routines > ? ?Split number 3 Defined by IS > ? ?KSP Object: (fieldsplit_3_) 1 MPI process > ? ? ?type: preonly > ? ? ?maximum iterations=10000, initial guess is zero > ? ? ?tolerances:? relative=1e-05, absolute=1e-50, divergence=10000. > ? ? ?left preconditioning > ? ? ?using DEFAULT norm type for convergence test > ? ?PC Object: (fieldsplit_3_) 1 MPI process > ? ? ?type: ilu > ? ? ?PC has not been set up so information may be incomplete > ? ? ? ?out-of-place factorization > ? ? ? ?0 levels of fill > ? ? ? ?tolerance for zero pivot 2.22045e-14 > ? ? ? ?matrix ordering: natural > ? ? ? ?matrix solver type: petsc > ? ? ? ?matrix not yet factored; no additional information available > ? ? ?linear system matrix = precond matrix: > ? ? ?Mat Object: (fieldsplit_3_) 1 MPI process > ? ? ? ?type: seqaij > ? ? ? ?rows=300, cols=300 > ? ? ? ?total: nonzeros=2292, allocated nonzeros=2292 > ? ? ? ?total number of mallocs used during MatSetValues calls=0 > ? ? ? ? ?not using I-node routines > ? ?linear system matrix = precond matrix: > ? ?Mat Object: 1 MPI process > ? ? ?type: seqaij > ? ? ?rows=1219, cols=1219 > ? ? ?total: nonzeros=26443, allocated nonzeros=26443 > ? ? ?total number of mallocs used during MatSetValues calls=0 > ? ? ? ?not using I-node routines > ? ? ? ? ? ? ? solving time: 0.00449609 > ? ? ? ? ? ? ? ? iterations: 0 > ? ? ? ? ? 
?estimated error: 25.4142 > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 6 10:57:13 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 6 Feb 2023 11:57:13 -0500 Subject: [petsc-users] Problem setting Fieldsplit fields In-Reply-To: References: Message-ID: On Mon, Feb 6, 2023 at 11:45 AM Nicolas Barnafi wrote: > Thank you Matt, > > Again, at the bottom of this message you will find the -info output. I > don't see any output related to the fields, > If the splits were done automatically, you would see an info message from here: https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/pc/impls/fieldsplit/fieldsplit.c#L1595 Thus it must be setup here https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/pc/impls/fieldsplit/fieldsplit.c#L380 There are info statements, but you do not see them, I do not see a way around using a small example to understand how you are setting up the system, since it is working as expected in the PETSc examples. Thanks, Matt > Best > > > ------ -info > > [0] PetscDetermineInitialFPTrap(): Floating point trapping is on by > default 13 > [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType > host available, initializing > [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice host > initialized, default device id 0, view FALSE, init type lazy > [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType > cuda not available > [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType > hip not available > [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType > sycl not available > [0] PetscInitialize_Common(): PETSc successfully started: number of > processors = 1 > [0] PetscGetHostName(): Rejecting domainname, likely is NIS > nico-santech.(none) > [0] PetscInitialize_Common(): Running on machine: nico-santech > [0] SlepcInitialize(): SLEPc successfully started > [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 > 94770087780768 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc > communicator embedded in a user MPI_Comm 94770087780768 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066936960 > is being unlinked from inner PETSc comm 94770087780768 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an > MPI_Comm 94770087780768 > [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 > 94770087780768 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc > communicator embedded in a user MPI_Comm 94770087780768 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066936960 > is being unlinked from inner PETSc comm 94770087780768 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an > MPI_Comm 94770087780768 > [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 > 94770087780768 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc > communicator embedded in a user MPI_Comm 94770087780768 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066936960 > is being unlinked from inner PETSc 
comm 94770087780768 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an > MPI_Comm 94770087780768 > [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 > 94770087780768 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc > communicator embedded in a user MPI_Comm 94770087780768 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066936960 > is being unlinked from inner PETSc comm 94770087780768 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an > MPI_Comm 94770087780768 > [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 > 94770087780768 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc > communicator embedded in a user MPI_Comm 94770087780768 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066936960 > is being unlinked from inner PETSc comm 94770087780768 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an > MPI_Comm 94770087780768 > [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 > 94770087780768 max tags = 2147483647 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066936960 94770087780768 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1219 X 1219; storage > space: 0 unneeded,26443 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 150 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 1219) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 1160 nodes out of 1219 rows. Not > using Inode routines > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066936960 94770087780768 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066936960 94770087780768 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066936960 94770087780768 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066936960 94770087780768 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066936960 94770087780768 > [0] PetscGetHostName(): Rejecting domainname, likely is NIS > nico-santech.(none) > [0] PCSetUp(): Setting up PC for first time > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 615 X 615; storage space: > 0 unneeded,9213 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 117 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 615) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 561 nodes out of 615 rows. 
Not > using Inode routines > [0] PetscCommDuplicate(): Duplicating a communicator 94770066934048 > 94770110251424 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc > communicator embedded in a user MPI_Comm 94770110251424 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066934048 > is being unlinked from inner PETSc comm 94770110251424 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770110251424 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an > MPI_Comm 94770110251424 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 64 X 64; storage space: 0 > unneeded,0 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 64)/(num_localrows 64) > 0.6. Use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 13 nodes of 64. Limit used: 5. > Using Inode routines > [0] PetscCommDuplicate(): Duplicating a communicator 94770066934048 > 94770100861088 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc > communicator embedded in a user MPI_Comm 94770100861088 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066934048 > is being unlinked from inner PETSc comm 94770100861088 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770100861088 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an > MPI_Comm 94770100861088 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 240 X 240; storage space: > 0 unneeded,2140 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 11 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 240) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 235 nodes out of 240 rows. Not > using Inode routines > [0] PetscCommDuplicate(): Duplicating a communicator 94770066934048 > 94770100861088 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc > communicator embedded in a user MPI_Comm 94770100861088 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066934048 > is being unlinked from inner PETSc comm 94770100861088 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770100861088 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an > MPI_Comm 94770100861088 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 300 X 300; storage space: > 0 unneeded,2292 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 33 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 300) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 300 nodes out of 300 rows. 
Not > using Inode routines > [0] PetscCommDuplicate(): Duplicating a communicator 94770066934048 > 94770100861088 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc > communicator embedded in a user MPI_Comm 94770100861088 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066934048 > is being unlinked from inner PETSc comm 94770100861088 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770100861088 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an > MPI_Comm 94770100861088 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 615 X 1219; storage space: > 0 unneeded,11202 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 150 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 615) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 561 nodes out of 615 rows. Not > using Inode routines > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 64 X 1219; storage space: > 0 unneeded,288 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 64) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 64 nodes out of 64 rows. Not using > Inode routines > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 240 X 1219; storage space: > 0 unneeded,8800 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 78 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 240) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 235 nodes out of 240 rows. Not > using Inode routines > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 300 X 1219; storage space: > 0 unneeded,6153 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 89 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 300) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 300 nodes out of 300 rows. 
Not > using Inode routines > [0] PetscCommDuplicate(): Duplicating a communicator 94770066934048 > 94770100861088 max tags = 2147483647 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066934048 94770100861088 > 0 KSP Residual norm 2.541418258630e+01 > [0] KSPConvergedDefault(): user has provided nonzero initial guess, > computing 2-norm of RHS > [0] PCSetUp(): Leaving PC with identical preconditioner since > operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since > operator is unchanged > [0] PCSetUp(): Setting up PC for first time > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066934048 94770100861088 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066934048 94770100861088 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066934048 94770100861088 > [0] PCSetUp(): Leaving PC with identical preconditioner since > operator is unchanged > [0] PCSetUp(): Setting up PC for first time > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066934048 94770100861088 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066934048 94770100861088 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066934048 94770100861088 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066934048 947701008610882 > > > On 03/02/23 21:11, Matthew Knepley wrote: > > On Fri, Feb 3, 2023 at 3:03 PM Nicolas Barnafi > wrote: > >> > There are a number of common errors: >> > >> > 1) Your PC has a prefix >> > >> > 2) You have not called KSPSetFromOptions() here >> > >> > Can you send the -ksp_view output? >> >> The PC at least has no prefix. I had to set ksp_rtol to 1 to get through >> the solution process, you will find both the petsc_rc and the ksp_view >> at the bottom of this message. >> >> Options are indeed being set from the options file, but there must be >> something missing at a certain level. Thanks for looking into this. >> > > Okay, the next step is to pass > > -info > > and send the output. This will tell us how the default splits were done. > If that > is not conclusive, we will have to use the debugger. > > Thanks, > > Matt > > >> Best >> >> ---- petsc_rc file >> >> -ksp_monitor >> -ksp_type gmres >> -ksp_view >> -mat_type aij >> -ksp_norm_type unpreconditioned >> -ksp_atol 1e-14 >> -ksp_rtol 1 >> -pc_type fieldsplit >> -pc_fieldsplit_type multiplicative >> >> ---- ksp_view >> >> KSP Object: 1 MPI process >> type: gmres >> restart=500, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> happy breakdown tolerance 1e-30 >> maximum iterations=10000, nonzero initial guess >> tolerances: relative=1., absolute=1e-14, divergence=10000. >> right preconditioning >> using UNPRECONDITIONED norm type for convergence test >> PC Object: 1 MPI process >> type: fieldsplit >> FieldSplit with MULTIPLICATIVE composition: total splits = 4 >> Solver info for each split is in the following KSP objects: >> Split number 0 Defined by IS >> KSP Object: (fieldsplit_0_) 1 MPI process >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>> left preconditioning >> using DEFAULT norm type for convergence test >> PC Object: (fieldsplit_0_) 1 MPI process >> type: ilu >> PC has not been set up so information may be incomplete >> out-of-place factorization >> 0 levels of fill >> tolerance for zero pivot 2.22045e-14 >> matrix ordering: natural >> matrix solver type: petsc >> matrix not yet factored; no additional information available >> linear system matrix = precond matrix: >> Mat Object: (fieldsplit_0_) 1 MPI process >> type: seqaij >> rows=615, cols=615 >> total: nonzeros=9213, allocated nonzeros=9213 >> total number of mallocs used during MatSetValues calls=0 >> not using I-node routines >> Split number 1 Defined by IS >> KSP Object: (fieldsplit_1_) 1 MPI process >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using DEFAULT norm type for convergence test >> PC Object: (fieldsplit_1_) 1 MPI process >> type: ilu >> PC has not been set up so information may be incomplete >> out-of-place factorization >> 0 levels of fill >> tolerance for zero pivot 2.22045e-14 >> matrix ordering: natural >> matrix solver type: petsc >> matrix not yet factored; no additional information available >> linear system matrix = precond matrix: >> Mat Object: (fieldsplit_1_) 1 MPI process >> type: seqaij >> rows=64, cols=64 >> total: nonzeros=0, allocated nonzeros=0 >> total number of mallocs used during MatSetValues calls=0 >> using I-node routines: found 13 nodes, limit used is 5 >> Split number 2 Defined by IS >> KSP Object: (fieldsplit_2_) 1 MPI process >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using DEFAULT norm type for convergence test >> PC Object: (fieldsplit_2_) 1 MPI process >> type: ilu >> PC has not been set up so information may be incomplete >> out-of-place factorization >> 0 levels of fill >> tolerance for zero pivot 2.22045e-14 >> matrix ordering: natural >> matrix solver type: petsc >> matrix not yet factored; no additional information available >> linear system matrix = precond matrix: >> Mat Object: (fieldsplit_2_) 1 MPI process >> type: seqaij >> rows=240, cols=240 >> total: nonzeros=2140, allocated nonzeros=2140 >> total number of mallocs used during MatSetValues calls=0 >> not using I-node routines >> Split number 3 Defined by IS >> KSP Object: (fieldsplit_3_) 1 MPI process >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>> left preconditioning >> using DEFAULT norm type for convergence test >> PC Object: (fieldsplit_3_) 1 MPI process >> type: ilu >> PC has not been set up so information may be incomplete >> out-of-place factorization >> 0 levels of fill >> tolerance for zero pivot 2.22045e-14 >> matrix ordering: natural >> matrix solver type: petsc >> matrix not yet factored; no additional information available >> linear system matrix = precond matrix: >> Mat Object: (fieldsplit_3_) 1 MPI process >> type: seqaij >> rows=300, cols=300 >> total: nonzeros=2292, allocated nonzeros=2292 >> total number of mallocs used during MatSetValues calls=0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 1 MPI process >> type: seqaij >> rows=1219, cols=1219 >> total: nonzeros=26443, allocated nonzeros=26443 >> total number of mallocs used during MatSetValues calls=0 >> not using I-node routines >> solving time: 0.00449609 >> iterations: 0 >> estimated error: 25.4142 >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.grosse-bley at ziti.uni-heidelberg.de Mon Feb 6 11:44:37 2023 From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley) Date: Mon, 06 Feb 2023 18:44:37 +0100 Subject: [petsc-users] =?utf-8?q?MG_on_GPU=3A_Benchmarking_and_avoiding_v?= =?utf-8?q?ector_host-=3Edevice_copy?= In-Reply-To: Message-ID: <1381a1-63e13c80-cf-417f7600@172794512> Hi Barry, src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting point, thank you! Sadly I get a segfault when executing that example with PCMG and more than one level, i.e. with the minimal args: $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 =========================================== Test: KSP performance - Poisson ?? ?Input matrix: 27-pt finite difference stencil ?? ?-n 100 ?? ?DoFs = 1000000 ?? ?Number of nonzeros = 26463592 Step1? - creating Vecs and Mat... Step2a - running PCSetUp()... [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ [0]PETSC ERROR: or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA systems to find memory corruption errors [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 59. NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. 
--------------------------------------------------------------------------

As the matrix is not created using DMDACreate3d I expected it to fail due to the missing geometric information, but I expected it to fail more gracefully than with a segfault.
I will try to combine bench_kspsolve.c with ex45.c to get easy MG preconditioning, especially since I am interested in the 7pt stencil for now.

Concerning my benchmarking loop from before: Is it generally discouraged to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the work when doing the same solve multiple times, or are the solves not iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to keep the runtime short?

Thanks,
Paul

On Monday, February 06, 2023 17:04 CET, Barry Smith wrote:

   Paul,

   I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to be used for simple benchmarking.

   You can use VecCudaGetArray() to access the GPU memory of the vector and then call a CUDA kernel to compute the right hand side vector directly on the GPU.

   Barry

On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley wrote:

Hi,

I want to compare different implementations of multigrid solvers for Nvidia GPUs using the Poisson problem (starting from ksp tutorial example ex45.c).
Therefore I am trying to get runtime results comparable to hpgmg-cuda (finite-volume), i.e. using multiple warmup and measurement solves and avoiding measuring setup time.
For now I am using -log_view with added stages:

PetscLogStageRegister("Solve Bench", &solve_bench_stage);
for (int i = 0; i < BENCH_SOLVES; i++) {
    PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); // reset x
    PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve
    PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that everything is done

    PetscLogStagePush(solve_bench_stage);
    PetscCall(KSPSolve(ksp, NULL, NULL));
    PetscLogStagePop();
}

This snippet is preceded by a similar loop for warmup.

When profiling this using Nsight Systems, I see that the very first solve is much slower, which mostly corresponds to H2D (host to device) copies and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs, but probably insignificant in this case). The following solves have some overhead at the start from a H2D copy of a vector (the RHS I guess, as the copy is preceded by a matrix-vector product) in the first MatResidual call (callchain: KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My interpretation of the profiling results (i.e. cuBLAS calls) is that that vector is overwritten with the residual in Daxpy and therefore has to be copied again for the next iteration.

Is there an elegant way of avoiding that H2D copy? I have seen some examples on constructing matrices directly on the GPU, but nothing about vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At the moment I am using jacobi as smoother, but I would like to have a CUDA implementation of SOR instead. Is there a good way of achieving that, e.g. using PCHYPRE's boomeramg with a single level and "SOR/Jacobi"-smoother as smoother in PCMG? Or is the overhead from constantly switching between PETSc and hypre too big?

Thanks,
Paul
-------------- next part --------------
An HTML attachment was scrubbed...
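(A minimal sketch of the VecCudaGetArray() suggestion above: it assumes a CUDA-enabled PETSc build and a VECCUDA right-hand-side vector; the accessor is spelled VecCUDAGetArray()/VecCUDARestoreArray() in recent releases, fill_rhs_kernel is a placeholder for a user-written CUDA kernel, and depending on the PETSc version the declarations may require petscdevice_cuda.h instead of petscvec.h.)

#include <petscvec.h>

/* Fill the right-hand side directly on the device, so KSPSolve() should not
   need a host-to-device copy of b before the first residual computation.  */
PetscErrorCode FillRHSOnDevice(Vec b)
{
  PetscScalar *d_b; /* device pointer when b has type VECCUDA */
  PetscInt     n;

  PetscFunctionBeginUser;
  PetscCall(VecGetLocalSize(b, &n));
  PetscCall(VecCUDAGetArray(b, &d_b));
  /* launch the (hypothetical) user kernel here, e.g.
     fill_rhs_kernel<<<(n + 255) / 256, 256>>>(d_b, n);                     */
  PetscCall(VecCUDARestoreArray(b, &d_b)); /* marks the device copy as current */
  PetscFunctionReturn(0);
}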
URL: From bsmith at petsc.dev Mon Feb 6 13:05:53 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 6 Feb 2023 14:05:53 -0500 Subject: [petsc-users] MG on GPU: Benchmarking and avoiding vector host->device copy In-Reply-To: <1381a1-63e13c80-cf-417f7600@172794512> References: <1381a1-63e13c80-cf-417f7600@172794512> Message-ID: <57A3624B-DAAA-4D4F-9EA0-7F4BEED7C9C4@petsc.dev> It should not crash, take a look at the test cases at the bottom of the file. You are likely correct if the code, unfortunately, does use DMCreateMatrix() it will not work out of the box for geometric multigrid. So it might be the wrong example for you. I don't know what you mean about clever. If you simply set the solution to zero at the beginning of the loop then it will just do the same solve multiple times. The setup should not do much of anything after the first solver. Thought usually solves are big enough that one need not run solves multiple times to get a good understanding of their performance. > On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley wrote: > > Hi Barry, > > src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting point, thank you! Sadly I get a segfault when executing that example with PCMG and more than one level, i.e. with the minimal args: > > $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 > =========================================== > Test: KSP performance - Poisson > Input matrix: 27-pt finite difference stencil > -n 100 > DoFs = 1000000 > Number of nonzeros = 26463592 > > Step1 - creating Vecs and Mat... > Step2a - running PCSetUp()... > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ > [0]PETSC ERROR: or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA systems to find memory corruption errors > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run > [0]PETSC ERROR: to get more information on the crash. > [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD > with errorcode 59. > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > > As the matrix is not created using DMDACreate3d I expected it to fail due to the missing geometric information, but I expected it to fail more gracefully than with a segfault. > I will try to combine bench_kspsolve.c with ex45.c to get easy MG preconditioning, especially since I am interested in the 7pt stencil for now. > > Concerning my benchmarking loop from before: Is it generally discouraged to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the work when doing the same solve multiple times or are the solves not iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to keep the runtime short? 
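(A sketch of the kind of timing loop Barry describes, zeroing the solution before each solve so that every KSPSolve() repeats the same work; ksp, b and x are assumed to be already set up, and BENCH_SOLVES is a user-chosen repeat count.)

PetscLogStage solve_stage;
PetscCall(PetscLogStageRegister("Solve Bench", &solve_stage));
for (PetscInt i = 0; i < BENCH_SOLVES; i++) {
  PetscCall(VecZeroEntries(x)); /* reset the initial guess */
  PetscCall(PetscLogStagePush(solve_stage));
  PetscCall(KSPSolve(ksp, b, x));
  PetscCall(PetscLogStagePop());
}

Keeping the measured solves in their own log stage separates them from the warmup solves in the -log_view output.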
> > Thanks, > Paul > > On Monday, February 06, 2023 17:04 CET, Barry Smith wrote: > >> >> > > Paul, > > I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to be used for simple benchmarking. > > You can use VecCudaGetArray() to access the GPU memory of the vector and then call a CUDA kernel to compute the right hand side vector directly on the GPU. > > Barry > > >> >> On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley wrote: >> >> Hi, >> >> I want to compare different implementations of multigrid solvers for Nvidia GPUs using the poisson problem (starting from ksp tutorial example ex45.c). >> Therefore I am trying to get runtime results comparable to hpgmg-cuda (finite-volume), i.e. using multiple warmup and measurement solves and avoiding measuring setup time. >> For now I am using -log_view with added stages: >> >> PetscLogStageRegister("Solve Bench", &solve_bench_stage); >> for (int i = 0; i < BENCH_SOLVES; i++) { >> PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); // reset x >> PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve >> PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that everything is done >> >> PetscLogStagePush(solve_bench_stage); >> PetscCall(KSPSolve(ksp, NULL, NULL)); >> PetscLogStagePop(); >> } >> >> This snippet is preceded by a similar loop for warmup. >> >> When profiling this using Nsight Systems, I see that the very first solve is much slower which mostly correspods to H2D (host to device) copies and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs , but probably insignificant in this case). The following solves have some overhead at the start from a H2D copy of a vector (the RHS I guess, as the copy is preceeded by a matrix-vector product) in the first MatResidual call (callchain: KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My interpretation of the profiling results (i.e. cuBLAS calls) is that that vector is overwritten with the residual in Daxpy and therefore has to be copied again for the next iteration. >> >> Is there an elegant way of avoiding that H2D copy? I have seen some examples on constructing matrices directly on the GPU, but nothing about vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At the moment I am using jacobi as smoother, but I would like to have a CUDA implementation of SOR instead. Is there a good way of achieving that, e.g. using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother as smoother in PCMG? Or is the overhead from constantly switching between PETSc and hypre too big? >> >> Thanks, >> Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Feb 6 14:17:38 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 6 Feb 2023 15:17:38 -0500 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> Message-ID: <20AF4E62-7E22-4B99-8DC4-600C79F78D96@petsc.dev> Sorry, I had a mistake in my thinking, PCREDISTRIBUTE supports completely empty rows but MatZero* does not. When you put values into the matrix you will need to insert a 0.0 on the diagonal of each "inactive" row; all of this will be eliminated during the linear solve process. It would be a major project to change the MatZero* functions to handle empty rows. 
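(A minimal sketch of the assembly pattern described above, assuming A is a MATMPIAIJ matrix whose preallocation includes every diagonal entry, x and b are the solution and right-hand-side vectors, and isbc is a placeholder IS listing the Dirichlet rows.)

PetscInt rstart, rend, i;

PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
/* explicit 0.0 on every diagonal so "inactive" rows are not left completely empty */
for (i = rstart; i < rend; i++) PetscCall(MatSetValue(A, i, i, 0.0, ADD_VALUES));
/* ... add the element contributions here, also with ADD_VALUES ... */
PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
/* homogeneous Dirichlet rows/columns, as in the thread */
PetscCall(MatZeroRowsColumnsIS(A, isbc, 1.0, x, b));

Each inactive row then carries only its explicit zero diagonal, which is what -pc_type redistribute eliminates before the inner solve, as discussed in this thread.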
Barry > On Feb 4, 2023, at 12:06 PM, Karthikeyan Chockalingam - STFC UKRI wrote: > > Thank you very much for offering to debug. > > I built PETSc along with AMReX, so I tried to extract the PETSc code alone which would reproduce the same error on the smallest sized problem possible. > > I have attached three files: > > petsc_amrex_error_redistribute.txt ? The error message from amrex/petsc interface, but THE linear system solves and converges to a solution. > > problem.c ? A simple stand-alone petsc code, which produces almost the same error message. > > petsc_ error_redistribute.txt ? The error message from problem.c but strangely it does NOT solve ? I am not sure why? > > Please use problem.c to debug the issue. > > Kind regards, > Karthik. > > > From: Barry Smith > > Date: Saturday, 4 February 2023 at 00:22 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > If you can help me reproduce the problem with a simple code I can debug the problem and fix it. > > Barry > > > > On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > I updated the main branch to the below commit but the same problem persists. > > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000 > > > From: Barry Smith > > Date: Friday, 3 February 2023 at 18:51 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it but now we allow the row to initially have no entries and still be zeroed. > > Barry > > > On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. > > > > [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-options_left (no value) > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 > [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 > [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre > [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 > [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 > [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 > Residual norms for redistribute_ solve. > 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 > 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 > 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 > 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 > 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 > 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 > 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 > 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 > 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 > 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 > 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 > Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 > KSP Object: (redistribute_) 1 MPI process > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (redistribute_) 1 MPI process > type: jacobi > type DIAGONAL > linear system matrix = precond matrix: > Mat Object: 1 MPI process > type: mpiaij > rows=48896, cols=48896 > total: nonzeros=307976, allocated nonzeros=307976 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > End of program > solve time 0.564714744 seconds > Starting max value is: 0 > Min value of level 0 is: 0 > Interpolated min value is: 741.978761 > Unused ParmParse Variables: > [TOP]::model.type(nvals = 1) :: [3] > [TOP]::ref_ratio(nvals = 1) :: [2] > > AMReX (22.10-20-g3082028e4287) finalized > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > > > Best, > Karthik. > > From: Barry Smith > > Date: Friday, 3 February 2023 at 17:41 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > We need all the error output for the errors you got below to understand why the errors are happening. > > > > > On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Hello Barry, > > I would like to better understand pc_type redistribute usage. > > I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. > > Indeed the linear solve will be more efficient if you use the redistribute solver. > > But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). > > > > > > Question (1) > > > If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? > > See above > > > > > > Question (2) > > > I tried using pc_type redistribute for a two level system. > I have attached the output only for (level 1) > The solution converges to right solution but still petsc outputs some error messages. > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! 
> [0]PETSC ERROR: Option left: name:-options_left (no value) > > But the there were no unused options > > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > > I cannot explain this > > > > > Question (2) > > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > > What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? > > > Question (3) > > > > > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > > I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. > > Thank you for your help. > > Kind regards, > Karthik. > > > > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 18:50 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. > > Barry > > > > > > > On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you Barry. This is great! > > I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using > MatZeroRowsColumnsIS(A, isout, 1, x, b); > > While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? > > Kind regards, > Karthik. > > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 16:04 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute > > > It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. > > Barry > > > > > > > On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: > > Hello, > > I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. > > For instance say N=7 > > 0 0 0 0 0 0 0 > 0 1 -1 0 0 0 0 > 0 -1 2 0 0 0 -1 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 -1 0 0 0 1 > > I would like to reduce it to a 3x3 > > 1 -1 0 > -1 2 -1 > 0 -1 1 > > I do know the size N. > > Q1) How do I do it? > Q2) Is it better to eliminate them as it would save a lot of memory? > Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? > > Thank you. > > Karthik. > This email and any attachments are intended solely for the use of the named recipients. 
If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthikeyan.chockalingam at stfc.ac.uk Mon Feb 6 15:45:06 2023 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Mon, 6 Feb 2023 21:45:06 +0000 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: <20AF4E62-7E22-4B99-8DC4-600C79F78D96@petsc.dev> References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> <20AF4E62-7E22-4B99-8DC4-600C79F78D96@petsc.dev> Message-ID: No problem. I don?t completely follow. (Q1) I have used MATMPIAJI but not sure what is MatZero* (star) and what it does? And its relevance to my problem. (Q2) Since I am creating a MATMPIAJI system? what would be the best way to insert 0.0 in to ALL diagonals (both active and inactive rows) to begin with? (Q3) If I have to insert 0.0 only into diagonals of ?inactive? rows after I have put values into the matrix would be an effort. Unless there is a straight forward to do it in PETSc. (Q4) For my problem do I need to use PCREDISTIBUTE or any linear solve would eliminate those rows? Best, Karthik. From: Barry Smith Date: Monday, 6 February 2023 at 20:18 To: Chockalingam, Karthikeyan (STFC,DL,HC) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Sorry, I had a mistake in my thinking, PCREDISTRIBUTE supports completely empty rows but MatZero* does not. When you put values into the matrix you will need to insert a 0.0 on the diagonal of each "inactive" row; all of this will be eliminated during the linear solve process. It would be a major project to change the MatZero* functions to handle empty rows. Barry On Feb 4, 2023, at 12:06 PM, Karthikeyan Chockalingam - STFC UKRI wrote: Thank you very much for offering to debug. I built PETSc along with AMReX, so I tried to extract the PETSc code alone which would reproduce the same error on the smallest sized problem possible. I have attached three files: petsc_amrex_error_redistribute.txt ? The error message from amrex/petsc interface, but THE linear system solves and converges to a solution. problem.c ? A simple stand-alone petsc code, which produces almost the same error message. petsc_ error_redistribute.txt ? The error message from problem.c but strangely it does NOT solve ? I am not sure why? Please use problem.c to debug the issue. Kind regards, Karthik. From: Barry Smith > Date: Saturday, 4 February 2023 at 00:22 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you can help me reproduce the problem with a simple code I can debug the problem and fix it. Barry On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: I updated the main branch to the below commit but the same problem persists. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000 From: Barry Smith > Date: Friday, 3 February 2023 at 18:51 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it but now we allow the row to initially have no entries and still be zeroed. Barry On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 Residual norms for redistribute_ solve. 
0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 KSP Object: (redistribute_) 1 MPI process type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (redistribute_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: mpiaij rows=48896, cols=48896 total: nonzeros=307976, allocated nonzeros=307976 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines End of program solve time 0.564714744 seconds Starting max value is: 0 Min value of level 0 is: 0 Interpolated min value is: 741.978761 Unused ParmParse Variables: [TOP]::model.type(nvals = 1) :: [3] [TOP]::ref_ratio(nvals = 1) :: [2] AMReX (22.10-20-g3082028e4287) finalized #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. Program ended with exit code: 0 Best, Karthik. From: Barry Smith > Date: Friday, 3 February 2023 at 17:41 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros We need all the error output for the errors you got below to understand why the errors are happening. On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Hello Barry, I would like to better understand pc_type redistribute usage. I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). 
That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. Indeed the linear solve will be more efficient if you use the redistribute solver. But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). Question (1) If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? See above Question (2) I tried using pc_type redistribute for a two level system. I have attached the output only for (level 1) The solution converges to right solution but still petsc outputs some error messages. [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) But the there were no unused options #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. Program ended with exit code: 0 I cannot explain this Question (2) [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? Question (3) [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. Thank you for your help. Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 18:50 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. Barry On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Barry. This is great! I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using MatZeroRowsColumnsIS(A, isout, 1, x, b); While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 16:04 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute It does everything for you. 
Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. Barry On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: Hello, I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. For instance say N=7 0 0 0 0 0 0 0 0 1 -1 0 0 0 0 0 -1 2 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 1 I would like to reduce it to a 3x3 1 -1 0 -1 2 -1 0 -1 1 I do know the size N. Q1) How do I do it? Q2) Is it better to eliminate them as it would save a lot of memory? Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? Thank you. Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 6 15:52:06 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 6 Feb 2023 16:52:06 -0500 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> <20AF4E62-7E22-4B99-8DC4-600C79F78D96@petsc.dev> Message-ID: On Mon, Feb 6, 2023 at 4:45 PM Karthikeyan Chockalingam - STFC UKRI via petsc-users wrote: > No problem. I don?t completely follow. > > > > (Q1) I have used MATMPIAJI but not sure what is MatZero* (star) and what > it does? And its relevance to my problem. > Barry means MatZeroRows(), MatZeroRowsColumns(), etc. > (Q2) Since I am creating a MATMPIAJI system? what would be the best way to > insert 0.0 in to ALL diagonals (both active and inactive rows) to begin > with? > I would just write a loop to SetValues on (i,i). > (Q3) If I have to insert 0.0 only into diagonals of ?inactive? rows after > I have put values into the matrix would be an effort. Unless there is a > straight forward to do it in PETSc. > You can just do it for all rows. > (Q4) For my problem do I need to use PCREDISTIBUTE or any linear solve > would eliminate those rows? > Only REDISTRIBUTE will eliminate zero rows. Thanks, Matt > > > Best, > > Karthik. > > > > *From: *Barry Smith > *Date: *Monday, 6 February 2023 at 20:18 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > Sorry, I had a mistake in my thinking, PCREDISTRIBUTE supports > completely empty rows but MatZero* does not. > > > > When you put values into the matrix you will need to insert a 0.0 on the > diagonal of each "inactive" row; all of this will be eliminated during the > linear solve process. 
It would be a major project to change the MatZero* > functions to handle empty rows. > > > > Barry > > > > > > On Feb 4, 2023, at 12:06 PM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Thank you very much for offering to debug. > > > > I built PETSc along with AMReX, so I tried to extract the PETSc code alone > which would reproduce the same error on the smallest sized problem possible. > > > > I have attached three files: > > > > petsc_amrex_error_redistribute.txt ? The error message from amrex/petsc > interface, but THE linear system solves and converges to a solution. > > > > problem.c ? A simple stand-alone petsc code, which produces almost the > same error message. > > > > petsc_ error_redistribute.txt ? The error message from problem.c but > strangely it does NOT solve ? I am not sure why? > > > > Please use problem.c to debug the issue. > > > > Kind regards, > > Karthik. > > > > > > *From: *Barry Smith > *Date: *Saturday, 4 February 2023 at 00:22 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > If you can help me reproduce the problem with a simple code I can debug > the problem and fix it. > > > > Barry > > > > > > > On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > I updated the main branch to the below commit but the same problem > persists. > > > > *[0]PETSC ERROR: Petsc Development GIT revision: > v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000* > > > > > > *From: *Barry Smith > *Date: *Friday, 3 February 2023 at 18:51 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > If you switch to use the main branch of petsc > https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you > will not have the problem below (previously we required that a row exist > before we zeroed it but now we allow the row to initially have no entries > and still be zeroed. > > > > Barry > > > > > > On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Thank you. The entire error output was an attachment in my previous email. > I am pasting here for your reference. > > > > > > > > [1;31m[0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could > be the program crashed before they were used or a spelling mistake, etc! > > [0]PETSC ERROR: Option left: name:-options_left (no value) > > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda > GIT Date: 2022-10-30 11:03:25 -0500 > > [0]PETSC ERROR: > /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named > HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 > > [0]PETSC ERROR: Configure options --with-debugging=0 > --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc > --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes > --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre > > [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 > > [0]PETSC ERROR: #2 MatZeroRowsColumns() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > > [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 > > [0]PETSC ERROR: #4 MatZeroRowsColumns() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > > [0]PETSC ERROR: #6 localAssembly() at > /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 > > Residual norms for redistribute_ solve. > > 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm > 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm > 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 > > 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm > 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 > > 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm > 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 > > 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm > 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 > > 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm > 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 > > 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm > 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 > > 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm > 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 > > 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm > 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 > > 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm > 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 > > 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm > 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 > > 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm > 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 > > Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 > > KSP Object: (redistribute_) 1 MPI process > > type: cg > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: (redistribute_) 1 MPI process > > type: jacobi > > type DIAGONAL > > linear system matrix = precond matrix: > > Mat Object: 1 MPI process > > type: mpiaij > > rows=48896, cols=48896 > > total: nonzeros=307976, allocated nonzeros=307976 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node (on process 0) routines > > End of program > > solve time 0.564714744 seconds > > Starting max value is: 0 > > Min value of level 0 is: 0 > > Interpolated min value is: 741.978761 > > Unused ParmParse Variables: > > [TOP]::model.type(nvals = 1) :: [3] > > [TOP]::ref_ratio(nvals = 1) :: [2] > > > > AMReX (22.10-20-g3082028e4287) finalized > > #PETSc Option Table entries: > > -ksp_type preonly > > -options_left > > -pc_type redistribute > > -redistribute_ksp_converged_reason > > -redistribute_ksp_monitor_true_residual > > -redistribute_ksp_type cg > > -redistribute_ksp_view > > -redistribute_pc_type jacobi > > #End of PETSc Option Table entries > > There are no unused options. > > Program ended with exit code: 0 > > > > > > Best, > > Karthik. > > > > *From: *Barry Smith > *Date: *Friday, 3 February 2023 at 17:41 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > We need all the error output for the errors you got below to understand > why the errors are happening. > > > > > > > On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Hello Barry, > > > > I would like to better understand pc_type redistribute usage. > > > > I am plan to use pc_type *redistribute* in the context of adaptive mesh > refinement on a structured grid in 2D. My base mesh (level 0) is indexed > from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to > 4(N-1) elements. When I construct system matrix A on (level 1); I probably > only use 20% of 4(N-1) elements, however the indexes are scattered in the > range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system > matrix A on (level 1) to be zero. From your earlier response, I believe > this would be a use case for petsc_type redistribute. > > > > Indeed the linear solve will be more efficient if you use the > redistribute solver. > > > > But I don't understand your plan. With adaptive refinement I would just > create the two matrices, one for the initial grid on which you solve the > system, this will be a smaller matrix and then create a new larger matrix > for the refined grid (and discard the previous matrix). > > > > > > > > > Question (1) > > > > > > If N is really large, I would have to allocate memory of size 4(N-1) for > the system matrix A on (level 1). How does pc_type redistribute help? > Because, I did end up allocating memory for a large system, where most of > the rows and columns are zeros. Is most of the allotted memory not wasted? > *Is this the correct usage?* > > > > See above > > > > > > > > > Question (2) > > > > > > I tried using pc_type redistribute for a two level system. > > I have *attached* the output only for (level 1) > > The solution converges to right solution but still petsc outputs some > error messages. > > > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could > be the program crashed before they were used or a spelling mistake, etc! 
> > [0]PETSC ERROR: Option left: name:-options_left (no value) > > > > But the there were no unused options > > > > #PETSc Option Table entries: > > -ksp_type preonly > > -options_left > > -pc_type redistribute > > -redistribute_ksp_converged_reason > > -redistribute_ksp_monitor_true_residual > > -redistribute_ksp_type cg > > -redistribute_ksp_view > > -redistribute_pc_type jacobi > > #End of PETSc Option Table entries > > *There are no unused options.* > > Program ended with exit code: 0 > > > > I cannot explain this > > > > > > Question (2) > > > > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > > > > What does this error message imply? Given I only use 20% of 4(N-1) > indexes, I can imagine most of the diagonal entrees are zero. *Is my > understanding correct?* > > > > > > Question (3) > > > > > > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > > > > I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. > I don?t follow why I get this error message as the linear system converges > to the right solution. > > > > Thank you for your help. > > > > Kind regards, > > Karthik. > > > > > > > > *From: *Barry Smith > *Date: *Tuesday, 10 January 2023 at 18:50 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > Yes, after the solve the x will contain correct values for ALL the > locations including the (zeroed out rows). You use case is exactly what > redistribute it for. > > > > Barry > > > > > > > > > > On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Thank you Barry. This is great! > > > > I plan to solve using ?-pc_type redistribute? after applying the Dirichlet > bc using > > MatZeroRowsColumnsIS(A, isout, 1, x, b); > > > > While I retrieve the solution data from x (after the solve) ? can I index > them using the original ordering (if I may say that)? > > > > Kind regards, > > Karthik. > > > > *From: *Barry Smith > *Date: *Tuesday, 10 January 2023 at 16:04 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > > https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute > -pc_type redistribute > > > > > > It does everything for you. Note that if the right hand side for any of > the "zero" rows is nonzero then the system is inconsistent and the system > does not have a solution. > > > > Barry > > > > > > > > > On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via > petsc-users wrote: > > > > Hello, > > > > I am assembling a MATIJ of size N, where a very large number of rows (and > corresponding columns), are zeros. I would like to potentially eliminate > them before the solve. > > > > For instance say N=7 > > > > 0 0 0 0 0 0 0 > > 0 1 -1 0 0 0 0 > > 0 -1 2 0 0 0 -1 > > 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 > > 0 0 -1 0 0 0 1 > > > > I would like to reduce it to a 3x3 > > > > 1 -1 0 > > -1 2 -1 > > 0 -1 1 > > > > I do know the size N. > > > > Q1) How do I do it? > > Q2) Is it better to eliminate them as it would save a lot of memory? 
> > Q3) At the moment, I don't know which rows (and columns) have the zero > entries but with some effort I probably can find them. Should I know which > rows (and columns) I am eliminating? > > > > Thank you. > > > > Karthik. > > This email and any attachments are intended solely for the use of the > named recipients. If you are not the intended recipient you must not use, > disclose, copy or distribute this email or any of its attachments and > should notify the sender immediately and delete this email from your > system. UK Research and Innovation (UKRI) has taken every reasonable > precaution to minimise risk of this email or any attachments containing > viruses or malware but the recipient should carry out its own virus and > malware checks before opening the attachments. UKRI does not accept any > liability for any losses or damages which the recipient may sustain due to > presence of any viruses. > > > > > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthikeyan.chockalingam at stfc.ac.uk Mon Feb 6 16:14:48 2023 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Mon, 6 Feb 2023 22:14:48 +0000 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> <20AF4E62-7E22-4B99-8DC4-600C79F78D96@petsc.dev> Message-ID: Thank you Matt. (Q1) I believe I will look for the range of row indexes local to my process, instead of having all my processes set the diagonals to zero in a loop. (Q2) You are referring to -pc_type redistribute, correct? If it is something else, please send me the documentation page. Many thanks! Karthik. From: Matthew Knepley Date: Monday, 6 February 2023 at 21:52 To: Chockalingam, Karthikeyan (STFC,DL,HC) Cc: Barry Smith , petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Eliminating rows and columns which are zeros On Mon, Feb 6, 2023 at 4:45 PM Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: No problem. I don't completely follow. (Q1) I have used MATMPIAIJ but am not sure what MatZero* (star) is, what it does, or its relevance to my problem. Barry means MatZeroRows(), MatZeroRowsColumns(), etc. (Q2) Since I am creating a MATMPIAIJ system, what would be the best way to insert 0.0 into ALL diagonals (both active and inactive rows) to begin with? I would just write a loop to SetValues on (i,i). (Q3) If I have to insert 0.0 only into diagonals of "inactive" rows after I have put values into the matrix, that would be an effort, unless there is a straightforward way to do it in PETSc. You can just do it for all rows. (Q4) For my problem, do I need to use PCREDISTRIBUTE, or would any linear solve eliminate those rows? Only REDISTRIBUTE will eliminate zero rows. Thanks, Matt Best, Karthik. From: Barry Smith > Date: Monday, 6 February 2023 at 20:18 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Sorry, I had a mistake in my thinking, PCREDISTRIBUTE supports completely empty rows but MatZero* does not.
When you put values into the matrix you will need to insert a 0.0 on the diagonal of each "inactive" row; all of this will be eliminated during the linear solve process. It would be a major project to change the MatZero* functions to handle empty rows. Barry On Feb 4, 2023, at 12:06 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you very much for offering to debug. I built PETSc along with AMReX, so I tried to extract the PETSc code alone which would reproduce the same error on the smallest sized problem possible. I have attached three files: petsc_amrex_error_redistribute.txt ? The error message from amrex/petsc interface, but THE linear system solves and converges to a solution. problem.c ? A simple stand-alone petsc code, which produces almost the same error message. petsc_ error_redistribute.txt ? The error message from problem.c but strangely it does NOT solve ? I am not sure why? Please use problem.c to debug the issue. Kind regards, Karthik. From: Barry Smith > Date: Saturday, 4 February 2023 at 00:22 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you can help me reproduce the problem with a simple code I can debug the problem and fix it. Barry On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: I updated the main branch to the below commit but the same problem persists. [0]PETSC ERROR: Petsc Development GIT revision: v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000 From: Barry Smith > Date: Friday, 3 February 2023 at 18:51 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it but now we allow the row to initially have no entries and still be zeroed. Barry On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 Residual norms for redistribute_ solve. 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 KSP Object: (redistribute_) 1 MPI process type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (redistribute_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: mpiaij rows=48896, cols=48896 total: nonzeros=307976, allocated nonzeros=307976 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines End of program solve time 0.564714744 seconds Starting max value is: 0 Min value of level 0 is: 0 Interpolated min value is: 741.978761 Unused ParmParse Variables: [TOP]::model.type(nvals = 1) :: [3] [TOP]::ref_ratio(nvals = 1) :: [2] AMReX (22.10-20-g3082028e4287) finalized #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. Program ended with exit code: 0 Best, Karthik. From: Barry Smith > Date: Friday, 3 February 2023 at 17:41 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros We need all the error output for the errors you got below to understand why the errors are happening. On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Hello Barry, I would like to better understand pc_type redistribute usage. I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. Indeed the linear solve will be more efficient if you use the redistribute solver. But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). Question (1) If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? See above Question (2) I tried using pc_type redistribute for a two level system. I have attached the output only for (level 1) The solution converges to right solution but still petsc outputs some error messages. [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) But the there were no unused options #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. 
Program ended with exit code: 0 I cannot explain this Question (2) [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? Question (3) [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. Thank you for your help. Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 18:50 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. Barry On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Barry. This is great! I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using MatZeroRowsColumnsIS(A, isout, 1, x, b); While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 16:04 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. Barry On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: Hello, I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. For instance say N=7 0 0 0 0 0 0 0 0 1 -1 0 0 0 0 0 -1 2 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 1 I would like to reduce it to a 3x3 1 -1 0 -1 2 -1 0 -1 1 I do know the size N. Q1) How do I do it? Q2) Is it better to eliminate them as it would save a lot of memory? Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? Thank you. Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Feb 6 16:42:17 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 6 Feb 2023 17:42:17 -0500 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> <20AF4E62-7E22-4B99-8DC4-600C79F78D96@petsc.dev> Message-ID: <1D7C0055-12B7-4775-B71C-EB4C94D096D9@petsc.dev> Sorry, I was not clear about MatZero*. I just meant MatZeroRows() or MatZeroRowsColumns() > On Feb 6, 2023, at 4:45 PM, Karthikeyan Chockalingam - STFC UKRI wrote: > > No problem. I don't completely follow. > > (Q1) I have used MATMPIAIJ but am not sure what MatZero* (star) is, what it does, or its relevance to my problem. > > (Q2) Since I am creating a MATMPIAIJ system, what would be the best way to insert 0.0 into ALL diagonals (both active and inactive rows) to begin with? Yes, just have each MPI process loop over its rows and put zero on the diagonal (actually, you could put a 1 if you want). Then have your code use AMReX to put all its values in; I am assuming the code uses INSERT_VALUES so it will always overwrite the value you put in initially (hence putting in 1 initially will be fine; the advantage of 1 is that if you do not use PCREDISTRIBUTE the matrix is fully defined and so any solver will work). If you know the inactive rows you can just put the diagonal on those since AMReX will fill up the rest of the rows, but it is harmless to put values on all diagonal entries. Do NOT call MatAssemblyBegin/End between filling the diagonal entries and having AMReX put in its values. > > > (Q3) If I have to insert 0.0 only into diagonals of "inactive" rows after I have put values into the matrix, that would be an effort, unless there is a straightforward way to do it in PETSc. > > (Q4) For my problem, do I need to use PCREDISTRIBUTE, or would any linear solve eliminate those rows? Well, no solver other than PCREDISTRIBUTE will really make sense if you have "inactive" rows, that is, rows with nothing in them. When PETSc was written we didn't understand that having lots of completely empty rows was a use case, so much of the functionality does not work in that case. > > Best, > Karthik. > > From: Barry Smith > Date: Monday, 6 February 2023 at 20:18 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Sorry, I had a mistake in my thinking, PCREDISTRIBUTE supports completely empty rows but MatZero* does not. > > When you put values into the matrix you will need to insert a 0.0 on the diagonal of each "inactive" row; all of this will be eliminated during the linear solve process. It would be a major project to change the MatZero* functions to handle empty rows. > > Barry > > > > On Feb 4, 2023, at 12:06 PM, Karthikeyan Chockalingam - STFC UKRI wrote: > > Thank you very much for offering to debug. > > I built PETSc along with AMReX, so I tried to extract the PETSc code alone which would reproduce the same error on the smallest sized problem possible. > > I have attached three files: > > petsc_amrex_error_redistribute.txt - The error message from the amrex/petsc interface, but THE linear system solves and converges to a solution. > > problem.c - A simple stand-alone petsc code, which produces almost the same error message. > > petsc_ error_redistribute.txt -
The error message from problem.c but strangely it does NOT solve ? I am not sure why? > > Please use problem.c to debug the issue. > > Kind regards, > Karthik. > > > From: Barry Smith > > Date: Saturday, 4 February 2023 at 00:22 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > If you can help me reproduce the problem with a simple code I can debug the problem and fix it. > > Barry > > > > > On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > I updated the main branch to the below commit but the same problem persists. > > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000 > > > From: Barry Smith > > Date: Friday, 3 February 2023 at 18:51 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it but now we allow the row to initially have no entries and still be zeroed. > > Barry > > > On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. > > > > [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-options_left (no value) > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 > [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 > [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre > [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 > [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 > [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 > Residual norms for redistribute_ solve. 
> 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 > 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 > 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 > 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 > 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 > 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 > 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 > 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 > 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 > 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 > 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 > Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 > KSP Object: (redistribute_) 1 MPI process > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (redistribute_) 1 MPI process > type: jacobi > type DIAGONAL > linear system matrix = precond matrix: > Mat Object: 1 MPI process > type: mpiaij > rows=48896, cols=48896 > total: nonzeros=307976, allocated nonzeros=307976 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > End of program > solve time 0.564714744 seconds > Starting max value is: 0 > Min value of level 0 is: 0 > Interpolated min value is: 741.978761 > Unused ParmParse Variables: > [TOP]::model.type(nvals = 1) :: [3] > [TOP]::ref_ratio(nvals = 1) :: [2] > > AMReX (22.10-20-g3082028e4287) finalized > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > > > Best, > Karthik. > > From: Barry Smith > > Date: Friday, 3 February 2023 at 17:41 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > We need all the error output for the errors you got below to understand why the errors are happening. > > > > > > On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Hello Barry, > > I would like to better understand pc_type redistribute usage. > > I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. 
When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. > > Indeed the linear solve will be more efficient if you use the redistribute solver. > > But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). > > > > > > > Question (1) > > > If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? > > See above > > > > > > > Question (2) > > > I tried using pc_type redistribute for a two level system. > I have attached the output only for (level 1) > The solution converges to right solution but still petsc outputs some error messages. > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-options_left (no value) > > But the there were no unused options > > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > > I cannot explain this > > > > > > Question (2) > > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > > What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? > > > Question (3) > > > > > > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > > I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. > > Thank you for your help. > > Kind regards, > Karthik. > > > > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 18:50 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. > > Barry > > > > > > > > On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you Barry. This is great! > > I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using > MatZeroRowsColumnsIS(A, isout, 1, x, b); > > While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? > > Kind regards, > Karthik. 
> > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 16:04 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute > > > It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. > > Barry > > > > > > > > On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: > > Hello, > > I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. > > For instance say N=7 > > 0 0 0 0 0 0 0 > 0 1 -1 0 0 0 0 > 0 -1 2 0 0 0 -1 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 -1 0 0 0 1 > > I would like to reduce it to a 3x3 > > 1 -1 0 > -1 2 -1 > 0 -1 1 > > I do know the size N. > > Q1) How do I do it? > Q2) Is it better to eliminate them as it would save a lot of memory? > Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? > > Thank you. > > Karthik. > This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
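For reference, a minimal sketch of the diagonal-filling approach described in the exchange above, assuming a parallel AIJ matrix A has already been created with its final layout; the variable names are illustrative, and the application code (AMReX in this thread) is assumed to insert its own entries with INSERT_VALUES afterwards:

    PetscInt rstart, rend;
    PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
    for (PetscInt i = rstart; i < rend; i++) {
      /* 1.0 keeps the matrix usable with any solver; 0.0 is enough if PCREDISTRIBUTE is used */
      PetscCall(MatSetValue(A, i, i, 1.0, INSERT_VALUES));
    }
    /* ... the application inserts its values here; no MatAssemblyBegin/End in between ... */
    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

The reduced solve is then selected on the command line as in the option tables above, e.g. -ksp_type preonly -pc_type redistribute -redistribute_ksp_type cg -redistribute_pc_type jacobi.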
From mfadams at lbl.gov Tue Feb 7 05:23:14 2023 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 7 Feb 2023 06:23:14 -0500 Subject: [petsc-users] MG on GPU: Benchmarking and avoiding vector host->device copy In-Reply-To: References: <1381a1-63e13c80-cf-417f7600@172794512> <57A3624B-DAAA-4D4F-9EA0-7F4BEED7C9C4@petsc.dev> Message-ID: I do one complete solve to get everything setup, to be safe. src/ts/tutorials/ex13.c does this and runs multiple solves, if you like but one solve is probably fine. This was designed as a benchmark and is nice because it can do any order FE solve of Poisson (uses DM/PetscFE, slow). src/ksp/ksp/tutorials/ex56.c is old school, hardwired for elasticity but is simpler and the setup is faster if you are doing large problems per MPI process. Mark On Mon, Feb 6, 2023 at 2:06 PM Barry Smith wrote: > > It should not crash, take a look at the test cases at the bottom of the > file. You are likely correct if the code, unfortunately, does use > DMCreateMatrix() it will not work out of the box for geometric multigrid. > So it might be the wrong example for you. > > I don't know what you mean about clever. If you simply set the solution > to zero at the beginning of the loop then it will just do the same solve > multiple times. The setup should not do much of anything after the first > solver.
Thought usually solves are big enough that one need not run solves > multiple times to get a good understanding of their performance. > > > > > > > On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley < > paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > > Hi Barry, > > src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting > point, thank you! Sadly I get a segfault when executing that example with > PCMG and more than one level, i.e. with the minimal args: > > $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 > =========================================== > Test: KSP performance - Poisson > Input matrix: 27-pt finite difference stencil > -n 100 > DoFs = 1000000 > Number of nonzeros = 26463592 > > Step1 - creating Vecs and Mat... > Step2a - running PCSetUp()... > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and > https://petsc.org/release/faq/ > [0]PETSC ERROR: or try > https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA > systems to find memory corruption errors > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and > run > [0]PETSC ERROR: to get more information on the crash. > [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is > causing the crash. > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD > with errorcode 59. > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > > As the matrix is not created using DMDACreate3d I expected it to fail due > to the missing geometric information, but I expected it to fail more > gracefully than with a segfault. > I will try to combine bench_kspsolve.c with ex45.c to get easy MG > preconditioning, especially since I am interested in the 7pt stencil for > now. > > Concerning my benchmarking loop from before: Is it generally discouraged > to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the > work when doing the same solve multiple times or are the solves not > iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to > keep the runtime short? > > Thanks, > Paul > > On Monday, February 06, 2023 17:04 CET, Barry Smith > wrote: > > > > > > Paul, > > I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to > be used for simple benchmarking. > > You can use VecCudaGetArray() to access the GPU memory of the vector > and then call a CUDA kernel to compute the right hand side vector directly > on the GPU. > > Barry > > > > On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley < > paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > > Hi, > > I want to compare different implementations of multigrid solvers for > Nvidia GPUs using the poisson problem (starting from ksp tutorial example > ex45.c). > Therefore I am trying to get runtime results comparable to hpgmg-cuda > > (finite-volume), i.e. using multiple warmup and measurement solves and > avoiding measuring setup time. 
> For now I am using -log_view with added stages: > > PetscLogStageRegister("Solve Bench", &solve_bench_stage); > for (int i = 0; i < BENCH_SOLVES; i++) { > PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); > // reset x > PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve > PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that > everything is done > > PetscLogStagePush(solve_bench_stage); > PetscCall(KSPSolve(ksp, NULL, NULL)); > PetscLogStagePop(); > } > > This snippet is preceded by a similar loop for warmup. > > When profiling this using Nsight Systems, I see that the very first solve > is much slower which mostly correspods to H2D (host to device) copies and > e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs > , > but probably insignificant in this case). The following solves have some > overhead at the start from a H2D copy of a vector (the RHS I guess, as the > copy is preceeded by a matrix-vector product) in the first MatResidual call > (callchain: > KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My > interpretation of the profiling results (i.e. cuBLAS calls) is that that > vector is overwritten with the residual in Daxpy and therefore has to be > copied again for the next iteration. > > Is there an elegant way of avoiding that H2D copy? I have seen some > examples on constructing matrices directly on the GPU, but nothing about > vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At > the moment I am using jacobi as smoother, but I would like to have a CUDA > implementation of SOR instead. Is there a good way of achieving that, e.g. > using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother as > smoother in PCMG? Or is the overhead from constantly switching between > PETSc and hypre too big? > > Thanks, > Paul > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 7 05:40:24 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 7 Feb 2023 06:40:24 -0500 Subject: [petsc-users] MG on GPU: Benchmarking and avoiding vector host->device copy In-Reply-To: References: <1381a1-63e13c80-cf-417f7600@172794512> <57A3624B-DAAA-4D4F-9EA0-7F4BEED7C9C4@petsc.dev> Message-ID: On Tue, Feb 7, 2023 at 6:23 AM Mark Adams wrote: > I do one complete solve to get everything setup, to be safe. > > src/ts/tutorials/ex13.c does this and runs multiple solves, if you like > but one solve is probably fine. > I think that is SNES ex13 Matt > This was designed as a benchmark and is nice because it can do any order > FE solve of Poisson (uses DM/PetscFE, slow). > src/ksp/ksp/tutorials/ex56.c is old school, hardwired for elasticity but > is simpler and the setup is faster if you are doing large problems per MPI > process. > > Mark > > On Mon, Feb 6, 2023 at 2:06 PM Barry Smith wrote: > >> >> It should not crash, take a look at the test cases at the bottom of the >> file. You are likely correct if the code, unfortunately, does use >> DMCreateMatrix() it will not work out of the box for geometric multigrid. >> So it might be the wrong example for you. >> >> I don't know what you mean about clever. If you simply set the solution >> to zero at the beginning of the loop then it will just do the same solve >> multiple times. The setup should not do much of anything after the first >> solver. Thought usually solves are big enough that one need not run solves >> multiple times to get a good understanding of their performance. 
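A minimal sketch of that "reset the solution and repeat the same solve" pattern, timing only KSPSolve (this assumes an explicit right-hand side b and solution x rather than the KSPSetComputeRHS/KSPSetComputeInitialGuess callbacks used in ex45; BENCH_SOLVES and the stage name are placeholders):

  PetscLogStage solve_stage;
  PetscCall(PetscLogStageRegister("Solve Bench", &solve_stage));

  PetscCall(KSPSetUp(ksp));                     /* pay the setup cost once, outside the loop   */
  for (PetscInt i = 0; i < BENCH_SOLVES; i++) {
    PetscCall(VecZeroEntries(x));               /* every iteration then repeats identical work */
    PetscCall(PetscLogStagePush(solve_stage));
    PetscCall(KSPSolve(ksp, b, x));
    PetscCall(PetscLogStagePop());
  }

With a warmup loop of the same shape before it, the "Solve Bench" stage in -log_view then contains only the repeated solves.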
>> >> >> >> >> >> >> On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley < >> paul.grosse-bley at ziti.uni-heidelberg.de> wrote: >> >> Hi Barry, >> >> src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting >> point, thank you! Sadly I get a segfault when executing that example with >> PCMG and more than one level, i.e. with the minimal args: >> >> $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 >> =========================================== >> Test: KSP performance - Poisson >> Input matrix: 27-pt finite difference stencil >> -n 100 >> DoFs = 1000000 >> Number of nonzeros = 26463592 >> >> Step1 - creating Vecs and Mat... >> Step2a - running PCSetUp()... >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >> probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and >> https://petsc.org/release/faq/ >> [0]PETSC ERROR: or try >> https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA >> systems to find memory corruption errors >> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, >> and run >> [0]PETSC ERROR: to get more information on the crash. >> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is >> causing the crash. >> -------------------------------------------------------------------------- >> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD >> with errorcode 59. >> >> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. >> You may or may not see output from other processes, depending on >> exactly when Open MPI kills them. >> -------------------------------------------------------------------------- >> >> As the matrix is not created using DMDACreate3d I expected it to fail due >> to the missing geometric information, but I expected it to fail more >> gracefully than with a segfault. >> I will try to combine bench_kspsolve.c with ex45.c to get easy MG >> preconditioning, especially since I am interested in the 7pt stencil for >> now. >> >> Concerning my benchmarking loop from before: Is it generally discouraged >> to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the >> work when doing the same solve multiple times or are the solves not >> iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to >> keep the runtime short? >> >> Thanks, >> Paul >> >> On Monday, February 06, 2023 17:04 CET, Barry Smith >> wrote: >> >> >> >> >> >> Paul, >> >> I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to >> be used for simple benchmarking. >> >> You can use VecCudaGetArray() to access the GPU memory of the vector >> and then call a CUDA kernel to compute the right hand side vector directly >> on the GPU. >> >> Barry >> >> >> >> On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley < >> paul.grosse-bley at ziti.uni-heidelberg.de> wrote: >> >> Hi, >> >> I want to compare different implementations of multigrid solvers for >> Nvidia GPUs using the poisson problem (starting from ksp tutorial example >> ex45.c). >> Therefore I am trying to get runtime results comparable to hpgmg-cuda >> >> (finite-volume), i.e. using multiple warmup and measurement solves and >> avoiding measuring setup time. 
>> For now I am using -log_view with added stages: >> >> PetscLogStageRegister("Solve Bench", &solve_bench_stage); >> for (int i = 0; i < BENCH_SOLVES; i++) { >> PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); >> // reset x >> PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve >> PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that >> everything is done >> >> PetscLogStagePush(solve_bench_stage); >> PetscCall(KSPSolve(ksp, NULL, NULL)); >> PetscLogStagePop(); >> } >> >> This snippet is preceded by a similar loop for warmup. >> >> When profiling this using Nsight Systems, I see that the very first solve >> is much slower which mostly correspods to H2D (host to device) copies and >> e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs >> , >> but probably insignificant in this case). The following solves have some >> overhead at the start from a H2D copy of a vector (the RHS I guess, as the >> copy is preceeded by a matrix-vector product) in the first MatResidual call >> (callchain: >> KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My >> interpretation of the profiling results (i.e. cuBLAS calls) is that that >> vector is overwritten with the residual in Daxpy and therefore has to be >> copied again for the next iteration. >> >> Is there an elegant way of avoiding that H2D copy? I have seen some >> examples on constructing matrices directly on the GPU, but nothing about >> vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At >> the moment I am using jacobi as smoother, but I would like to have a CUDA >> implementation of SOR instead. Is there a good way of achieving that, e.g. >> using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother as >> smoother in PCMG? Or is the overhead from constantly switching between >> PETSc and hypre too big? >> >> Thanks, >> Paul >> >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthikeyan.chockalingam at stfc.ac.uk Tue Feb 7 12:20:14 2023 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Tue, 7 Feb 2023 18:20:14 +0000 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: <1D7C0055-12B7-4775-B71C-EB4C94D096D9@petsc.dev> References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> <20AF4E62-7E22-4B99-8DC4-600C79F78D96@petsc.dev> <1D7C0055-12B7-4775-B71C-EB4C94D096D9@petsc.dev> Message-ID: Thank you Barry for your detailed response. I would like to shed some light into what I try to accomplish using PETSc and AMReX. Please see the attachment adaptive mesh image (and ignore the mesh-order legend for now). The number of elements on each level is a geometric progression. N - Number elements on each level indexed by ?n? n - Adaptive mesh level index (starting from 1) a - Number of elements on the base mesh = 16 r = 4 (each element is divided into four on the next higher level of refinement) N = a r^(n-1) The final solution u, is the super imposition of solutions from all levels (here we have a total of 7 levels). u = u^(1) + u^(2) + ? + u^(7) Hence I have seven system matrix and solution vector pairs, one for each level. On each level the element index vary from 1 to N. 
But on each level NOT all elements are ?active?. As you can see from the attached image not all elements are active (a lot of white hollow spaces). So the ?active? indexes can be scatted anywhere between 1 to N = 65536 for n = 7. (Q1) In my case, I can?t at the moment insert 1 on the diagonal because during assembly I am using ADD_VALUES as a node can be common to many elements. So I added 0.0 to ALL diagonals. After global assembly, I find that the linear solver converges. (Q2) After adding 0.0 to all diagonal. I able to solve using either ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "jacobi"); CHKERRQ(ierr); or ierr = PetscOptionsSetValue(NULL," pc_type", "jacobi"); CHKERRQ(ierr); I was able to solve using hypre as well. Do I need to use -pc_type redistribute or not? Because I am able to solve without it as well. (Q3) I am sorry, if I sound like a broken record player. On each level I request allocation for A[N][N] as the indexes can be scatted anywhere between 1 to N but most are ?inactive rows?. Is -pc_type redistribute the way to go for me to save on memory? Though I request A[N][N] allocation, and not all rows are active - I wonder if I am wasting a huge amount of memory? Kind regards, Karthik. From: Barry Smith Date: Monday, 6 February 2023 at 22:42 To: Chockalingam, Karthikeyan (STFC,DL,HC) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Sorry was not clear MatZero*. I just meant MatZeroRows() or MatZeroRowsColumns() On Feb 6, 2023, at 4:45 PM, Karthikeyan Chockalingam - STFC UKRI wrote: No problem. I don?t completely follow. (Q1) I have used MATMPIAJI but not sure what is MatZero* (star) and what it does? And its relevance to my problem. (Q2) Since I am creating a MATMPIAJI system? what would be the best way to insert 0.0 in to ALL diagonals (both active and inactive rows) to begin with? Yes, just have each MPI process loop over its rows and put zero on the diagonal (actually, you could put a 1 if you want). Then have your code use AMReX to put all its values in, I am assuming the code uses INSERT_VALUES so it will always overwrite the value you put in initially (hence putting in 1 initially will be fine; the advantage of 1 is if you do not use PCREDISTIBUTE the matrix is fully defined and so any solver will work. If you know the inactive rows you can just put the diagonal on those since AMReX will fill up the rest of the rows, but it is harmless to put values on all diagonal entries. Do NOT call MatAssemblyBegin/End between filling the diagonal entries and having AMReX put in its values. (Q3) If I have to insert 0.0 only into diagonals of ?inactive? rows after I have put values into the matrix would be an effort. Unless there is a straight forward to do it in PETSc. (Q4) For my problem do I need to use PCREDISTIBUTE or any linear solve would eliminate those rows? Well no solver will really make sense if you have "inactive" rows, that is rows with nothing in them except PCREDISTIBUTE. When PETSc was written we didn't understand having lots of completely empty rows was a use case so much of the functionality does not work in that case. Best, Karthik. From: Barry Smith Date: Monday, 6 February 2023 at 20:18 To: Chockalingam, Karthikeyan (STFC,DL,HC) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Sorry, I had a mistake in my thinking, PCREDISTRIBUTE supports completely empty rows but MatZero* does not. 
When you put values into the matrix you will need to insert a 0.0 on the diagonal of each "inactive" row; all of this will be eliminated during the linear solve process. It would be a major project to change the MatZero* functions to handle empty rows. Barry On Feb 4, 2023, at 12:06 PM, Karthikeyan Chockalingam - STFC UKRI wrote: Thank you very much for offering to debug. I built PETSc along with AMReX, so I tried to extract the PETSc code alone which would reproduce the same error on the smallest sized problem possible. I have attached three files: petsc_amrex_error_redistribute.txt ? The error message from amrex/petsc interface, but THE linear system solves and converges to a solution. problem.c ? A simple stand-alone petsc code, which produces almost the same error message. petsc_ error_redistribute.txt ? The error message from problem.c but strangely it does NOT solve ? I am not sure why? Please use problem.c to debug the issue. Kind regards, Karthik. From: Barry Smith > Date: Saturday, 4 February 2023 at 00:22 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you can help me reproduce the problem with a simple code I can debug the problem and fix it. Barry On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: I updated the main branch to the below commit but the same problem persists. [0]PETSC ERROR: Petsc Development GIT revision: v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000 From: Barry Smith > Date: Friday, 3 February 2023 at 18:51 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it but now we allow the row to initially have no entries and still be zeroed. Barry On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 Residual norms for redistribute_ solve. 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 KSP Object: (redistribute_) 1 MPI process type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (redistribute_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: mpiaij rows=48896, cols=48896 total: nonzeros=307976, allocated nonzeros=307976 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines End of program solve time 0.564714744 seconds Starting max value is: 0 Min value of level 0 is: 0 Interpolated min value is: 741.978761 Unused ParmParse Variables: [TOP]::model.type(nvals = 1) :: [3] [TOP]::ref_ratio(nvals = 1) :: [2] AMReX (22.10-20-g3082028e4287) finalized #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. Program ended with exit code: 0 Best, Karthik. From: Barry Smith > Date: Friday, 3 February 2023 at 17:41 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros We need all the error output for the errors you got below to understand why the errors are happening. On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Hello Barry, I would like to better understand pc_type redistribute usage. I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. Indeed the linear solve will be more efficient if you use the redistribute solver. But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). Question (1) If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? See above Question (2) I tried using pc_type redistribute for a two level system. I have attached the output only for (level 1) The solution converges to right solution but still petsc outputs some error messages. [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) But the there were no unused options #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. 
Program ended with exit code: 0 I cannot explain this Question (2) [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? Question (3) [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. Thank you for your help. Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 18:50 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. Barry On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Barry. This is great! I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using MatZeroRowsColumnsIS(A, isout, 1, x, b); While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 16:04 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. Barry On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: Hello, I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. For instance say N=7 0 0 0 0 0 0 0 0 1 -1 0 0 0 0 0 -1 2 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 1 I would like to reduce it to a 3x3 1 -1 0 -1 2 -1 0 -1 1 I do know the size N. Q1) How do I do it? Q2) Is it better to eliminate them as it would save a lot of memory? Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? Thank you. Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: adaptive_mesh_level.png Type: image/png Size: 366379 bytes Desc: adaptive_mesh_level.png URL: From bsmith at petsc.dev Tue Feb 7 13:52:00 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 7 Feb 2023 14:52:00 -0500 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> <20AF4E62-7E22-4B99-8DC4-600C79F78D96@petsc.dev> <1D7C0055-12B7-4775-B71C-EB4C94D096D9@petsc.dev> Message-ID: <28738507-0571-4B5E-BA4E-1AFDD892860D@petsc.dev> > On Feb 7, 2023, at 1:20 PM, Karthikeyan Chockalingam - STFC UKRI wrote: > > Thank you Barry for your detailed response. > > I would like to shed some light into what I try to accomplish using PETSc and AMReX. Please see the attachment adaptive mesh image (and ignore the mesh-order legend for now). > > The number of elements on each level is a geometric progression. > N - Number elements on each level indexed by ?n? > n - Adaptive mesh level index (starting from 1) > a - Number of elements on the base mesh = 16 > r = 4 (each element is divided into four on the next higher level of refinement) > > N = a r^(n-1) > > The final solution u, is the super imposition of solutions from all levels (here we have a total of 7 levels). > > u = u^(1) + u^(2) + ? + u^(7) > > Hence I have seven system matrix and solution vector pairs, one for each level. > > On each level the element index vary from 1 to N. But on each level NOT all elements are ?active?. > As you can see from the attached image not all elements are active (a lot of white hollow spaces). So the ?active? indexes can be scatted anywhere between 1 to N = 65536 for n = 7. > > (Q1) In my case, I can?t at the moment insert 1 on the diagonal because during assembly I am using ADD_VALUES as a node can be common to many elements. So I added 0.0 to ALL diagonals. After global assembly, I find that the linear solver converges. > > (Q2) After adding 0.0 to all diagonal. I able to solve using either > ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "jacobi"); CHKERRQ(ierr); > or > ierr = PetscOptionsSetValue(NULL," pc_type", "jacobi"); CHKERRQ(ierr); > I was able to solve using hypre as well. > > Do I need to use -pc_type redistribute or not? Because I am able to solve without it as well. No you do not need redistribute, but for large problems with many empty rows using a solver inside redistribute will be faster than just using that solver directly on the much larger (mostly empty) system. > > (Q3) I am sorry, if I sound like a broken record player. On each level I request allocation for A[N][N] Not sure what you mean by this? Call MatMPIAIJSetPreallocation(... N,NULL); where N is the number of columns in the matrix? If so, yes this causes a huge malloc() by PETSc when it allocates the matrix. It is not scalable. Do you have a small upper bound on the number of nonzeros in a row, say 9 or 27? Then use that instead of N, not perfect but much better than N. Barry > as the indexes can be scatted anywhere between 1 to N but most are ?inactive rows?. Is -pc_type redistribute the way to go for me to save on memory? Though I request A[N][N] allocation, and not all rows are active - I wonder if I am wasting a huge amount of memory? > > Kind regards, > Karthik. 
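To make the preallocation suggestion above concrete, a sketch (27 stands in for whatever per-row upper bound the element connectivity actually gives, for example 9 for a bilinear quad stencil on a uniform 2D grid; A is assumed to be an AIJ matrix whose global and local sizes are already set):

  /* reserve room for at most 27 nonzeros per row in the diagonal block and
     27 in the off-diagonal block, instead of reserving N (the full number
     of columns) for every row                                              */
  PetscCall(MatMPIAIJSetPreallocation(A, 27, NULL, 27, NULL));
  PetscCall(MatSeqAIJSetPreallocation(A, 27, NULL)); /* does nothing unless A is seqaij */

With a per-row bound, the memory for the matrix scales with the number of stored nonzeros (roughly 27 N) instead of growing like N squared.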
> > > > > From: Barry Smith > > Date: Monday, 6 February 2023 at 22:42 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Sorry was not clear MatZero*. I just meant MatZeroRows() or MatZeroRowsColumns() > > > On Feb 6, 2023, at 4:45 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > No problem. I don?t completely follow. > > (Q1) I have used MATMPIAJI but not sure what is MatZero* (star) and what it does? And its relevance to my problem. > > (Q2) Since I am creating a MATMPIAJI system? what would be the best way to insert 0.0 in to ALL diagonals (both active and inactive rows) to begin with? > > Yes, just have each MPI process loop over its rows and put zero on the diagonal (actually, you could put a 1 if you want). Then have your code use AMReX to > put all its values in, I am assuming the code uses INSERT_VALUES so it will always overwrite the value you put in initially (hence putting in 1 initially will be fine; the advantage of 1 is if you do not use PCREDISTIBUTE the matrix is fully defined and so any solver will work. If you know the inactive rows you can just put the diagonal on those since AMReX will fill up the rest of the rows, but it is harmless to put values on all diagonal entries. Do NOT call MatAssemblyBegin/End between filling the diagonal entries and having AMReX put in its values. > > (Q3) If I have to insert 0.0 only into diagonals of ?inactive? rows after I have put values into the matrix would be an effort. Unless there is a straight forward to do it in PETSc. > > (Q4) For my problem do I need to use PCREDISTIBUTE or any linear solve would eliminate those rows? > > Well no solver will really make sense if you have "inactive" rows, that is rows with nothing in them except PCREDISTIBUTE. > > When PETSc was written we didn't understand having lots of completely empty rows was a use case so much of the functionality does not work in that case. > > > > > Best, > Karthik. > > From: Barry Smith > > Date: Monday, 6 February 2023 at 20:18 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Sorry, I had a mistake in my thinking, PCREDISTRIBUTE supports completely empty rows but MatZero* does not. > > When you put values into the matrix you will need to insert a 0.0 on the diagonal of each "inactive" row; all of this will be eliminated during the linear solve process. It would be a major project to change the MatZero* functions to handle empty rows. > > Barry > > > > > On Feb 4, 2023, at 12:06 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you very much for offering to debug. > > I built PETSc along with AMReX, so I tried to extract the PETSc code alone which would reproduce the same error on the smallest sized problem possible. > > I have attached three files: > > petsc_amrex_error_redistribute.txt ? The error message from amrex/petsc interface, but THE linear system solves and converges to a solution. > > problem.c ? A simple stand-alone petsc code, which produces almost the same error message. > > petsc_ error_redistribute.txt ? The error message from problem.c but strangely it does NOT solve ? I am not sure why? > > Please use problem.c to debug the issue. > > Kind regards, > Karthik. 
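In code, the diagonal-seeding loop described above looks roughly like the following (a sketch of the 0.0/ADD_VALUES variant reported to work earlier in the thread, so it mixes freely with an element assembly that also uses ADD_VALUES and needs no intermediate MatAssemblyBegin/End):

  PetscInt rstart, rend;
  PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
  for (PetscInt row = rstart; row < rend; row++) {
    /* adding an explicit 0.0 leaves the values unchanged but guarantees
       that a diagonal entry exists in every locally owned row            */
    PetscCall(MatSetValue(A, row, row, 0.0, ADD_VALUES));
  }
  /* ... element contributions are added here with ADD_VALUES ... */
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

The existence of that diagonal entry is what the "Matrix is missing diagonal entry" error from MatZeroRowsColumns was complaining about.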
> > > From: Barry Smith > > Date: Saturday, 4 February 2023 at 00:22 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > If you can help me reproduce the problem with a simple code I can debug the problem and fix it. > > Barry > > > > > > On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > I updated the main branch to the below commit but the same problem persists. > > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000 > > > From: Barry Smith > > Date: Friday, 3 February 2023 at 18:51 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it but now we allow the row to initially have no entries and still be zeroed. > > Barry > > > On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. > > > > [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-options_left (no value) > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 > [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 > [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre > [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 > [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 > [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 > Residual norms for redistribute_ solve. 
> 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 > 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 > 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 > 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 > 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 > 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 > 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 > 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 > 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 > 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 > 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 > Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 > KSP Object: (redistribute_) 1 MPI process > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (redistribute_) 1 MPI process > type: jacobi > type DIAGONAL > linear system matrix = precond matrix: > Mat Object: 1 MPI process > type: mpiaij > rows=48896, cols=48896 > total: nonzeros=307976, allocated nonzeros=307976 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > End of program > solve time 0.564714744 seconds > Starting max value is: 0 > Min value of level 0 is: 0 > Interpolated min value is: 741.978761 > Unused ParmParse Variables: > [TOP]::model.type(nvals = 1) :: [3] > [TOP]::ref_ratio(nvals = 1) :: [2] > > AMReX (22.10-20-g3082028e4287) finalized > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > > > Best, > Karthik. > > From: Barry Smith > > Date: Friday, 3 February 2023 at 17:41 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > We need all the error output for the errors you got below to understand why the errors are happening. > > > > > > > On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Hello Barry, > > I would like to better understand pc_type redistribute usage. > > I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. 
When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. > > Indeed the linear solve will be more efficient if you use the redistribute solver. > > But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). > > > > > > > > Question (1) > > > If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? > > See above > > > > > > > > Question (2) > > > I tried using pc_type redistribute for a two level system. > I have attached the output only for (level 1) > The solution converges to right solution but still petsc outputs some error messages. > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-options_left (no value) > > But the there were no unused options > > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > > I cannot explain this > > > > > > > Question (2) > > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > > What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? > > > Question (3) > > > > > > > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > > I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. > > Thank you for your help. > > Kind regards, > Karthik. > > > > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 18:50 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. > > Barry > > > > > > > > > On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you Barry. This is great! > > I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using > MatZeroRowsColumnsIS(A, isout, 1, x, b); > > While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? > > Kind regards, > Karthik. 
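For comparison, roughly the reduction that -pc_type redistribute performs automatically can also be written out by hand with an index set of the non-empty rows; a minimal sketch (it assumes the global indices of the active rows have already been collected in an array active[] of length nactive, both hypothetical names):

  IS  is_active;
  Mat Asub;

  PetscCall(ISCreateGeneral(PETSC_COMM_WORLD, nactive, active, PETSC_COPY_VALUES, &is_active));
  PetscCall(MatCreateSubMatrix(A, is_active, is_active, MAT_INITIAL_MATRIX, &Asub));
  /* ... solve with Asub, then scatter the reduced solution back to the
     full-length vector using the same index set ...                     */
  PetscCall(MatDestroy(&Asub));
  PetscCall(ISDestroy(&is_active));

The saving is the obvious one in memory and work; the cost is the bookkeeping of mapping the reduced solution back to the original numbering, which is exactly what the redistribute preconditioner hides.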
> > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 16:04 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute > > > It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. > > Barry > > > > > > > > > On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: > > Hello, > > I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. > > For instance say N=7 > > 0 0 0 0 0 0 0 > 0 1 -1 0 0 0 0 > 0 -1 2 0 0 0 -1 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 -1 0 0 0 1 > > I would like to reduce it to a 3x3 > > 1 -1 0 > -1 2 -1 > 0 -1 1 > > I do know the size N. > > Q1) How do I do it? > Q2) Is it better to eliminate them as it would save a lot of memory? > Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? > > Thank you. > > Karthik. > This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Tue Feb 7 15:26:03 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 7 Feb 2023 15:26:03 -0600 Subject: [petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device. In-Reply-To: References: Message-ID: Hi, Philip, I believe this MR https://gitlab.com/petsc/petsc/-/merge_requests/6030 would fix the problem. It is a fix to petsc/release, but you can cherry-pick it to petsc/main. Could you try that in your case? Thanks. --Junchao Zhang On Fri, Jan 20, 2023 at 11:31 AM Junchao Zhang wrote: > Sorry, no progress. I guess that is because a vector was gotten but not > restored (e.g., VecRestoreArray() etc), causing host and device data not > synced. Maybe in your code, or in petsc code. > After the ECP AM, I will have more time on this bug. > Thanks. > > --Junchao Zhang > > > On Fri, Jan 20, 2023 at 11:00 AM Fackler, Philip > wrote: > >> Any progress on this? Any info/help needed? 
>> >> Thanks, >> >> >> *Philip Fackler * >> Research Software Engineer, Application Engineering Group >> Advanced Computing Systems Research Section >> Computer Science and Mathematics Division >> *Oak Ridge National Laboratory* >> ------------------------------ >> *From:* Fackler, Philip >> *Sent:* Thursday, December 8, 2022 09:07 >> *To:* Junchao Zhang >> *Cc:* xolotl-psi-development at lists.sourceforge.net < >> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov < >> petsc-users at mcs.anl.gov>; Blondel, Sophie ; Roth, >> Philip >> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and >> Vec diverging when running on CUDA device. >> >> Great! Thank you! >> >> >> *Philip Fackler * >> Research Software Engineer, Application Engineering Group >> Advanced Computing Systems Research Section >> Computer Science and Mathematics Division >> *Oak Ridge National Laboratory* >> ------------------------------ >> *From:* Junchao Zhang >> *Sent:* Wednesday, December 7, 2022 18:47 >> *To:* Fackler, Philip >> *Cc:* xolotl-psi-development at lists.sourceforge.net < >> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov < >> petsc-users at mcs.anl.gov>; Blondel, Sophie ; Roth, >> Philip >> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and >> Vec diverging when running on CUDA device. >> >> Hi, Philip, >> I could reproduce the error. I need to find a way to debug it. Thanks. >> >> /home/jczhang/xolotl/test/system/SystemTestCase.cpp(317): fatal error: in >> "System/PSI_1": absolute value of diffNorm{0.19704848134353209} exceeds >> 1e-10 >> *** 1 failure is detected in the test module "Regression" >> >> >> --Junchao Zhang >> >> >> On Tue, Dec 6, 2022 at 10:10 AM Fackler, Philip >> wrote: >> >> I think it would be simpler to use the develop branch for this issue. But >> you can still just build the SystemTester. Then (if you changed the PSI_1 >> case) run: >> >> ./test/system/SystemTester -t System/PSI_1 -- -v? >> >> (No need for multiple MPI ranks) >> >> Thanks, >> >> >> *Philip Fackler * >> Research Software Engineer, Application Engineering Group >> Advanced Computing Systems Research Section >> Computer Science and Mathematics Division >> *Oak Ridge National Laboratory* >> ------------------------------ >> *From:* Junchao Zhang >> *Sent:* Monday, December 5, 2022 15:40 >> *To:* Fackler, Philip >> *Cc:* xolotl-psi-development at lists.sourceforge.net < >> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov < >> petsc-users at mcs.anl.gov>; Blondel, Sophie ; Roth, >> Philip >> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and >> Vec diverging when running on CUDA device. >> >> I configured with xolotl branch feature-petsc-kokkos, and typed `make` >> under ~/xolotl-build/. Though there were errors, a lot of *Tester were >> built. 
>> >> [ 62%] Built target xolotlViz >> [ 63%] Linking CXX executable TemperatureProfileHandlerTester >> [ 64%] Linking CXX executable TemperatureGradientHandlerTester >> [ 64%] Built target TemperatureProfileHandlerTester >> [ 64%] Built target TemperatureConstantHandlerTester >> [ 64%] Built target TemperatureGradientHandlerTester >> [ 65%] Linking CXX executable HeatEquationHandlerTester >> [ 65%] Built target HeatEquationHandlerTester >> [ 66%] Linking CXX executable FeFitFluxHandlerTester >> [ 66%] Linking CXX executable W111FitFluxHandlerTester >> [ 67%] Linking CXX executable FuelFitFluxHandlerTester >> [ 67%] Linking CXX executable W211FitFluxHandlerTester >> >> Which Tester should I use to run with the parameter file >> benchmarks/params_system_PSI_2.txt? And how many ranks should I use? >> Could you give an example command line? >> Thanks. >> >> --Junchao Zhang >> >> >> On Mon, Dec 5, 2022 at 2:22 PM Junchao Zhang >> wrote: >> >> Hello, Philip, >> Do I still need to use the feature-petsc-kokkos branch? >> --Junchao Zhang >> >> >> On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip >> wrote: >> >> Junchao, >> >> Thank you for working on this. If you open the parameter file for, say, >> the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add -dm_mat_type >> aijkokkos -dm_vec_type kokkos?` to the "petscArgs=" field (or the >> corresponding cusparse/cuda option). >> >> Thanks, >> >> >> *Philip Fackler * >> Research Software Engineer, Application Engineering Group >> Advanced Computing Systems Research Section >> Computer Science and Mathematics Division >> *Oak Ridge National Laboratory* >> ------------------------------ >> *From:* Junchao Zhang >> *Sent:* Thursday, December 1, 2022 17:05 >> *To:* Fackler, Philip >> *Cc:* xolotl-psi-development at lists.sourceforge.net < >> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov < >> petsc-users at mcs.anl.gov>; Blondel, Sophie ; Roth, >> Philip >> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and >> Vec diverging when running on CUDA device. >> >> Hi, Philip, >> Sorry for the long delay. I could not get something useful from the >> -log_view output. Since I have already built xolotl, could you give me >> instructions on how to do a xolotl test to reproduce the divergence with >> petsc GPU backends (but fine on CPU)? >> Thank you. 
>> --Junchao Zhang >> >> >> On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip >> wrote: >> >> ------------------------------------------------------------------ PETSc >> Performance Summary: >> ------------------------------------------------------------------ >> >> Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 >> 14:36:46 2022 >> Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: >> 2022-10-28 14:39:41 +0000 >> >> Max Max/Min Avg Total >> Time (sec): 6.023e+00 1.000 6.023e+00 >> Objects: 1.020e+02 1.000 1.020e+02 >> Flops: 1.080e+09 1.000 1.080e+09 1.080e+09 >> Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08 >> MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00 >> MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00 >> MPI Reductions: 0.000e+00 0.000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total Count >> %Total Avg %Total Count %Total >> 0: Main Stage: 6.0226e+00 100.0% 1.0799e+09 100.0% 0.000e+00 >> 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flop: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> AvgLen: average message length (bytes) >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() >> and PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flop in this >> phase >> %M - percent messages in this phase %L - percent message >> lengths in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time >> over all processors) >> GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU >> time over all processors) >> CpuToGpu Count: total number of CPU to GPU copies per processor >> CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per >> processor) >> GpuToCpu Count: total number of GPU to CPU copies per processor >> GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per >> processor) >> GPU %F: percent flops on GPU in this event >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) Flop >> --- Global --- --- Stage ---- Total >> GPU - CpuToGpu - - GpuToCpu - GPU >> >> Max Ratio Max Ratio Max Ratio Mess AvgLen >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> Mflop/s Count Size Count Size %F >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> --------------------------------------- >> >> >> --- Event Stage 0: Main Stage >> >> BuildTwoSided 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> SFSetGraph 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> SFSetUp 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> SFPack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> SFUnpack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecDot 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecMDot 775 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecNorm 1728 1.0 nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecScale 1983 1.0 nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecCopy 780 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecSet 4955 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecAXPY 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecAYPX 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecAXPBYCZ 643 1.0 nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecWAXPY 502 1.0 nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecMAXPY 1159 1.0 nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecScatterBegin 4647 1.0 nan nan 
0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan >> -nan 2 5.14e-03 0 0.00e+00 0 >> >> VecScatterEnd 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecReduceArith 380 1.0 nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> VecReduceComm 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> VecNormalize 965 1.0 nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> TSStep 20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 97100 0 0 0 97100 0 0 0 184 >> -nan 2 5.14e-03 0 0.00e+00 54 >> >> TSFunctionEval 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 63 1 0 0 0 63 1 0 0 0 -nan >> -nan 1 3.36e-04 0 0.00e+00 100 >> >> TSJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 97 >> >> MatMult 1930 1.0 nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 41 0 0 0 1 41 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> MatMultTranspose 1 1.0 nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> MatSolve 965 1.0 nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 5 0 0 0 1 5 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatSOR 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatLUFactorSym 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatLUFactorNum 190 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 11 0 0 0 1 11 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatScale 190 1.0 nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> MatAssemblyBegin 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatAssemblyEnd 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatGetRowIJ 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatCreateSubMats 380 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatGetOrdering 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatZeroEntries 379 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> MatSetValuesCOO 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> KSPSetUp 760 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> KSPSolve 190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 10 86 0 0 0 10 86 0 0 0 1602 >> -nan 1 4.80e-03 0 0.00e+00 46 >> >> KSPGMRESOrthog 775 1.0 nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 2 0 0 0 1 2 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> SNESSolve 71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 95 99 0 0 0 95 99 0 0 0 188 >> -nan 
1 4.80e-03 0 0.00e+00 53 >> >> SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> SNESFunctionEval 573 1.0 nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 60 2 0 0 0 60 2 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> SNESJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 97 >> >> SNESLineSearch 190 1.0 nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 53 10 0 0 0 53 10 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 100 >> >> PCSetUp 570 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 11 0 0 0 2 11 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> PCApply 965 1.0 nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 8 57 0 0 0 8 57 0 0 0 -nan >> -nan 1 4.80e-03 0 0.00e+00 19 >> >> KSPSolve_FS_0 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> KSPSolve_FS_1 965 1.0 nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 15 0 0 0 2 15 0 0 0 -nan >> -nan 0 0.00e+00 0 0.00e+00 0 >> >> >> --- Event Stage 1: Unknown >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> --------------------------------------- >> >> >> Object Type Creations Destructions. Reports information only >> for process 0. >> >> --- Event Stage 0: Main Stage >> >> Container 5 5 >> Distributed Mesh 2 2 >> Index Set 11 11 >> IS L to G Mapping 1 1 >> Star Forest Graph 7 7 >> Discrete System 2 2 >> Weak Form 2 2 >> Vector 49 49 >> TSAdapt 1 1 >> TS 1 1 >> DMTS 1 1 >> SNES 1 1 >> DMSNES 3 3 >> SNESLineSearch 1 1 >> Krylov Solver 4 4 >> DMKSP interface 1 1 >> Matrix 4 4 >> Preconditioner 4 4 >> Viewer 2 1 >> >> --- Event Stage 1: Unknown >> >> >> ======================================================================================================================== >> Average time to get PetscTime(): 3.14e-08 >> #PETSc Option Table entries: >> -log_view >> -log_view_gpu_times >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with 64 bit PetscInt >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 sizeof(PetscInt) 8 >> Configure options: PETSC_DIR=/home/4pf/repos/petsc >> PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx >> --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries >> --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices >> --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 >> --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install >> --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install >> >> ----------------------------------------- >> Libraries compiled on 2022-11-01 21:01:08 on PC0115427 >> Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35 >> Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install >> Using PETSc arch: >> ----------------------------------------- >> >> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas >> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector >> -fvisibility=hidden -O3 >> ----------------------------------------- >> >> Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include >> -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include >> -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include >> 
----------------------------------------- >> >> Using C linker: mpicc >> Using libraries: >> -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib >> -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc >> -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib >> -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib >> -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib >> -L/home/4pf/build/kokkos/cuda/install/lib >> -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 >> -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers >> -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas >> -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl >> ----------------------------------------- >> >> >> >> *Philip Fackler * >> Research Software Engineer, Application Engineering Group >> Advanced Computing Systems Research Section >> Computer Science and Mathematics Division >> *Oak Ridge National Laboratory* >> ------------------------------ >> *From:* Junchao Zhang >> *Sent:* Tuesday, November 15, 2022 13:03 >> *To:* Fackler, Philip >> *Cc:* xolotl-psi-development at lists.sourceforge.net < >> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov < >> petsc-users at mcs.anl.gov>; Blondel, Sophie ; Roth, >> Philip >> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and >> Vec diverging when running on CUDA device. >> >> Can you paste -log_view result so I can see what functions are used? >> >> --Junchao Zhang >> >> >> On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip >> wrote: >> >> Yes, most (but not all) of our system test cases fail with the >> kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos >> backend. >> >> >> *Philip Fackler * >> Research Software Engineer, Application Engineering Group >> Advanced Computing Systems Research Section >> Computer Science and Mathematics Division >> *Oak Ridge National Laboratory* >> ------------------------------ >> *From:* Junchao Zhang >> *Sent:* Monday, November 14, 2022 19:34 >> *To:* Fackler, Philip >> *Cc:* xolotl-psi-development at lists.sourceforge.net < >> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov < >> petsc-users at mcs.anl.gov>; Blondel, Sophie ; Zhang, >> Junchao ; Roth, Philip >> *Subject:* [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec >> diverging when running on CUDA device. >> >> Hi, Philip, >> Sorry to hear that. It seems you could run the same code on CPUs but >> not no GPUs (with either petsc/Kokkos backend or petsc/cuda backend, is it >> right? >> >> --Junchao Zhang >> >> >> On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >> This is an issue I've brought up before (and discussed in-person with >> Richard). I wanted to bring it up again because I'm hitting the limits of >> what I know to do, and I need help figuring this out. >> >> The problem can be reproduced using Xolotl's "develop" branch built >> against a petsc build with kokkos and kokkos-kernels enabled. Then, either >> add the relevant kokkos options to the "petscArgs=" line in the system test >> parameter file(s), or just replace the system test parameter files with the >> ones from the "feature-petsc-kokkos" branch. See here the files that >> begin with "params_system_". >> >> Note that those files use the "kokkos" options, but the problem is >> similar using the corresponding cuda/cusparse options. 
I've already tried >> building kokkos-kernels with no TPLs and got slightly different results, >> but the same problem. >> >> Any help would be appreciated. >> >> Thanks, >> >> >> *Philip Fackler * >> Research Software Engineer, Application Engineering Group >> Advanced Computing Systems Research Section >> Computer Science and Mathematics Division >> *Oak Ridge National Laboratory* >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL:

From yangzongze at gmail.com Wed Feb 8 04:09:55 2023
From: yangzongze at gmail.com (Zongze Yang)
Date: Wed, 8 Feb 2023 18:09:55 +0800
Subject: [petsc-users] Save images of ksp_monitor and ksp_view_eigenvaluses with user defined names
Message-ID:

Hi, PETSc group,

I am trying to save the figures of the residual and the eigenvalues with user-defined names rather than the default names.

The default names are used when I pass `-draw_save .png`, and all images are saved:
```
python test.py -N 16 -test1_ksp_type gmres -test1_pc_type jacobi -test1_ksp_view_eigenvalues draw -test1_ksp_monitor draw::draw_lg -draw_save .png
```
But when I use `-draw_save abc.png`, only the figure of the eigenvalues is saved:
```
python test.py -N 16 -test1_ksp_type gmres -test1_pc_type jacobi -test1_ksp_view_eigenvalues draw -test1_ksp_monitor draw::draw_lg -draw_save abc.png
```
How can I set the command line options to specify different names for those images?

Thanks,
Zongze
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From matteo.semplice at uninsubria.it Wed Feb 8 06:56:23 2023
From: matteo.semplice at uninsubria.it (Matteo Semplice)
Date: Wed, 8 Feb 2023 13:56:23 +0100
Subject: [petsc-users] interpreting data from SNESSolve profiling
Message-ID: <5de98255-28b4-c768-c5d0-2595ead76a5c@uninsubria.it>

Dear all,

I am trying to optimize the nonlinear solvers in a code of mine, but I am having a hard time interpreting the profiling data from the SNES. In particular, if I run with -snesCorr_snes_lag_jacobian 5 -snesCorr_snes_linesearch_monitor -snesCorr_snes_monitor -snesCorr_snes_linesearch_type basic -snesCorr_snes_view I get, for all timesteps, output like

0 SNES Function norm 2.204257292307e+00
1 SNES Function norm 5.156376709750e-03
2 SNES Function norm 9.399026338316e-05
3 SNES Function norm 1.700505246874e-06
4 SNES Function norm 2.938127043559e-08
SNES Object: snesCorr (snesCorr_) 1 MPI process
  type: newtonls
  maximum iterations=50, maximum function evaluations=10000
  tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
  total number of linear solver iterations=4
  total number of function evaluations=5
  norm schedule ALWAYS
  Jacobian is rebuilt every 5 SNES iterations
  SNESLineSearch Object: (snesCorr_) 1 MPI process
    type: basic
    maxstep=1.000000e+08, minlambda=1.000000e-12
    tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
    maximum iterations=40
  KSP Object: (snesCorr_) 1 MPI process
    type: gmres
      restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
      happy breakdown tolerance 1e-30
    maximum iterations=10000, initial guess is zero
    tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
    left preconditioning
    using PRECONDITIONED norm type for convergence test
  PC Object: (snesCorr_) 1 MPI process
    type: ilu
      out-of-place factorization
      0 levels of fill
      tolerance for zero pivot 2.22045e-14
      matrix ordering: natural
      factor fill ratio given 1., needed 1.
        Factored matrix follows:
          Mat Object: (snesCorr_) 1 MPI process
            type: seqaij
            rows=1200, cols=1200
            package used to perform factorization: petsc
            total: nonzeros=17946, allocated nonzeros=17946
              using I-node routines: found 400 nodes, limit used is 5
    linear system matrix = precond matrix:
    Mat Object: 1 MPI process
      type: seqaij
      rows=1200, cols=1200
      total: nonzeros=17946, allocated nonzeros=17946
      total number of mallocs used during MatSetValues calls=0
        using I-node routines: found 400 nodes, limit used is 5

I guess that this means that no line search is performed and the full Newton step is always taken (I did not report the full output, but all timesteps are alike). Also, with the default (bt) line search the total CPU time does not change, which seems in line with this. However, I would have expected the time spent in SNESLineSearch to be negligible, but the flamegraph shows that about 38% of the time spent by SNESSolve is actually spent in SNESLineSearch. Furthermore, SNESLineSearch seems to cause more SNESFunction evaluations (in terms of CPU time) than the SNESSolve itself. The flamegraph is attached.

Could some expert help me understand these data? Is the LineSearch actually performing the Newton step? Given that the full step is always taken, can the SNESFunction evaluations from the LineSearch be skipped?

Thanks a lot!

Matteo
-------------- next part -------------- An HTML attachment was scrubbed... URL:
-------------- next part -------------- A non-text attachment was scrubbed... Name: flame.svg Type: image/svg+xml Size: 35538 bytes Desc: not available URL:

From karthikeyan.chockalingam at stfc.ac.uk Wed Feb 8 10:19:33 2023
From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI)
Date: Wed, 8 Feb 2023 16:19:33 +0000
Subject: [petsc-users] Eliminating rows and columns which are zeros
In-Reply-To: <28738507-0571-4B5E-BA4E-1AFDD892860D@petsc.dev>
References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> <20AF4E62-7E22-4B99-8DC4-600C79F78D96@petsc.dev> <1D7C0055-12B7-4775-B71C-EB4C94D096D9@petsc.dev> <28738507-0571-4B5E-BA4E-1AFDD892860D@petsc.dev>
Message-ID:

No, I am not calling MatMPIAIJSetPreallocation(... N,NULL);

Here is what I do:

PetscInt d_nz = 10;
PetscInt o_nz = 10;

ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
ierr = MatSetType(A, MATMPIAIJ); CHKERRQ(ierr);
ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr);
ierr = MatMPIAIJSetPreallocation(A, d_nz, NULL, o_nz, NULL); CHKERRQ(ierr);

(Q1) As I am setting the size of A to be N x N via
ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr);
and preallocation is done for ALL rows, I would like to understand whether the "inactive rows" do NOT contribute to memory (while using "redistribute").

(Q2) I tried solving with hypre within redistribute and the system converges to a solution. Is the below the correct way to use hypre within redistribute?

ierr = PetscOptionsSetValue(NULL,"-ksp_type", "preonly");
ierr = PetscOptionsSetValue(NULL,"-pc_type", "redistribute");
ierr = PetscOptionsSetValue(NULL,"-redistribute_ksp_type", "cg");
ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "hypre");
ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_hypre_type", "boomeramg");

Many thanks,

Karthik.
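One way to check (Q1) empirically is to ask the assembled matrix how much space it actually reserved. The helper below is only a sketch (the function name is made up; only MatGetInfo() and the MatInfo fields come from the PETSc API): after assembly it prints allocated versus used nonzeros, which makes the cost of the preallocated-but-inactive rows visible.

#include <petscmat.h>

/* Sketch: report allocated vs. used nonzeros and the memory footprint of A.
   With d_nz = o_nz = 10, every row (active or not) reserves space, so
   nz_allocated grows with N even when most rows stay empty. */
PetscErrorCode ReportMatMemory(Mat A)
{
  PetscErrorCode ierr;
  MatInfo        info;

  ierr = MatGetInfo(A, MAT_GLOBAL_SUM, &info); CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,
                     "nz allocated %g, nz used %g, nz unneeded %g, memory %g\n",
                     info.nz_allocated, info.nz_used, info.nz_unneeded,
                     info.memory); CHKERRQ(ierr);
  return 0;
}

The same kind of summary is available without code changes by running with -mat_view ::ascii_info.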
From: Barry Smith Date: Tuesday, 7 February 2023 at 19:52 To: Chockalingam, Karthikeyan (STFC,DL,HC) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Eliminating rows and columns which are zeros On Feb 7, 2023, at 1:20 PM, Karthikeyan Chockalingam - STFC UKRI wrote: Thank you Barry for your detailed response. I would like to shed some light into what I try to accomplish using PETSc and AMReX. Please see the attachment adaptive mesh image (and ignore the mesh-order legend for now). The number of elements on each level is a geometric progression. N - Number elements on each level indexed by ?n? n - Adaptive mesh level index (starting from 1) a - Number of elements on the base mesh = 16 r = 4 (each element is divided into four on the next higher level of refinement) N = a r^(n-1) The final solution u, is the super imposition of solutions from all levels (here we have a total of 7 levels). u = u^(1) + u^(2) + ? + u^(7) Hence I have seven system matrix and solution vector pairs, one for each level. On each level the element index vary from 1 to N. But on each level NOT all elements are ?active?. As you can see from the attached image not all elements are active (a lot of white hollow spaces). So the ?active? indexes can be scatted anywhere between 1 to N = 65536 for n = 7. (Q1) In my case, I can?t at the moment insert 1 on the diagonal because during assembly I am using ADD_VALUES as a node can be common to many elements. So I added 0.0 to ALL diagonals. After global assembly, I find that the linear solver converges. (Q2) After adding 0.0 to all diagonal. I able to solve using either ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "jacobi"); CHKERRQ(ierr); or ierr = PetscOptionsSetValue(NULL," pc_type", "jacobi"); CHKERRQ(ierr); I was able to solve using hypre as well. Do I need to use -pc_type redistribute or not? Because I am able to solve without it as well. No you do not need redistribute, but for large problems with many empty rows using a solver inside redistribute will be faster than just using that solver directly on the much larger (mostly empty) system. (Q3) I am sorry, if I sound like a broken record player. On each level I request allocation for A[N][N] Not sure what you mean by this? Call MatMPIAIJSetPreallocation(... N,NULL); where N is the number of columns in the matrix? If so, yes this causes a huge malloc() by PETSc when it allocates the matrix. It is not scalable. Do you have a small upper bound on the number of nonzeros in a row, say 9 or 27? Then use that instead of N, not perfect but much better than N. Barry as the indexes can be scatted anywhere between 1 to N but most are ?inactive rows?. Is -pc_type redistribute the way to go for me to save on memory? Though I request A[N][N] allocation, and not all rows are active - I wonder if I am wasting a huge amount of memory? Kind regards, Karthik. From: Barry Smith > Date: Monday, 6 February 2023 at 22:42 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Sorry was not clear MatZero*. I just meant MatZeroRows() or MatZeroRowsColumns() On Feb 6, 2023, at 4:45 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: No problem. I don?t completely follow. (Q1) I have used MATMPIAJI but not sure what is MatZero* (star) and what it does? And its relevance to my problem. (Q2) Since I am creating a MATMPIAJI system? what would be the best way to insert 0.0 in to ALL diagonals (both active and inactive rows) to begin with? 
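To make the diagonal seeding described in (Q1) above concrete, here is a minimal sketch (A and ierr are assumed to be declared as in the snippets above; this is not the actual application code): add 0.0 once to every locally owned diagonal before the element loop that uses ADD_VALUES, and call MatAssemblyBegin/End only after all contributions are in.

PetscInt rstart, rend, row;

ierr = MatGetOwnershipRange(A, &rstart, &rend); CHKERRQ(ierr);
for (row = rstart; row < rend; row++) {
  /* 0.0 added with ADD_VALUES leaves the active rows unchanged but guarantees
     every row, including the inactive ones, has an explicit diagonal entry */
  ierr = MatSetValue(A, row, row, 0.0, ADD_VALUES); CHKERRQ(ierr);
}
/* ... element assembly with ADD_VALUES goes here; no MatAssemblyBegin/End
   between the seeding loop and the element contributions ... */
ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);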
Yes, just have each MPI process loop over its rows and put zero on the diagonal (actually, you could put a 1 if you want). Then have your code use AMReX to put all its values in, I am assuming the code uses INSERT_VALUES so it will always overwrite the value you put in initially (hence putting in 1 initially will be fine; the advantage of 1 is if you do not use PCREDISTIBUTE the matrix is fully defined and so any solver will work. If you know the inactive rows you can just put the diagonal on those since AMReX will fill up the rest of the rows, but it is harmless to put values on all diagonal entries. Do NOT call MatAssemblyBegin/End between filling the diagonal entries and having AMReX put in its values. (Q3) If I have to insert 0.0 only into diagonals of ?inactive? rows after I have put values into the matrix would be an effort. Unless there is a straight forward to do it in PETSc. (Q4) For my problem do I need to use PCREDISTIBUTE or any linear solve would eliminate those rows? Well no solver will really make sense if you have "inactive" rows, that is rows with nothing in them except PCREDISTIBUTE. When PETSc was written we didn't understand having lots of completely empty rows was a use case so much of the functionality does not work in that case. Best, Karthik. From: Barry Smith > Date: Monday, 6 February 2023 at 20:18 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Sorry, I had a mistake in my thinking, PCREDISTRIBUTE supports completely empty rows but MatZero* does not. When you put values into the matrix you will need to insert a 0.0 on the diagonal of each "inactive" row; all of this will be eliminated during the linear solve process. It would be a major project to change the MatZero* functions to handle empty rows. Barry On Feb 4, 2023, at 12:06 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you very much for offering to debug. I built PETSc along with AMReX, so I tried to extract the PETSc code alone which would reproduce the same error on the smallest sized problem possible. I have attached three files: petsc_amrex_error_redistribute.txt ? The error message from amrex/petsc interface, but THE linear system solves and converges to a solution. problem.c ? A simple stand-alone petsc code, which produces almost the same error message. petsc_ error_redistribute.txt ? The error message from problem.c but strangely it does NOT solve ? I am not sure why? Please use problem.c to debug the issue. Kind regards, Karthik. From: Barry Smith > Date: Saturday, 4 February 2023 at 00:22 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you can help me reproduce the problem with a simple code I can debug the problem and fix it. Barry On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: I updated the main branch to the below commit but the same problem persists. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000 From: Barry Smith > Date: Friday, 3 February 2023 at 18:51 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it, but now we allow the row to initially have no entries and still be zeroed). Barry On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you. The entire error output was an attachment in my previous email. I am pasting it here for your reference. [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 Residual norms for redistribute_ solve.
0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 KSP Object: (redistribute_) 1 MPI process type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (redistribute_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: mpiaij rows=48896, cols=48896 total: nonzeros=307976, allocated nonzeros=307976 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines End of program solve time 0.564714744 seconds Starting max value is: 0 Min value of level 0 is: 0 Interpolated min value is: 741.978761 Unused ParmParse Variables: [TOP]::model.type(nvals = 1) :: [3] [TOP]::ref_ratio(nvals = 1) :: [2] AMReX (22.10-20-g3082028e4287) finalized #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. Program ended with exit code: 0 Best, Karthik. From: Barry Smith > Date: Friday, 3 February 2023 at 17:41 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros We need all the error output for the errors you got below to understand why the errors are happening. On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Hello Barry, I would like to better understand pc_type redistribute usage. I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). 
That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. Indeed the linear solve will be more efficient if you use the redistribute solver. But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). Question (1) If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? See above Question (2) I tried using pc_type redistribute for a two level system. I have attached the output only for (level 1) The solution converges to right solution but still petsc outputs some error messages. [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) But the there were no unused options #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. Program ended with exit code: 0 I cannot explain this Question (2) [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? Question (3) [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. Thank you for your help. Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 18:50 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. Barry On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Barry. This is great! I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using MatZeroRowsColumnsIS(A, isout, 1, x, b); While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 16:04 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute It does everything for you. 
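For reference, the plan quoted above (impose the Dirichlet rows and columns with MatZeroRowsColumnsIS, solve with -pc_type redistribute given on the command line, then read the solution back in the original numbering) looks roughly like the sketch below. The names ksp, isout, x and b are assumed from the discussion, not taken from the actual code, and error handling is abbreviated.

KSP                ksp;
const PetscScalar *xv;
PetscInt           rstart, rend;

ierr = MatZeroRowsColumnsIS(A, isout, 1.0, x, b); CHKERRQ(ierr); /* Dirichlet rows/columns, 1 on the diagonal */

ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);
ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); /* picks up -pc_type redistribute and the -redistribute_* options */
ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);

/* x keeps the original full-size numbering: entry i of x corresponds to row i
   of A exactly as before the solve, including the rows redistribute removed */
ierr = VecGetOwnershipRange(x, &rstart, &rend); CHKERRQ(ierr);
ierr = VecGetArrayRead(x, &xv); CHKERRQ(ierr);
/* ... xv[i - rstart] holds the solution for global index i on this process ... */
ierr = VecRestoreArrayRead(x, &xv); CHKERRQ(ierr);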
Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. Barry On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: Hello, I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. For instance say N=7 0 0 0 0 0 0 0 0 1 -1 0 0 0 0 0 -1 2 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 1 I would like to reduce it to a 3x3 1 -1 0 -1 2 -1 0 -1 1 I do know the size N. Q1) How do I do it? Q2) Is it better to eliminate them as it would save a lot of memory? Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? Thank you. Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Feb 8 15:09:38 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 8 Feb 2023 16:09:38 -0500 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> <20AF4E62-7E22-4B99-8DC4-600C79F78D96@petsc.dev> <1D7C0055-12B7-4775-B71C-EB4C94D096D9@petsc.dev> <28738507-0571-4B5E-BA4E-1AFDD892860D@petsc.dev> Message-ID: > On Feb 8, 2023, at 11:19 AM, Karthikeyan Chockalingam - STFC UKRI wrote: > > No, I am not calling MatMPIAIJSetPreallocation(... N,NULL); > Here is what I do: > > PetscInt d_nz = 10; > PetscInt o_nz = 10; > > ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); > ierr = MatSetType(A, MATMPIAIJ); CHKERRQ(ierr); > ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr); > ierr = MatMPIAIJSetPreallocation(A, d_nz, NULL, o_nz, NULL); CHKERRQ(ierr); > > (Q1) > As I am setting the size of A to be N x N via > > ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr); d_nz and o_nz determine the memory reserved for A. So if you use for example 10 that means 10 entries are reserved for every row, even inactive ones. > > and pre-allocation is done for ALL rows I would like to understand if the ?inactive rows? are NOT contributing to memory (while using ?redistribute?)? Redistribute will drop all the inactive rows from the computation of the solve, it generates a smaller matrix missing all those rows and columns. But the original big matrix with lots of extra reserved space still exists (your A). > > (Q2) > > I tried solving using hypre within redistribute and system converges to a solution. Is below correct way to use hypre within redistribute? 
> > ierr = PetscOptionsSetValue(NULL,"-ksp_type", "preonly"); > ierr = PetscOptionsSetValue(NULL,"-pc_type", "redistribute"); > ierr = PetscOptionsSetValue(NULL,"-redistribute_ksp_type", "cg"); > ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "hypre"); > ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_hypre_type", "boomeramg"); Correct. You can run with -ksp_view and it will provide all the information about how the solves are layered. > > Many thanks, > > Karthik. > > From: Barry Smith > > Date: Tuesday, 7 February 2023 at 19:52 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > > > On Feb 7, 2023, at 1:20 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you Barry for your detailed response. > > I would like to shed some light into what I try to accomplish using PETSc and AMReX. Please see the attachment adaptive mesh image (and ignore the mesh-order legend for now). > > The number of elements on each level is a geometric progression. > N - Number elements on each level indexed by ?n? > n - Adaptive mesh level index (starting from 1) > a - Number of elements on the base mesh = 16 > r = 4 (each element is divided into four on the next higher level of refinement) > > N = a r^(n-1) > > The final solution u, is the super imposition of solutions from all levels (here we have a total of 7 levels). > > u = u^(1) + u^(2) + ? + u^(7) > > Hence I have seven system matrix and solution vector pairs, one for each level. > > On each level the element index vary from 1 to N. But on each level NOT all elements are ?active?. > As you can see from the attached image not all elements are active (a lot of white hollow spaces). So the ?active? indexes can be scatted anywhere between 1 to N = 65536 for n = 7. > > (Q1) In my case, I can?t at the moment insert 1 on the diagonal because during assembly I am using ADD_VALUES as a node can be common to many elements. So I added 0.0 to ALL diagonals. After global assembly, I find that the linear solver converges. > > (Q2) After adding 0.0 to all diagonal. I able to solve using either > ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "jacobi"); CHKERRQ(ierr); > or > ierr = PetscOptionsSetValue(NULL," pc_type", "jacobi"); CHKERRQ(ierr); > I was able to solve using hypre as well. > > Do I need to use -pc_type redistribute or not? Because I am able to solve without it as well. > > No you do not need redistribute, but for large problems with many empty rows using a solver inside redistribute will be faster than just using that solver directly on the much larger (mostly empty) system. > > > > (Q3) I am sorry, if I sound like a broken record player. On each level I request allocation for A[N][N] > > Not sure what you mean by this? Call MatMPIAIJSetPreallocation(... N,NULL); where N is the number of columns in the matrix? > > If so, yes this causes a huge malloc() by PETSc when it allocates the matrix. It is not scalable. Do you have a small upper bound on the number of nonzeros in a row, say 9 or 27? Then use that instead of N, not perfect but much better than N. > > Barry > > > > > > as the indexes can be scatted anywhere between 1 to N but most are ?inactive rows?. Is -pc_type redistribute the way to go for me to save on memory? Though I request A[N][N] allocation, and not all rows are active - I wonder if I am wasting a huge amount of memory? > > Kind regards, > Karthik. 
> > > > > From: Barry Smith > > Date: Monday, 6 February 2023 at 22:42 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Sorry was not clear MatZero*. I just meant MatZeroRows() or MatZeroRowsColumns() > > > > On Feb 6, 2023, at 4:45 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > No problem. I don?t completely follow. > > (Q1) I have used MATMPIAJI but not sure what is MatZero* (star) and what it does? And its relevance to my problem. > > (Q2) Since I am creating a MATMPIAJI system? what would be the best way to insert 0.0 in to ALL diagonals (both active and inactive rows) to begin with? > > Yes, just have each MPI process loop over its rows and put zero on the diagonal (actually, you could put a 1 if you want). Then have your code use AMReX to > put all its values in, I am assuming the code uses INSERT_VALUES so it will always overwrite the value you put in initially (hence putting in 1 initially will be fine; the advantage of 1 is if you do not use PCREDISTIBUTE the matrix is fully defined and so any solver will work. If you know the inactive rows you can just put the diagonal on those since AMReX will fill up the rest of the rows, but it is harmless to put values on all diagonal entries. Do NOT call MatAssemblyBegin/End between filling the diagonal entries and having AMReX put in its values. > > (Q3) If I have to insert 0.0 only into diagonals of ?inactive? rows after I have put values into the matrix would be an effort. Unless there is a straight forward to do it in PETSc. > > (Q4) For my problem do I need to use PCREDISTIBUTE or any linear solve would eliminate those rows? > > Well no solver will really make sense if you have "inactive" rows, that is rows with nothing in them except PCREDISTIBUTE. > > When PETSc was written we didn't understand having lots of completely empty rows was a use case so much of the functionality does not work in that case. > > > > > > Best, > Karthik. > > From: Barry Smith > > Date: Monday, 6 February 2023 at 20:18 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Sorry, I had a mistake in my thinking, PCREDISTRIBUTE supports completely empty rows but MatZero* does not. > > When you put values into the matrix you will need to insert a 0.0 on the diagonal of each "inactive" row; all of this will be eliminated during the linear solve process. It would be a major project to change the MatZero* functions to handle empty rows. > > Barry > > > > > On Feb 4, 2023, at 12:06 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you very much for offering to debug. > > I built PETSc along with AMReX, so I tried to extract the PETSc code alone which would reproduce the same error on the smallest sized problem possible. > > I have attached three files: > > petsc_amrex_error_redistribute.txt ? The error message from amrex/petsc interface, but THE linear system solves and converges to a solution. > > problem.c ? A simple stand-alone petsc code, which produces almost the same error message. > > petsc_ error_redistribute.txt ? The error message from problem.c but strangely it does NOT solve ? I am not sure why? > > Please use problem.c to debug the issue. > > Kind regards, > Karthik. 
> > > From: Barry Smith > > Date: Saturday, 4 February 2023 at 00:22 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > If you can help me reproduce the problem with a simple code I can debug the problem and fix it. > > Barry > > > > > > On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > I updated the main branch to the below commit but the same problem persists. > > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000 > > > From: Barry Smith > > Date: Friday, 3 February 2023 at 18:51 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it but now we allow the row to initially have no entries and still be zeroed. > > Barry > > > On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. > > > > [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-options_left (no value) > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 > [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 > [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre > [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 > [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 > [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 > Residual norms for redistribute_ solve. 
> 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 > 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 > 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 > 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 > 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 > 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 > 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 > 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 > 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 > 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 > 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 > Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 > KSP Object: (redistribute_) 1 MPI process > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (redistribute_) 1 MPI process > type: jacobi > type DIAGONAL > linear system matrix = precond matrix: > Mat Object: 1 MPI process > type: mpiaij > rows=48896, cols=48896 > total: nonzeros=307976, allocated nonzeros=307976 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > End of program > solve time 0.564714744 seconds > Starting max value is: 0 > Min value of level 0 is: 0 > Interpolated min value is: 741.978761 > Unused ParmParse Variables: > [TOP]::model.type(nvals = 1) :: [3] > [TOP]::ref_ratio(nvals = 1) :: [2] > > AMReX (22.10-20-g3082028e4287) finalized > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > > > Best, > Karthik. > > From: Barry Smith > > Date: Friday, 3 February 2023 at 17:41 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > We need all the error output for the errors you got below to understand why the errors are happening. > > > > > > > On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Hello Barry, > > I would like to better understand pc_type redistribute usage. > > I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. 
When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. > > Indeed the linear solve will be more efficient if you use the redistribute solver. > > But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). > > > > > > > > Question (1) > > > If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? > > See above > > > > > > > > Question (2) > > > I tried using pc_type redistribute for a two level system. > I have attached the output only for (level 1) > The solution converges to right solution but still petsc outputs some error messages. > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-options_left (no value) > > But the there were no unused options > > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > > I cannot explain this > > > > > > > Question (2) > > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > > What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? > > > Question (3) > > > > > > > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > > I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. > > Thank you for your help. > > Kind regards, > Karthik. > > > > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 18:50 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. > > Barry > > > > > > > > > On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you Barry. This is great! > > I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using > MatZeroRowsColumnsIS(A, isout, 1, x, b); > > While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? > > Kind regards, > Karthik. 
> > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 16:04 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute > > > It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. > > Barry > > > > > > > > > > On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: > > Hello, > > I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. > > For instance say N=7 > > 0 0 0 0 0 0 0 > 0 1 -1 0 0 0 0 > 0 -1 2 0 0 0 -1 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 -1 0 0 0 1 > > I would like to reduce it to a 3x3 > > 1 -1 0 > -1 2 -1 > 0 -1 1 > > I do know the size N. > > Q1) How do I do it? > Q2) Is it better to eliminate them as it would save a lot of memory? > Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? > > Thank you. > > Karthik. > This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Feb 8 15:35:25 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 8 Feb 2023 16:35:25 -0500 Subject: [petsc-users] Save images of ksp_monitor and ksp_view_eigenvaluses with user defined names In-Reply-To: References: Message-ID: <3969E573-F5BA-4E90-863A-E349036BF06B@petsc.dev> It cannot be done using the default X windows monitor and -draw_save because there is no way to distinguish the files for each sub window images. However, there is an alternative. -ksp_view_eigenvalues draw:image:jeff.ppm -viewer_view -ksp_monitor_solution draw:image:joe.ppm This alternative only supports .ppm files (so you may need to call a converter on the result) and does put each image in a separate file in its own named directory, for example, joe/joe_0.ppm but at least it allows you to have different named files. Of course you can also just run your code twice with two different options. Unfortunately there is a bug in the KSP eigenmonitor viewing that I had to fix to get this to work so you'll need to checkout the barry/2023-02-08/fix-ksp-monitor-eigenvalues-draw branch of PETSc to use the option I suggest. Barry > On Feb 8, 2023, at 5:09 AM, Zongze Yang wrote: > > Hi, PETSc group, > > I was trying to save figures of the residual and eigenvalues with different names but not default names. > > The default name is used when I use `-draw_save .png`. All images are saved. 
> ``` > python test.py -N 16 -test1_ksp_type gmres -test1_pc_type jacobi -test1_ksp_view_eigenvalues draw -test1_ksp_monitor draw::draw_lg -draw_save .png > ``` > But when I use `-draw_save abc.png`, only the figure of eigenvalues is saved. > ``` > python test.py -N 16 -test1_ksp_type gmres -test1_pc_type jacobi -test1_ksp_view_eigenvalues draw -test1_ksp_monitor draw::draw_lg -draw_save .png > ``` > > How can I add the command line options, to specify different names for those images? > > Thanks, > Zongze -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Feb 8 16:00:05 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 8 Feb 2023 17:00:05 -0500 Subject: [petsc-users] interpreting data from SNESSolve profiling In-Reply-To: <5de98255-28b4-c768-c5d0-2595ead76a5c@uninsubria.it> References: <5de98255-28b4-c768-c5d0-2595ead76a5c@uninsubria.it> Message-ID: <2ECB7E17-13E6-4DFF-90A2-AAF1FEF71D46@petsc.dev> Yes, the "basic" line search updates the solution vector and evaluates the function at the new solution point; not much else. There are no duplicate or extra function evaluations in PETSc, (most of them occur in the linesearch function). When the code returns from the line search routine it "knows" the function evaluation has already been done (at the newest solution) so it does not need to do it again. The line search does not perform the Newton step, it takes the step computed by the linear solve and decides how to use that step (you will see that KSPSolve time is not part of the line search time). Barry > On Feb 8, 2023, at 7:56 AM, Matteo Semplice wrote: > > Dear all, > > I am trying to optimize the nonlinear solvers in a code of mine, but I am having a hard time at interpreting the profiling data from the SNES. In particular, if I run with -snesCorr_snes_lag_jacobian 5 -snesCorr_snes_linesearch_monitor -snesCorr_snes_monitor -snesCorr_snes_linesearch_type basic -snesCorr_snes_view I get, for all timesteps an output like > > 0 SNES Function norm 2.204257292307e+00 > 1 SNES Function norm 5.156376709750e-03 > 2 SNES Function norm 9.399026338316e-05 > 3 SNES Function norm 1.700505246874e-06 > 4 SNES Function norm 2.938127043559e-08 > SNES Object: snesCorr (snesCorr_) 1 MPI process > type: newtonls > maximum iterations=50, maximum function evaluations=10000 > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > total number of linear solver iterations=4 > total number of function evaluations=5 > norm schedule ALWAYS > Jacobian is rebuilt every 5 SNES iterations > SNESLineSearch Object: (snesCorr_) 1 MPI process > type: basic > maxstep=1.000000e+08, minlambda=1.000000e-12 > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > maximum iterations=40 > KSP Object: (snesCorr_) 1 MPI process > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (snesCorr_) 1 MPI process > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. 
> Factored matrix follows: > Mat Object: (snesCorr_) 1 MPI process > type: seqaij > rows=1200, cols=1200 > package used to perform factorization: petsc > total: nonzeros=17946, allocated nonzeros=17946 > using I-node routines: found 400 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 1 MPI process > type: seqaij > rows=1200, cols=1200 > total: nonzeros=17946, allocated nonzeros=17946 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 400 nodes, limit used is 5 > > I guess that this means that no linesearch is performed and the full Newton step is always performed (I did not report the full output, but all timesteps are alike). Also, with the default (bt) LineSearch, the total CPU time does not change, which seems in line with this. > > However, I'd have expected that the time spent in SNESLineSearch would be negligible, but the flamegraph is showing that about 38% of the time spent by SNESSolve is actually spent in SNESLineSearch. Furthermore, SNESLineSearch seems to cause more SNESFunction evaluations (in terms of CPU time) than the SNESSolve itself. The flamegraph is attached. > > Could some expert help me in understanding these data? Is the LineSearch actually performing the newton step? Given that the full step is always taken, can the SNESFunction evaluations from the LineSearch be skipped? > > Thanks a lot! > > Matteo > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From narnoldm at umich.edu Wed Feb 8 18:50:24 2023 From: narnoldm at umich.edu (Nicholas Arnold-Medabalimi) Date: Wed, 8 Feb 2023 19:50:24 -0500 Subject: [petsc-users] DMPlex Reordering In-Reply-To: References: Message-ID: Hi Matt Could you clarify what DMPlexOrient is doing as far as the metric for compatibility for orientation? Sincerely Nicholas On Mon, Jul 25, 2022 at 11:54 AM Matthew Knepley wrote: > On Mon, Jul 25, 2022 at 10:11 AM Nicholas Arnold-Medabalimi < > narnoldm at umich.edu> wrote: > >> Hi Petsc users, >> >> I have been working on how to read in meshes into a DMPlex object. The >> process of building the cones has been relatively straightforward. The mesh >> files in question have sets of faces that I use to derive the cell vertex >> cones. The method is basically identical to that used in >> DMPlexCreateFluent. After I setup the DMPlex cones and call Symmetrize and >> Stratify, I then load in all the coordinates corresponding to the >> vertices and then use DMInterpolate to generate the intermediate edges >> >> The issue that I am running into is that because I am deriving the >> cell-vertex relationship from independent sets of face-vertex >> relationships, I can end up with cells that have improper mesh ordering. >> >> For example a cell with the coordinates: >> point 0: 0.000000 0.000000 >> point 1: 0.000000 2.500000 >> point 2: 0.100000 0.000000 >> point 3: 0.100000 2.500000 >> >> As you can see instead of going around the perimeter of the cell the path >> from 1 to 2 instead bisects the cell. >> >> I can manually reorder these after I read in the coordinates by manually >> checking right-handedness but I was wondering if there is an easier way to >> reorder the cones? If there isn't once I do reorder the cones manually is >> there anything I need to do as far as station keeping on the DM? >> > > The function DMPlexOrient() reorders the cones so that all cells have > compatible orientation. However, it will not catch this because it is > an illegal ordering for a quad. 
This order is called > a DM_POLYTOPE_SEG_PRISM_TENSOR in Plex because it is the tensor product of > two segments (so that opposite sides have the same orientation). If all > your cells are this way, you can just mark them as tensor segments > and you are done. If only some turn out this way, then we have to write > some code to recognize them, or flip them when you create the cones. > > Thanks, > > Matt > > >> I apologize if I missed any resources on this. >> >> Thanks >> Nicholas >> >> -- >> Nicholas Arnold-Medabalimi >> >> Ph.D. Candidate >> Computational Aeroscience Lab >> University of Michigan >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Nicholas Arnold-Medabalimi Ph.D. Candidate Computational Aeroscience Lab University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 8 19:26:42 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 8 Feb 2023 20:26:42 -0500 Subject: [petsc-users] DMPlex Reordering In-Reply-To: References: Message-ID: On Wed, Feb 8, 2023 at 7:50 PM Nicholas Arnold-Medabalimi < narnoldm at umich.edu> wrote: > Hi Matt > > Could you clarify what DMPlexOrient is doing as far as the metric for > compatibility for orientation? > It makes sure that the orientation of k-faces agree across processes (through the SF). If faces are created in parallel, without communication, there is no guarantee that both sides choose the same orientation. It also orients the serial edges, so you can throw in an un-oriented mesh and it will fix it. Thanks, Matt > Sincerely > Nicholas > > On Mon, Jul 25, 2022 at 11:54 AM Matthew Knepley > wrote: > >> On Mon, Jul 25, 2022 at 10:11 AM Nicholas Arnold-Medabalimi < >> narnoldm at umich.edu> wrote: >> >>> Hi Petsc users, >>> >>> I have been working on how to read in meshes into a DMPlex object. The >>> process of building the cones has been relatively straightforward. The mesh >>> files in question have sets of faces that I use to derive the cell vertex >>> cones. The method is basically identical to that used in >>> DMPlexCreateFluent. After I setup the DMPlex cones and call Symmetrize and >>> Stratify, I then load in all the coordinates corresponding to the >>> vertices and then use DMInterpolate to generate the intermediate edges >>> >>> The issue that I am running into is that because I am deriving the >>> cell-vertex relationship from independent sets of face-vertex >>> relationships, I can end up with cells that have improper mesh ordering. >>> >>> For example a cell with the coordinates: >>> point 0: 0.000000 0.000000 >>> point 1: 0.000000 2.500000 >>> point 2: 0.100000 0.000000 >>> point 3: 0.100000 2.500000 >>> >>> As you can see instead of going around the perimeter of the cell the >>> path from 1 to 2 instead bisects the cell. >>> >>> I can manually reorder these after I read in the coordinates by manually >>> checking right-handedness but I was wondering if there is an easier way to >>> reorder the cones? If there isn't once I do reorder the cones manually is >>> there anything I need to do as far as station keeping on the DM? >>> >> >> The function DMPlexOrient() reorders the cones so that all cells have >> compatible orientation. However, it will not catch this because it is >> an illegal ordering for a quad. 
This order is called >> a DM_POLYTOPE_SEG_PRISM_TENSOR in Plex because it is the tensor product of >> two segments (so that opposite sides have the same orientation). If all >> your cells are this way, you can just mark them as tensor segments >> and you are done. If only some turn out this way, then we have to write >> some code to recognize them, or flip them when you create the cones. >> >> Thanks, >> >> Matt >> >> >>> I apologize if I missed any resources on this. >>> >>> Thanks >>> Nicholas >>> >>> -- >>> Nicholas Arnold-Medabalimi >>> >>> Ph.D. Candidate >>> Computational Aeroscience Lab >>> University of Michigan >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Nicholas Arnold-Medabalimi > > Ph.D. Candidate > Computational Aeroscience Lab > University of Michigan > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Thu Feb 9 02:38:21 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Thu, 9 Feb 2023 16:38:21 +0800 Subject: [petsc-users] Save images of ksp_monitor and ksp_view_eigenvaluses with user defined names In-Reply-To: <3969E573-F5BA-4E90-863A-E349036BF06B@petsc.dev> References: <3969E573-F5BA-4E90-863A-E349036BF06B@petsc.dev> Message-ID: Hi, Barry Thanks for the tip. One more question: how can I save the log (draw_lg) figure by using `draw:image:joe.ppm`? Thanks. Zongze Barry Smith ?2023?2?9??? 05:35??? > > It cannot be done using the default X windows monitor and -draw_save > because there is no way to distinguish the files for each sub window images. > > However, there is an alternative. > > -ksp_view_eigenvalues draw:image:jeff.ppm -viewer_view > -ksp_monitor_solution draw:image:joe.ppm > > This alternative only supports .ppm files (so you may need to call a > converter on the result) and does put each image in a separate file in its > own named directory, for example, joe/joe_0.ppm but at least it allows you > to have different named files. Of course you can also just run your code > twice with two different options. > > Unfortunately there is a bug in the KSP eigenmonitor viewing that I had > to fix to get this to work so you'll need to checkout the > *barry/2023-02-08/fix-ksp-monitor-eigenvalues-draw *branch of PETSc to > use the option I suggest. > > Barry > > > > > On Feb 8, 2023, at 5:09 AM, Zongze Yang wrote: > > Hi, PETSc group, > > I was trying to save figures of the residual and eigenvalues with > different names but not default names. > > The default name is used when I use `-draw_save .png`. All images are > saved. > ``` > python test.py -N 16 -test1_ksp_type gmres -test1_pc_type jacobi > -test1_ksp_view_eigenvalues draw -test1_ksp_monitor draw::draw_lg > -draw_save .png > ``` > But when I use `-draw_save abc.png`, only the figure of eigenvalues is > saved. > ``` > python test.py -N 16 -test1_ksp_type gmres -test1_pc_type jacobi > -test1_ksp_view_eigenvalues draw -test1_ksp_monitor draw::draw_lg > -draw_save .png > ``` > > How can I add the command line options, to specify different names for > those images? 
> > Thanks, > Zongze > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo.semplice at uninsubria.it Thu Feb 9 03:54:29 2023 From: matteo.semplice at uninsubria.it (Matteo Semplice) Date: Thu, 9 Feb 2023 10:54:29 +0100 Subject: [petsc-users] interpreting data from SNESSolve profiling In-Reply-To: <2ECB7E17-13E6-4DFF-90A2-AAF1FEF71D46@petsc.dev> References: <5de98255-28b4-c768-c5d0-2595ead76a5c@uninsubria.it> <2ECB7E17-13E6-4DFF-90A2-AAF1FEF71D46@petsc.dev> Message-ID: <1eca32b1-abae-fa25-b94a-2813fe6f3f0c@uninsubria.it> I see: only the first function evaluation of each Newton solve appears by itself in the flamegraph and all the subsequent ones appear inside the linesearch routine. It's all clear now. Thanks ??? Matteo On 08/02/2023 23:00, Barry Smith wrote: > > ? ? ?Yes, the "basic" line search updates the solution vector and > evaluates the function at the new solution point; not much else. > > ? ? ?There are no duplicate or extra function evaluations in PETSc, > (most of them occur in the linesearch function). When the code returns > from the line search routine it "knows" the function evaluation has > already been done (at the newest solution) so it does not need to do > it again. > > ? ? ?The line search does not perform the Newton step, it takes the > step computed by the linear solve and decides how to use that step > (you will see that KSPSolve time is not part of the line search time). > > ? ?Barry > > >> On Feb 8, 2023, at 7:56 AM, Matteo Semplice >> wrote: >> >> Dear all, >> >> ??? I am trying to optimize the nonlinear solvers in a code of mine, >> but I am having a hard time at interpreting the profiling data from >> the SNES. In particular, if I run with -snesCorr_snes_lag_jacobian 5 >> -snesCorr_snes_linesearch_monitor -snesCorr_snes_monitor >> -snesCorr_snes_linesearch_type basic -snesCorr_snes_view I get, for >> all timesteps an output like >> >> 0 SNES Function norm 2.204257292307e+00 >> ?1 SNES Function norm 5.156376709750e-03 >> ?2 SNES Function norm 9.399026338316e-05 >> ?3 SNES Function norm 1.700505246874e-06 >> ?4 SNES Function norm 2.938127043559e-08 >> SNES Object: snesCorr (snesCorr_) 1 MPI process >> ?type: newtonls >> ?maximum iterations=50, maximum function evaluations=10000 >> ?tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 >> ?total number of linear solver iterations=4 >> ?total number of function evaluations=5 >> ?norm schedule ALWAYS >> ?Jacobian is rebuilt every 5 SNES iterations >> ?SNESLineSearch Object: (snesCorr_) 1 MPI process >> ???type: basic >> ???maxstep=1.000000e+08, minlambda=1.000000e-12 >> ???tolerances: relative=1.000000e-08, absolute=1.000000e-15, >> lambda=1.000000e-08 >> ???maximum iterations=40 >> ?KSP Object: (snesCorr_) 1 MPI process >> ???type: gmres >> ?????restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> ?????happy breakdown tolerance 1e-30 >> ???maximum iterations=10000, initial guess is zero >> ???tolerances: ?relative=1e-05, absolute=1e-50, divergence=10000. >> ???left preconditioning >> ???using PRECONDITIONED norm type for convergence test >> ?PC Object: (snesCorr_) 1 MPI process >> ???type: ilu >> ?????out-of-place factorization >> ?????0 levels of fill >> ?????tolerance for zero pivot 2.22045e-14 >> ?????matrix ordering: natural >> ?????factor fill ratio given 1., needed 1. 
>> ???????Factored matrix follows: >> ?????????Mat Object: (snesCorr_) 1 MPI process >> ???????????type: seqaij >> ???????????rows=1200, cols=1200 >> ???????????package used to perform factorization: petsc >> ???????????total: nonzeros=17946, allocated nonzeros=17946 >> ?????????????using I-node routines: found 400 nodes, limit used is 5 >> ???linear system matrix = precond matrix: >> ???Mat Object: 1 MPI process >> ?????type: seqaij >> ?????rows=1200, cols=1200 >> ?????total: nonzeros=17946, allocated nonzeros=17946 >> ?????total number of mallocs used during MatSetValues calls=0 >> ???????using I-node routines: found 400 nodes, limit used is 5 >> >> I guess that this means that no linesearch is performed and the full >> Newton step is always performed (I did not report the full output, >> but all timesteps are alike). Also, with the default (bt) LineSearch, >> the total CPU time does not change, which seems in line with this. >> >> However, I'd have expected that the time spent in SNESLineSearch >> would be negligible, but the flamegraph is showing that about 38% of >> the time spent by SNESSolve is actually spent in SNESLineSearch. >> Furthermore, SNESLineSearch seems to cause more SNESFunction >> evaluations (in terms of CPU time) than the SNESSolve itself. The >> flamegraph is attached. >> >> Could some expert help me in understanding these data? Is the >> LineSearch actually performing the newton step? Given that the full >> step is always taken, can the SNESFunction evaluations from the >> LineSearch be skipped? >> >> Thanks a lot! >> >> Matteo >> >> > -- Prof. Matteo Semplice Universit? degli Studi dell?Insubria Dipartimento di Scienza e Alta Tecnologia ? DiSAT Professore Associato Via Valleggio, 11 ? 22100 Como (CO) ? Italia tel.: +39 031 2386316 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Feb 9 09:07:31 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 9 Feb 2023 10:07:31 -0500 Subject: [petsc-users] Save images of ksp_monitor and ksp_view_eigenvaluses with user defined names In-Reply-To: References: <3969E573-F5BA-4E90-863A-E349036BF06B@petsc.dev> Message-ID: > On Feb 9, 2023, at 3:38 AM, Zongze Yang wrote: > > Hi, Barry > > Thanks for the tip. > > One more question: how can I save the log (draw_lg) figure by using `draw:image:joe.ppm`? I am not sure what you mean. >> -ksp_monitor_solution draw:image:joe.ppm Will create a directory called joe with a set of files in it. Each file contains one solution during the iterative process. Barry > > Thanks. > Zongze > > Barry Smith > ?2023?2?9??? 05:35??? >> >> It cannot be done using the default X windows monitor and -draw_save because there is no way to distinguish the files for each sub window images. >> >> However, there is an alternative. >> >> -ksp_view_eigenvalues draw:image:jeff.ppm -viewer_view -ksp_monitor_solution draw:image:joe.ppm >> >> This alternative only supports .ppm files (so you may need to call a converter on the result) and does put each image in a separate file in its own named directory, for example, joe/joe_0.ppm but at least it allows you to have different named files. Of course you can also just run your code twice with two different options. >> >> Unfortunately there is a bug in the KSP eigenmonitor viewing that I had to fix to get this to work so you'll need to checkout the barry/2023-02-08/fix-ksp-monitor-eigenvalues-draw branch of PETSc to use the option I suggest. 
>> >> Barry >> >> >> >> >>> On Feb 8, 2023, at 5:09 AM, Zongze Yang > wrote: >>> >>> Hi, PETSc group, >>> >>> I was trying to save figures of the residual and eigenvalues with different names but not default names. >>> >>> The default name is used when I use `-draw_save .png`. All images are saved. >>> ``` >>> python test.py -N 16 -test1_ksp_type gmres -test1_pc_type jacobi -test1_ksp_view_eigenvalues draw -test1_ksp_monitor draw::draw_lg -draw_save .png >>> ``` >>> But when I use `-draw_save abc.png`, only the figure of eigenvalues is saved. >>> ``` >>> python test.py -N 16 -test1_ksp_type gmres -test1_pc_type jacobi -test1_ksp_view_eigenvalues draw -test1_ksp_monitor draw::draw_lg -draw_save .png >>> ``` >>> >>> How can I add the command line options, to specify different names for those images? >>> >>> Thanks, >>> Zongze >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sasyed at fnal.gov Thu Feb 9 11:42:02 2023 From: sasyed at fnal.gov (Sajid Ali Syed) Date: Thu, 9 Feb 2023 17:42:02 +0000 Subject: [petsc-users] KSP_Solve crashes in debug mode Message-ID: Hi PETSc-developers, In our application we call KSP_Solve as part of a step to propagate a beam through a lattice. I am observing a crash within KSP_Solve for an application only after the 43rd call to KSP_Solve when building the application and PETSc in debug mode, full logs for which are attached with this email (1 MPI rank and 4 OMP threads were used, but this crash occurs with multiple MPI ranks as well ). I am also including the last few lines of the configuration for this build. This crash does not occur when building the application and PETSc in release mode. Could someone tell me what causes this crash and if anything can be done to prevent it? Thanks in advance. The configuration of this solver is here : https://github.com/fnalacceleratormodeling/synergia2/blob/sajid/features/openpmd_basic_integration/src/synergia/collective/space_charge_3d_fd_utils.cc#L273-L292 Thank You, Sajid Ali (he/him) | Research Associate Scientific Computing Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ksp_crash_log Type: application/octet-stream Size: 3505 bytes Desc: ksp_crash_log URL: From sasyed at fnal.gov Thu Feb 9 11:47:08 2023 From: sasyed at fnal.gov (Sajid Ali Syed) Date: Thu, 9 Feb 2023 17:47:08 +0000 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: References: Message-ID: The configuration log is attached with this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure_log_tail Type: application/octet-stream Size: 6347 bytes Desc: configure_log_tail URL: From bsmith at petsc.dev Thu Feb 9 12:02:28 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 9 Feb 2023 13:02:28 -0500 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: References: Message-ID: Hmm, looks like your build may be funny? It is not in debug mode frame #2: 0x000000010eda20c8 libpetsc.3.018.dylib`PetscHeaderDestroy_Private + 1436 frame #3: 0x000000010f10176c libpetsc.3.018.dylib`VecDestroy + 808 frame #4: 0x0000000110199f34 libpetsc.3.018.dylib`KSPSolve_Private + 512 In debugger mode it would show the line numbers where the crash occurred and help us determine the problem. 
I do note the -g being used by the compilers so cannot explain off hand why it does not display the debug information. Barry > On Feb 9, 2023, at 12:42 PM, Sajid Ali Syed via petsc-users wrote: > > Hi PETSc-developers, > > In our application we call KSP_Solve as part of a step to propagate a beam through a lattice. I am observing a crash within KSP_Solve for an application only after the 43rd call to KSP_Solve when building the application and PETSc in debug mode, full logs for which are attached with this email (1 MPI rank and 4 OMP threads were used, but this crash occurs with multiple MPI ranks as well ). I am also including the last few lines of the configuration for this build. This crash does not occur when building the application and PETSc in release mode. > > Could someone tell me what causes this crash and if anything can be done to prevent it? Thanks in advance. > > The configuration of this solver is here : > https://github.com/fnalacceleratormodeling/synergia2/blob/sajid/features/openpmd_basic_integration/src/synergia/collective/space_charge_3d_fd_utils.cc#L273-L292 > > > Thank You, > Sajid Ali (he/him) | Research Associate > Scientific Computing Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sasyed at fnal.gov Thu Feb 9 12:56:30 2023 From: sasyed at fnal.gov (Sajid Ali Syed) Date: Thu, 9 Feb 2023 18:56:30 +0000 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: References: Message-ID: Hi Barry, The lack of line numbers is due to the fact that this build of PETSc was done via spack which installs it in a temporary directory before moving it to the final location. I have removed that build and installed PETSc locally (albeit with a simpler configuration) and see the same bug. Logs for this configuration and the error trace with this build are attached with this email. Thank You, Sajid Ali (he/him) | Research Associate Scientific Computing Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Barry Smith Sent: Thursday, February 9, 2023 12:02 PM To: Sajid Ali Syed Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] KSP_Solve crashes in debug mode Hmm, looks like your build may be funny? It is not in debug mode frame #2: 0x000000010eda20c8 libpetsc.3.018.dylib`PetscHeaderDestroy_Private + 1436 frame #3: 0x000000010f10176c libpetsc.3.018.dylib`VecDestroy + 808 frame #4: 0x0000000110199f34 libpetsc.3.018.dylib`KSPSolve_Private + 512 In debugger mode it would show the line numbers where the crash occurred and help us determine the problem. I do note the -g being used by the compilers so cannot explain off hand why it does not display the debug information. Barry On Feb 9, 2023, at 12:42 PM, Sajid Ali Syed via petsc-users wrote: Hi PETSc-developers, In our application we call KSP_Solve as part of a step to propagate a beam through a lattice. I am observing a crash within KSP_Solve for an application only after the 43rd call to KSP_Solve when building the application and PETSc in debug mode, full logs for which are attached with this email (1 MPI rank and 4 OMP threads were used, but this crash occurs with multiple MPI ranks as well ). I am also including the last few lines of the configuration for this build. This crash does not occur when building the application and PETSc in release mode. Could someone tell me what causes this crash and if anything can be done to prevent it? Thanks in advance. 
The configuration of this solver is here : https://github.com/fnalacceleratormodeling/synergia2/blob/sajid/features/openpmd_basic_integration/src/synergia/collective/space_charge_3d_fd_utils.cc#L273-L292 Thank You, Sajid Ali (he/him) | Research Associate Scientific Computing Division Fermi National Accelerator Laboratory s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure_log_tail_local_install Type: application/octet-stream Size: 2781 bytes Desc: configure_log_tail_local_install URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ksp_crash_log_local_install Type: application/octet-stream Size: 5179 bytes Desc: ksp_crash_log_local_install URL: From sasyed at fnal.gov Thu Feb 9 13:16:47 2023 From: sasyed at fnal.gov (Sajid Ali Syed) Date: Thu, 9 Feb 2023 19:16:47 +0000 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: References: Message-ID: I?ve also printed out the head struct in the debugger, and it looks like this: (lldb) print (TRSPACE)*head(TRSPACE) $7 = { size = 16 rsize = 16 id = 12063 lineno = 217 filename = 0x00000001167fd865 "/Users/sasyed/Documents/packages/petsc/src/sys/dll/reg.c" functionname = 0x00000001167fde78 "PetscFunctionListDLAllPop_Private" classid = -253701943 stack = { function = { [0] = 0x000000010189e2da "apply_bunch" [1] = 0x000000010189e2da "apply_bunch" [2] = 0x000000010189e2da "apply_bunch" [3] = 0x000000010189e2da "apply_bunch" [4] = 0x000000010189e2da "apply_bunch" [5] = 0x000000010189e2da "apply_bunch" [6] = 0x000000010189e2da "apply_bunch" [7] = 0x000000010189e2da "apply_bunch" [8] = 0x000000010189e2da "apply_bunch" [9] = 0x000000010189e2da "apply_bunch" [10] = 0x000000010189e2da "apply_bunch" [11] = 0x000000010189e2da "apply_bunch" [12] = 0x000000010189e2da "apply_bunch" [13] = 0x000000010189e2da "apply_bunch" [14] = 0x000000010189e2da "apply_bunch" [15] = 0x000000010189e2da "apply_bunch" [16] = 0x000000010189e2da "apply_bunch" [17] = 0x000000010189e2da "apply_bunch" [18] = 0x000000010189e2da "apply_bunch" [19] = 0x000000010189e2da "apply_bunch" [20] = 0x000000010189e2da "apply_bunch" [21] = 0x000000010189e2da "apply_bunch" [22] = 0x000000010189e2da "apply_bunch" [23] = 0x000000010189e2da "apply_bunch" [24] = 0x000000010189e2da "apply_bunch" [25] = 0x000000010189e2da "apply_bunch" [26] = 0x000000010189e2da "apply_bunch" [27] = 0x000000010189e2da "apply_bunch" [28] = 0x000000010189e2da "apply_bunch" [29] = 0x000000010189e2da "apply_bunch" [30] = 0x000000010189e2da "apply_bunch" [31] = 0x000000010189e2da "apply_bunch" [32] = 0x000000010189e2da "apply_bunch" [33] = 0x000000010189e2da "apply_bunch" [34] = 0x000000010189e2da "apply_bunch" [35] = 0x000000010189e2da "apply_bunch" [36] = 0x000000010189e2da "apply_bunch" [37] = 0x000000010189e2da "apply_bunch" [38] = 0x000000010189e2da "apply_bunch" [39] = 0x000000010189e2da "apply_bunch" [40] = 0x000000010189e2da "apply_bunch" [41] = 0x000000010189e2da "apply_bunch" [42] = 0x000000010189e2da "apply_bunch" [43] = 0x000000010189e2da "apply_bunch" [44] = 0x000000010189e2da "apply_bunch" [45] = 0x000000010189e2da "apply_bunch" [46] = 0x000000010189ebba "compute_mat" [47] = 0x000000010189f0c3 "solve" [48] = 0x00000001168b834c "KSPSolve" [49] = 0x00000001168b89f7 "KSPSolve_Private" [50] = 0x00000001168b395b "KSPSolve_GMRES" [51] = 0x00000001168b37f8 "KSPGMRESCycle" [52] = 0x00000001168ae4a7 "KSP_PCApplyBAorAB" [53] = 
0x0000000116891b38 "PCApplyBAorAB" [54] = 0x00000001168917ec "PCApply" [55] = 0x00000001168a5337 "PCApply_MG" [56] = 0x00000001168a5342 "PCApply_MG_Internal" [57] = 0x00000001168a42e1 "PCMGMCycle_Private" [58] = 0x00000001168b834c "KSPSolve" [59] = 0x00000001168b89f7 "KSPSolve_Private" [60] = 0x000000011682e396 "VecDestroy" [61] = 0x000000011682d58e "VecDestroy_Seq" [62] = 0x00000001168093fe "PetscObjectComposeFunction_Private" [63] = 0x0000000116809338 "PetscObjectComposeFunction_Petsc" } file = { [0] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [1] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [2] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [3] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [4] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [5] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [6] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [7] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [8] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [9] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [10] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [11] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [12] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [13] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [14] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [15] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [16] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [17] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [18] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [19] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [20] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [21] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [22] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [23] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [24] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [25] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [26] = 
0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [27] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [28] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [29] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [30] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [31] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [32] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [33] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [34] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [35] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [36] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [37] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [38] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [39] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [40] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [41] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [42] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [43] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [44] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [45] = 0x000000010189e27f "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd.cc" [46] = 0x000000010189e926 "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd_utils.cc" [47] = 0x000000010189e926 "/Users/sasyed/Documents/packages/synergia2/src/synergia/collective/space_charge_3d_fd_utils.cc" [48] = 0x00000001168b7d65 "/Users/sasyed/Documents/packages/petsc/src/ksp/ksp/interface/itfunc.c" [49] = 0x00000001168b7d65 "/Users/sasyed/Documents/packages/petsc/src/ksp/ksp/interface/itfunc.c" [50] = 0x00000001168b37b1 "/Users/sasyed/Documents/packages/petsc/src/ksp/ksp/impls/gmres/gmres.c" [51] = 0x00000001168b37b1 "/Users/sasyed/Documents/packages/petsc/src/ksp/ksp/impls/gmres/gmres.c" [52] = 0x000000011689fcb9 "/Users/sasyed/Documents/packages/petsc/include/petsc/private/kspimpl.h" [53] = 0x00000001168915b0 "/Users/sasyed/Documents/packages/petsc/src/ksp/pc/interface/precon.c" [54] = 0x00000001168915b0 "/Users/sasyed/Documents/packages/petsc/src/ksp/pc/interface/precon.c" [55] = 0x00000001168a42f4 "/Users/sasyed/Documents/packages/petsc/src/ksp/pc/impls/mg/mg.c" [56] = 0x00000001168a42f4 "/Users/sasyed/Documents/packages/petsc/src/ksp/pc/impls/mg/mg.c" [57] = 0x00000001168a42f4 "/Users/sasyed/Documents/packages/petsc/src/ksp/pc/impls/mg/mg.c" [58] = 
0x00000001168b7d65 "/Users/sasyed/Documents/packages/petsc/src/ksp/ksp/interface/itfunc.c" [59] = 0x00000001168b7d65 "/Users/sasyed/Documents/packages/petsc/src/ksp/ksp/interface/itfunc.c" [60] = 0x000000011682e091 "/Users/sasyed/Documents/packages/petsc/src/vec/vec/interface/vector.c" [61] = 0x000000011682d339 "/Users/sasyed/Documents/packages/petsc/src/vec/vec/impls/seq/bvec2.c" [62] = 0x0000000116808dea "/Users/sasyed/Documents/packages/petsc/src/sys/objects/inherit.c" [63] = 0x0000000116808dea "/Users/sasyed/Documents/packages/petsc/src/sys/objects/inherit.c" } line = { [0] = 198 [1] = 198 [2] = 198 [3] = 198 [4] = 198 [5] = 198 [6] = 198 [7] = 198 [8] = 198 [9] = 198 [10] = 198 [11] = 198 [12] = 198 [13] = 198 [14] = 198 [15] = 198 [16] = 198 [17] = 198 [18] = 198 [19] = 198 [20] = 198 [21] = 198 [22] = 198 [23] = 198 [24] = 198 [25] = 198 [26] = 198 [27] = 198 [28] = 198 [29] = 198 [30] = 198 [31] = 198 [32] = 198 [33] = 198 [34] = 198 [35] = 198 [36] = 198 [37] = 198 [38] = 198 [39] = 198 [40] = 198 [41] = 198 [42] = 198 [43] = 198 [44] = 198 [45] = 198 [46] = 326 [47] = 413 [48] = 1070 [49] = 898 [50] = 228 [51] = 147 [52] = 416 [53] = 715 [54] = 441 [55] = 633 [56] = 611 [57] = 28 [58] = 1070 [59] = 811 [60] = 528 [61] = 734 [62] = 815 [63] = 691 } petscroutine = { [0] = 2 [1] = 2 [2] = 2 [3] = 191 [4] = 2 [5] = 2 [6] = 2 [7] = 2 [8] = 2 [9] = 2 [10] = 2 [11] = 2 [12] = 2 [13] = 2 [14] = 2 [15] = 2 [16] = 2 [17] = 2 [18] = 2 [19] = 2 [20] = 2 [21] = 2 [22] = 2 [23] = 2 [24] = 2 [25] = 2 [26] = 2 [27] = 2 [28] = 2 [29] = 2 [30] = 2 [31] = 2 [32] = 2 [33] = 2 [34] = 2 [35] = 2 [36] = 2 [37] = 2 [38] = 2 [39] = 2 [40] = 2 [41] = 2 [42] = 2 [43] = 2 [44] = 2 [45] = 2 [46] = 2 [47] = 2 [48] = 1 [49] = 1 [50] = 1 [51] = 1 [52] = 1 [53] = 1 [54] = 1 [55] = 1 [56] = 1 [57] = 1 [58] = 1 [59] = 1 [60] = 1 [61] = 1 [62] = 1 [63] = 1 } currentsize = 69 hotdepth = 0 check = PETSC_TRUE } next = 0x0000000100000000 prev = NULL } (lldb) ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Feb 9 13:30:52 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 9 Feb 2023 14:30:52 -0500 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: References: Message-ID: <95C98D90-F093-4C21-9CC2-AA23F729B5F0@petsc.dev> Hmm, it is crashing in PetscFunctionListDLAllPop_Private() which is very recent code with Jacob's name on it. It is only run in debug mode, hence only crashes there. I don't know why it fails here, somehow memory was corrupted earlier? Try running with -malloc_debug or under valgrind? static PetscErrorCode PetscFunctionListDLAllPop_Private(PetscFunctionList fl) { PetscFunctionBegin; if (PetscDefined(USE_DEBUG) && !PetscDefined(HAVE_THREADSAFETY)) { PetscFunctionListDLAll current = dlallhead, prev = NULL; /* Remove this entry from the main DL list (if it is in it) */ while (current) { const PetscFunctionListDLAll next = current->next; if (current->data == fl) { if (prev) { // somewhere in the middle (or end) of the list prev->next = next; } else { // prev = NULL implies current = dlallhead, so front of list dlallhead = next; } PetscCall(PetscFree(current)); break; } prev = current; current = next; } } PetscFunctionReturn(PETSC_SUCCESS); } > On Feb 9, 2023, at 1:56 PM, Sajid Ali Syed wrote: > > Hi Barry, > > The lack of line numbers is due to the fact that this build of PETSc was done via spack which installs it in a temporary directory before moving it to the final location. 
> > I have removed that build and installed PETSc locally (albeit with a simpler configuration) and see the same bug. Logs for this configuration and the error trace with this build are attached with this email. > > Thank You, > Sajid Ali (he/him) | Research Associate > Scientific Computing Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > From: Barry Smith > > Sent: Thursday, February 9, 2023 12:02 PM > To: Sajid Ali Syed > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] KSP_Solve crashes in debug mode > > > Hmm, looks like your build may be funny? It is not in debug mode > > frame #2: 0x000000010eda20c8 libpetsc.3.018.dylib`PetscHeaderDestroy_Private + 1436 > frame #3: 0x000000010f10176c libpetsc.3.018.dylib`VecDestroy + 808 > frame #4: 0x0000000110199f34 libpetsc.3.018.dylib`KSPSolve_Private + 512 > > In debugger mode it would show the line numbers where the crash occurred and help us determine the problem. I do note the -g being used by the compilers so cannot explain off hand why it does not display the debug information. > > Barry > > >> On Feb 9, 2023, at 12:42 PM, Sajid Ali Syed via petsc-users > wrote: >> >> Hi PETSc-developers, >> >> In our application we call KSP_Solve as part of a step to propagate a beam through a lattice. I am observing a crash within KSP_Solve for an application only after the 43rd call to KSP_Solve when building the application and PETSc in debug mode, full logs for which are attached with this email (1 MPI rank and 4 OMP threads were used, but this crash occurs with multiple MPI ranks as well ). I am also including the last few lines of the configuration for this build. This crash does not occur when building the application and PETSc in release mode. >> >> Could someone tell me what causes this crash and if anything can be done to prevent it? Thanks in advance. >> >> The configuration of this solver is here : >> >> https://github.com/fnalacceleratormodeling/synergia2/blob/sajid/features/openpmd_basic_integration/src/synergia/collective/space_charge_3d_fd_utils.cc#L273-L292 >> >> Thank You, >> Sajid Ali (he/him) | Research Associate >> Scientific Computing Division >> Fermi National Accelerator Laboratory >> s-sajid-ali.github.io >> > ?? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure_log_tail_local_install Type: application/octet-stream Size: 2781 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ksp_crash_log_local_install Type: application/octet-stream Size: 5179 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From sasyed at fnal.gov Thu Feb 9 17:03:48 2023 From: sasyed at fnal.gov (Sajid Ali Syed) Date: Thu, 9 Feb 2023 23:03:48 +0000 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: <95C98D90-F093-4C21-9CC2-AA23F729B5F0@petsc.dev> References: <95C98D90-F093-4C21-9CC2-AA23F729B5F0@petsc.dev> Message-ID: I added ?-malloc_debug? in a .petscrc file and ran it again. The backtrace from lldb is in the attached file. 
The crash now seems to be at: Process 32660 stopped* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8) frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 598 ????? `PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()` 599 ?????@*/ 600 ?????PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...) -> 601 ?????{ 602 ????? PetscMPIInt rank; 603 ????? 604 ????? PetscFunctionBegin; (lldb) frame info frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 (lldb) The trace seems to indicate some sort of infinite loop causing an overflow. PS: I'm using a arm64 mac, so I don't have access to valgrind. Thank You, Sajid Ali (he/him) | Research Associate Scientific Computing Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: lldb_ksp_crash_session Type: application/octet-stream Size: 6775342 bytes Desc: lldb_ksp_crash_session URL: From knepley at gmail.com Thu Feb 9 17:24:51 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 9 Feb 2023 18:24:51 -0500 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: References: <95C98D90-F093-4C21-9CC2-AA23F729B5F0@petsc.dev> Message-ID: On Thu, Feb 9, 2023 at 6:05 PM Sajid Ali Syed via petsc-users < petsc-users at mcs.anl.gov> wrote: > I added ?-malloc_debug? in a .petscrc file and ran it again. The backtrace > from lldb is in the attached file. The crash now seems to be at: > > Process 32660 stopped* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8) > frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 > 598 ????? `PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()` > 599 ?????@*/ > 600 ?????PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...) > -> 601 ?????{ > 602 ????? PetscMPIInt rank; > 603 > 604 ????? PetscFunctionBegin; > (lldb) frame info > frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 > (lldb) > > The trace seems to indicate some sort of infinite loop causing an overflow. > Yes, I have also seen this. What happens is that we have a memory error. The error is reported inside PetscMallocValidate() using PetscErrorPrintf, which eventually calls PetscCallMPI, which calls PetscMallocValidate again, which fails. We need to remove all error checking from the prints inside Validate. Thanks, Matt > PS: I'm using a arm64 mac, so I don't have access to valgrind. > > Thank You, > Sajid Ali (he/him) | Research Associate > Scientific Computing Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
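As an aside, a generic sketch of the re-entrant failure mode described above and one way such a cycle can be broken (illustration only; this is not the actual change made in PETSc, and none of the names below are PETSc routines):

  #include <stdio.h>

  static int reporting = 0;                       /* re-entrancy guard */
  static int heap_is_corrupt(void) { return 0; }  /* stand-in for walking the allocation list */

  int validate_heap(void)
  {
    if (reporting) return 0;                      /* already inside an error report: do not recurse */
    if (heap_is_corrupt()) {
      reporting = 1;
      fprintf(stderr, "heap corruption detected\n"); /* plain print, no error-checked calls on this path */
      reporting = 0;
      return 1;
    }
    return 0;
  }

The point is the one made above: the reporting path inside a validator must not itself go through instrumented calls that re-enter the validator.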
URL: From cesarillovlc at gmail.com Fri Feb 10 06:59:36 2023 From: cesarillovlc at gmail.com (Ces VLC) Date: Fri, 10 Feb 2023 13:59:36 +0100 Subject: [petsc-users] GPUs and the float-double dilemma Message-ID: Hi! I searched if it?s supported to link an application to two different builds of PETSc (one configured as float precision, and the other as double). The last post I found on that topic was from 2016 and it said it?s not recommended. The point is that if you wish to prepare builds of your application for end-users, and if your app offers the option of using GPUs, you have a critical problem if you cannot link with two different PETSc builds in the same executable: either you don?t provide support for most GPUs (as they are float only), or you force float precision even when using the CPU. A third option (shipping two executables for the app) is not practical, as the user won?t be able to compare results without quitting the app and running the other version. Has the situation changed since 2016, now that GPU support is being added to PETSc? An obvious solution would be if PETSc could be built prepending a prefix to all symbols (functions, structs, types, everything). Any advances, plans, or thoughts on this? Thanks! C?sar -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksl7912 at snu.ac.kr Fri Feb 10 01:10:49 2023 From: ksl7912 at snu.ac.kr (=?UTF-8?B?wq3qtozsirnrpqw=?=) Date: Fri, 10 Feb 2023 16:10:49 +0900 Subject: [petsc-users] Question about applying algebraic multigrid (AMG) PC and multilevel Schwarz PC Message-ID: Dear petsc developers. Hello. While I was applying the preconditioner in contact with the FEM code, I had some questions. How can I apply the AMG PC and the multilevel Schwarz PC in the FEM code? Purpose is to compare which PC is better in the FEM code with contact situation. In my case, the Jacobi PC could be easily applied as shown in the example below. However, AMG and multilevel have too many options. ex) ./app -ksp_type gmres -pc_type jacobi -ksp_view Best regards Seung Lee Kwon -- Seung Lee Kwon, Ph.D.Candidate Aerospace Structures and Materials Laboratory Department of Mechanical and Aerospace Engineering Seoul National University Building 300 Rm 503, Gwanak-ro 1, Gwanak-gu, Seoul, South Korea, 08826 E-mail : ksl7912 at snu.ac.kr Office : +82-2-880-7389 C. P : +82-10-4695-1062 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Feb 10 08:35:36 2023 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 10 Feb 2023 09:35:36 -0500 Subject: [petsc-users] Question about applying algebraic multigrid (AMG) PC and multilevel Schwarz PC In-Reply-To: References: Message-ID: On Fri, Feb 10, 2023 at 9:17 AM ???? wrote: > Dear petsc developers. > > Hello. > While I was applying the preconditioner in contact with the FEM code, I > had some questions. > > How can I apply the AMG PC and the multilevel Schwarz PC in the FEM code? > Purpose is to compare which PC is better in the FEM code with contact > situation. > > In my case, the Jacobi PC could be easily applied as shown in the example > below. However, AMG and multilevel have too many options. 
> > ex) ./app -ksp_type gmres -pc_type jacobi -ksp_view > You can use the built-in AMG PC with: -pc_type gamg Technically AMG is a multilevel Schwarz PC, but I think you may be referring to: -pc_type bddc All solvers are equipped with default parameters that tend to be optimized for robustness rather than peak performance and they are a good place to start. Thanks, Mark > Best regards > > Seung Lee Kwon > -- > Seung Lee Kwon, Ph.D.Candidate > Aerospace Structures and Materials Laboratory > Department of Mechanical and Aerospace Engineering > Seoul National University > Building 300 Rm 503, Gwanak-ro 1, Gwanak-gu, Seoul, South Korea, 08826 > E-mail : ksl7912 at snu.ac.kr > Office : +82-2-880-7389 > C. P : +82-10-4695-1062 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri Feb 10 10:31:29 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 10 Feb 2023 10:31:29 -0600 Subject: [petsc-users] GPUs and the float-double dilemma In-Reply-To: References: Message-ID: On Fri, Feb 10, 2023 at 8:16 AM Ces VLC wrote: > Hi! > > I searched if it?s supported to link an application to two different > builds of PETSc (one configured as float precision, and the other as > double). The last post I found on that topic was from 2016 and it said it?s > not recommended. > > The point is that if you wish to prepare builds of your application for > end-users, and if your app offers the option of using GPUs, you have a > critical problem if you cannot link with two different PETSc builds in the > same executable: either you don?t provide support for most GPUs (as they > are float only), or you force float precision even when using the CPU. A > third option (shipping two executables for the app) is not practical, as > the user won?t be able to compare results without quitting the app and > running the other version. > Why do you say most GPUs are float only? I do not have a survey but the NVIDIA, AMD, Intel GPUs I have access to all support double :) > > Has the situation changed since 2016, now that GPU support is being added > to PETSc? > > An obvious solution would be if PETSc could be built prepending a prefix > to all symbols (functions, structs, types, everything). > Sounds like a bomb > > Any advances, plans, or thoughts on this? > Interfacing petsc with libraries (e.g.,Gingko) that support mixed-precision could be an approach. But we have not tried that yet. > > Thanks! > > C?sar > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 10 10:50:51 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 10 Feb 2023 11:50:51 -0500 Subject: [petsc-users] GPUs and the float-double dilemma In-Reply-To: References: Message-ID: On Fri, Feb 10, 2023 at 11:31 AM Junchao Zhang wrote: > > On Fri, Feb 10, 2023 at 8:16 AM Ces VLC wrote: > >> Hi! >> >> I searched if it?s supported to link an application to two different >> builds of PETSc (one configured as float precision, and the other as >> double). The last post I found on that topic was from 2016 and it said it?s >> not recommended. >> >> The point is that if you wish to prepare builds of your application for >> end-users, and if your app offers the option of using GPUs, you have a >> critical problem if you cannot link with two different PETSc builds in the >> same executable: either you don?t provide support for most GPUs (as they >> are float only), or you force float precision even when using the CPU. 
A >> third option (shipping two executables for the app) is not practical, as >> the user won?t be able to compare results without quitting the app and >> running the other version. >> > Why do you say most GPUs are float only? I do not have a survey but the > NVIDIA, AMD, Intel GPUs I have access to all support double :) > > >> >> Has the situation changed since 2016, now that GPU support is being added >> to PETSc? >> >> An obvious solution would be if PETSc could be built prepending a prefix >> to all symbols (functions, structs, types, everything). >> > Sounds like a bomb > >> >> Any advances, plans, or thoughts on this? >> > Interfacing petsc with libraries (e.g.,Gingko) that support > mixed-precision could be an approach. But we have not tried that yet. > The datatype on device need not match the datatype on the CPU. This is how I prefer to do things, running float on device and double on the CPU. This is possible now I think. Matt > >> Thanks! >> >> C?sar >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthikeyan.chockalingam at stfc.ac.uk Fri Feb 10 14:34:05 2023 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Fri, 10 Feb 2023 20:34:05 +0000 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> <20AF4E62-7E22-4B99-8DC4-600C79F78D96@petsc.dev> <1D7C0055-12B7-4775-B71C-EB4C94D096D9@petsc.dev> <28738507-0571-4B5E-BA4E-1AFDD892860D@petsc.dev> Message-ID: Thank you for your response. ?But the original big matrix with lots of extra reserved space still exists (your A).? - I would like to save memory on rows that are ?inactive? by making use of d_nnz and o_nnz. (Q1) When is a row considered ?inactive? and dropped by -pc_type redistribute during the solve? Is it when the entire row is zero? (Q2) If the answer to (Q1) is ?yes?. Am I right in thinking it would be beneficial to make have 0.0 on the diagonal of the ?inactive? rows instead of ?1.0?? Currently, I am setting 0.0 to the diagonal element of the ?inactive? rows. I am propose the following, please let me know if I am heading in the right direction. (Q3) ) I am planning to create two C++ vectors: d_nnz_vec and o_nnz_vec of N. Is d_nnz_vec[inactive_row_index] = 0 and o_nnz_vec[inactive_row_index] = 0 or d_nnz_vec[inactive_row_index] = 1 and Is o_nnz_vec[inactive_row_index] = 0? d_nnz_vec[active_row_index] = 10 and o_nnz_vec[active_row_index] = 10 (Q4) Is the below correct? PetscInt * d_nnz = NULL; PetscInt * o_nnz = NULL; PetscInt max_d_nz = 10; PetscInt max_o_nz = 10; MatSetType(A, MATAIJ); MatSetSizes(A, PETSC_DECIDE,PETSC_DECIDE,N,N); PetscInt rstart, rend; MatGetOwnershipRange(A,&rstart,&rend); PetscInt local_size = rend ? rstart; PetscMalloc(local_size * sizeof(PetscInt), &d_nnz); PetscMalloc(local_size * sizeof(PetscInt), &o_nnz); for (int i = 0; i < local_size; i++) { d_nnz[i] = d_nnz_vec[i + rstart]; o_nnz[i] = o_nnz_vec[i + rstart]; } MatMPIAIJSetPreallocation(A,max_d_nz,d_nnz,max_o_nz, o_nnz); PetscFree(d_nnz); PetscFree(o_nnz); Kind regards, Karthik. 
From: Barry Smith Date: Wednesday, 8 February 2023 at 21:10 To: Chockalingam, Karthikeyan (STFC,DL,HC) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Eliminating rows and columns which are zeros On Feb 8, 2023, at 11:19 AM, Karthikeyan Chockalingam - STFC UKRI wrote: No, I am not calling MatMPIAIJSetPreallocation(... N,NULL); Here is what I do: PetscInt d_nz = 10; PetscInt o_nz = 10; ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); ierr = MatSetType(A, MATMPIAIJ); CHKERRQ(ierr); ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr); ierr = MatMPIAIJSetPreallocation(A, d_nz, NULL, o_nz, NULL); CHKERRQ(ierr); (Q1) As I am setting the size of A to be N x N via ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr); d_nz and o_nz determine the memory reserved for A. So if you use for example 10 that means 10 entries are reserved for every row, even inactive ones. and pre-allocation is done for ALL rows I would like to understand if the ?inactive rows? are NOT contributing to memory (while using ?redistribute?)? Redistribute will drop all the inactive rows from the computation of the solve, it generates a smaller matrix missing all those rows and columns. But the original big matrix with lots of extra reserved space still exists (your A). (Q2) I tried solving using hypre within redistribute and system converges to a solution. Is below correct way to use hypre within redistribute? ierr = PetscOptionsSetValue(NULL,"-ksp_type", "preonly"); ierr = PetscOptionsSetValue(NULL,"-pc_type", "redistribute"); ierr = PetscOptionsSetValue(NULL,"-redistribute_ksp_type", "cg"); ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "hypre"); ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_hypre_type", "boomeramg"); Correct. You can run with -ksp_view and it will provide all the information about how the solves are layered. Many thanks, Karthik. From: Barry Smith > Date: Tuesday, 7 February 2023 at 19:52 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros On Feb 7, 2023, at 1:20 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Barry for your detailed response. I would like to shed some light into what I try to accomplish using PETSc and AMReX. Please see the attachment adaptive mesh image (and ignore the mesh-order legend for now). The number of elements on each level is a geometric progression. N - Number elements on each level indexed by ?n? n - Adaptive mesh level index (starting from 1) a - Number of elements on the base mesh = 16 r = 4 (each element is divided into four on the next higher level of refinement) N = a r^(n-1) The final solution u, is the super imposition of solutions from all levels (here we have a total of 7 levels). u = u^(1) + u^(2) + ? + u^(7) Hence I have seven system matrix and solution vector pairs, one for each level. On each level the element index vary from 1 to N. But on each level NOT all elements are ?active?. As you can see from the attached image not all elements are active (a lot of white hollow spaces). So the ?active? indexes can be scatted anywhere between 1 to N = 65536 for n = 7. (Q1) In my case, I can?t at the moment insert 1 on the diagonal because during assembly I am using ADD_VALUES as a node can be common to many elements. So I added 0.0 to ALL diagonals. After global assembly, I find that the linear solver converges. (Q2) After adding 0.0 to all diagonal. 
I able to solve using either ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "jacobi"); CHKERRQ(ierr); or ierr = PetscOptionsSetValue(NULL," pc_type", "jacobi"); CHKERRQ(ierr); I was able to solve using hypre as well. Do I need to use -pc_type redistribute or not? Because I am able to solve without it as well. No you do not need redistribute, but for large problems with many empty rows using a solver inside redistribute will be faster than just using that solver directly on the much larger (mostly empty) system. (Q3) I am sorry, if I sound like a broken record player. On each level I request allocation for A[N][N] Not sure what you mean by this? Call MatMPIAIJSetPreallocation(... N,NULL); where N is the number of columns in the matrix? If so, yes this causes a huge malloc() by PETSc when it allocates the matrix. It is not scalable. Do you have a small upper bound on the number of nonzeros in a row, say 9 or 27? Then use that instead of N, not perfect but much better than N. Barry as the indexes can be scatted anywhere between 1 to N but most are ?inactive rows?. Is -pc_type redistribute the way to go for me to save on memory? Though I request A[N][N] allocation, and not all rows are active - I wonder if I am wasting a huge amount of memory? Kind regards, Karthik. From: Barry Smith > Date: Monday, 6 February 2023 at 22:42 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Sorry was not clear MatZero*. I just meant MatZeroRows() or MatZeroRowsColumns() On Feb 6, 2023, at 4:45 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: No problem. I don?t completely follow. (Q1) I have used MATMPIAJI but not sure what is MatZero* (star) and what it does? And its relevance to my problem. (Q2) Since I am creating a MATMPIAJI system? what would be the best way to insert 0.0 in to ALL diagonals (both active and inactive rows) to begin with? Yes, just have each MPI process loop over its rows and put zero on the diagonal (actually, you could put a 1 if you want). Then have your code use AMReX to put all its values in, I am assuming the code uses INSERT_VALUES so it will always overwrite the value you put in initially (hence putting in 1 initially will be fine; the advantage of 1 is if you do not use PCREDISTIBUTE the matrix is fully defined and so any solver will work. If you know the inactive rows you can just put the diagonal on those since AMReX will fill up the rest of the rows, but it is harmless to put values on all diagonal entries. Do NOT call MatAssemblyBegin/End between filling the diagonal entries and having AMReX put in its values. (Q3) If I have to insert 0.0 only into diagonals of ?inactive? rows after I have put values into the matrix would be an effort. Unless there is a straight forward to do it in PETSc. (Q4) For my problem do I need to use PCREDISTIBUTE or any linear solve would eliminate those rows? Well no solver will really make sense if you have "inactive" rows, that is rows with nothing in them except PCREDISTIBUTE. When PETSc was written we didn't understand having lots of completely empty rows was a use case so much of the functionality does not work in that case. Best, Karthik. 
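To make the diagonal-seeding advice above concrete, here is a minimal sketch, assuming the matrix A and the ADD_VALUES element assembly described earlier in this thread (everything except A is illustrative, and ierr is a PetscErrorCode as in the other snippets):

  PetscInt rstart, rend;
  ierr = MatGetOwnershipRange(A, &rstart, &rend); CHKERRQ(ierr);
  for (PetscInt i = rstart; i < rend; i++) {
    /* adding 0.0 is harmless for active rows; the element loop adds its own contributions on top */
    ierr = MatSetValue(A, i, i, 0.0, ADD_VALUES); CHKERRQ(ierr);
  }
  /* ... element assembly with MatSetValues(..., ADD_VALUES) goes here,
     with no MatAssemblyBegin/End in between ... */
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

With PCREDISTRIBUTE, any row that ends up holding only this diagonal entry is treated as inactive and dropped from the inner solve.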
From: Barry Smith > Date: Monday, 6 February 2023 at 20:18 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Sorry, I had a mistake in my thinking, PCREDISTRIBUTE supports completely empty rows but MatZero* does not. When you put values into the matrix you will need to insert a 0.0 on the diagonal of each "inactive" row; all of this will be eliminated during the linear solve process. It would be a major project to change the MatZero* functions to handle empty rows. Barry On Feb 4, 2023, at 12:06 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you very much for offering to debug. I built PETSc along with AMReX, so I tried to extract the PETSc code alone which would reproduce the same error on the smallest sized problem possible. I have attached three files: petsc_amrex_error_redistribute.txt ? The error message from amrex/petsc interface, but THE linear system solves and converges to a solution. problem.c ? A simple stand-alone petsc code, which produces almost the same error message. petsc_ error_redistribute.txt ? The error message from problem.c but strangely it does NOT solve ? I am not sure why? Please use problem.c to debug the issue. Kind regards, Karthik. From: Barry Smith > Date: Saturday, 4 February 2023 at 00:22 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you can help me reproduce the problem with a simple code I can debug the problem and fix it. Barry On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: I updated the main branch to the below commit but the same problem persists. [0]PETSC ERROR: Petsc Development GIT revision: v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000 From: Barry Smith > Date: Friday, 3 February 2023 at 18:51 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it but now we allow the row to initially have no entries and still be zeroed. Barry On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 Residual norms for redistribute_ solve. 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 KSP Object: (redistribute_) 1 MPI process type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (redistribute_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: mpiaij rows=48896, cols=48896 total: nonzeros=307976, allocated nonzeros=307976 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines End of program solve time 0.564714744 seconds Starting max value is: 0 Min value of level 0 is: 0 Interpolated min value is: 741.978761 Unused ParmParse Variables: [TOP]::model.type(nvals = 1) :: [3] [TOP]::ref_ratio(nvals = 1) :: [2] AMReX (22.10-20-g3082028e4287) finalized #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. Program ended with exit code: 0 Best, Karthik. From: Barry Smith > Date: Friday, 3 February 2023 at 17:41 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros We need all the error output for the errors you got below to understand why the errors are happening. On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Hello Barry, I would like to better understand pc_type redistribute usage. I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. Indeed the linear solve will be more efficient if you use the redistribute solver. But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). Question (1) If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? See above Question (2) I tried using pc_type redistribute for a two level system. I have attached the output only for (level 1) The solution converges to right solution but still petsc outputs some error messages. [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) But the there were no unused options #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. 
Program ended with exit code: 0 I cannot explain this Question (2) [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? Question (3) [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. Thank you for your help. Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 18:50 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. Barry On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Barry. This is great! I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using MatZeroRowsColumnsIS(A, isout, 1, x, b); While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 16:04 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. Barry On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: Hello, I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. For instance say N=7 0 0 0 0 0 0 0 0 1 -1 0 0 0 0 0 -1 2 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 1 I would like to reduce it to a 3x3 1 -1 0 -1 2 -1 0 -1 1 I do know the size N. Q1) How do I do it? Q2) Is it better to eliminate them as it would save a lot of memory? Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? Thank you. Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -------------- next part -------------- An HTML attachment was scrubbed... 
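Pulling the pieces of this thread together, a hedged sketch of the boundary-condition and solve steps (A, b, x, and the Dirichlet index set isout are assumed to exist as described above; this is an illustration, not the poster's actual code):

  /* zero the Dirichlet rows and columns, keeping 1.0 on their diagonals */
  ierr = MatZeroRowsColumnsIS(A, isout, 1.0, x, b); CHKERRQ(ierr);

  KSP ksp;
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);  /* x comes back in the original (large) ordering */

Run, for example, with -ksp_type preonly -pc_type redistribute -redistribute_ksp_type cg -redistribute_pc_type jacobi -redistribute_ksp_converged_reason -ksp_view, or with the hypre/boomeramg variant shown earlier in the thread.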
URL: From bsmith at petsc.dev Fri Feb 10 14:44:04 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 10 Feb 2023 15:44:04 -0500 Subject: [petsc-users] GPUs and the float-double dilemma In-Reply-To: References: Message-ID: <53CB96CE-0233-4124-BA6C-86AD88C910CF@petsc.dev> What is the use case you are looking for that cannot be achieved by just distributing a single precision application? If the user is happy when they happen to have GPUs to use single precision everywhere, then why would they need double precision if they happen not to have a GPU? Are you just using KSP or also SNES, TS etc? > On Feb 10, 2023, at 7:59 AM, Ces VLC wrote: > > Hi! > > I searched if it?s supported to link an application to two different builds of PETSc (one configured as float precision, and the other as double). The last post I found on that topic was from 2016 and it said it?s not recommended. > > The point is that if you wish to prepare builds of your application for end-users, and if your app offers the option of using GPUs, you have a critical problem if you cannot link with two different PETSc builds in the same executable: either you don?t provide support for most GPUs (as they are float only), or you force float precision even when using the CPU. A third option (shipping two executables for the app) is not practical, as the user won?t be able to compare results without quitting the app and running the other version. > > Has the situation changed since 2016, now that GPU support is being added to PETSc? > > An obvious solution would be if PETSc could be built prepending a prefix to all symbols (functions, structs, types, everything). > > Any advances, plans, or thoughts on this? > > Thanks! > > C?sar > From bsmith at petsc.dev Fri Feb 10 14:53:06 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 10 Feb 2023 15:53:06 -0500 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> <20AF4E62-7E22-4B99-8DC4-600C79F78D96@petsc.dev> <1D7C0055-12B7-4775-B71C-EB4C94D096D9@petsc.dev> <28738507-0571-4B5E-BA4E-1AFDD892860D@petsc.dev> Message-ID: > On Feb 10, 2023, at 3:34 PM, Karthikeyan Chockalingam - STFC UKRI wrote: > > Thank you for your response. > > ?But the original big matrix with lots of extra reserved space still exists (your A).? - I would like to save memory on rows that are ?inactive? by making use of d_nnz and o_nnz. > > (Q1) When is a row considered ?inactive? and dropped by -pc_type redistribute during the solve? Is it when the entire row is zero? When there is only a diagonal entry or no entries. > > (Q2) If the answer to (Q1) is ?yes?. Am I right in thinking it would be beneficial to make have 0.0 on the diagonal of the ?inactive? rows instead of ?1.0?? Doesn't matter, you can have any value on the diagonal. > > Currently, I am setting 0.0 to the diagonal element of the ?inactive? rows. > I am propose the following, please let me know if I am heading in the right direction. > > (Q3) ) I am planning to create two C++ vectors: d_nnz_vec and o_nnz_vec of N. > Is d_nnz_vec[inactive_row_index] = 0 and o_nnz_vec[inactive_row_index] = 0 or > d_nnz_vec[inactive_row_index] = 1 and Is o_nnz_vec[inactive_row_index] = 0? > > d_nnz_vec[active_row_index] = 10 and o_nnz_vec[active_row_index] = 10 Yes, this seems like the correct scheme, it will not allocate unneeded extra space for inactive rows, exactly the one you want. > > (Q4) Is the below correct? 
> > PetscInt * d_nnz = NULL; > PetscInt * o_nnz = NULL; > PetscInt max_d_nz = 10; > PetscInt max_o_nz = 10; > > > MatSetType(A, MATAIJ); > MatSetSizes(A, PETSC_DECIDE,PETSC_DECIDE,N,N); > > PetscInt rstart, rend; > MatGetOwnershipRange(A,&rstart,&rend); > > PetscInt local_size = rend ? rstart; > > > PetscMalloc(local_size * sizeof(PetscInt), &d_nnz); > PetscMalloc(local_size * sizeof(PetscInt), &o_nnz); > > for (int i = 0; i < local_size; i++) { > d_nnz[i] = d_nnz_vec[i + rstart]; > o_nnz[i] = o_nnz_vec[i + rstart]; > } > Yes, but I think the + rstart is wrong. It is not needed because the d_nnz and o_nnz are just the size on that MPI rank. > > MatMPIAIJSetPreallocation(A,max_d_nz,d_nnz,max_o_nz, o_nnz); > > PetscFree(d_nnz); > PetscFree(o_nnz); > > > > Kind regards, > Karthik. > > > > From: Barry Smith > > Date: Wednesday, 8 February 2023 at 21:10 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > > > On Feb 8, 2023, at 11:19 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > No, I am not calling MatMPIAIJSetPreallocation(... N,NULL); > Here is what I do: > > PetscInt d_nz = 10; > PetscInt o_nz = 10; > > ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); > ierr = MatSetType(A, MATMPIAIJ); CHKERRQ(ierr); > ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr); > ierr = MatMPIAIJSetPreallocation(A, d_nz, NULL, o_nz, NULL); CHKERRQ(ierr); > > (Q1) > As I am setting the size of A to be N x N via > > ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr); > > d_nz and o_nz determine the memory reserved for A. So if you use for example 10 that means 10 entries are reserved for every row, even inactive ones. > > > > and pre-allocation is done for ALL rows I would like to understand if the ?inactive rows? are NOT contributing to memory (while using ?redistribute?)? > > Redistribute will drop all the inactive rows from the computation of the solve, it generates a smaller matrix missing all those rows and columns. But the original big matrix with lots of extra reserved space still exists (your A). > > > (Q2) > > I tried solving using hypre within redistribute and system converges to a solution. Is below correct way to use hypre within redistribute? > > ierr = PetscOptionsSetValue(NULL,"-ksp_type", "preonly"); > ierr = PetscOptionsSetValue(NULL,"-pc_type", "redistribute"); > ierr = PetscOptionsSetValue(NULL,"-redistribute_ksp_type", "cg"); > ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "hypre"); > ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_hypre_type", "boomeramg"); > > Correct. You can run with -ksp_view and it will provide all the information about how the solves are layered. > > > > Many thanks, > > Karthik. > > From: Barry Smith > > Date: Tuesday, 7 February 2023 at 19:52 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > > > > On Feb 7, 2023, at 1:20 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you Barry for your detailed response. > > I would like to shed some light into what I try to accomplish using PETSc and AMReX. Please see the attachment adaptive mesh image (and ignore the mesh-order legend for now). > > The number of elements on each level is a geometric progression. > N - Number elements on each level indexed by ?n? 
> n - Adaptive mesh level index (starting from 1) > a - Number of elements on the base mesh = 16 > r = 4 (each element is divided into four on the next higher level of refinement) > > N = a r^(n-1) > > The final solution u, is the super imposition of solutions from all levels (here we have a total of 7 levels). > > u = u^(1) + u^(2) + ? + u^(7) > > Hence I have seven system matrix and solution vector pairs, one for each level. > > On each level the element index vary from 1 to N. But on each level NOT all elements are ?active?. > As you can see from the attached image not all elements are active (a lot of white hollow spaces). So the ?active? indexes can be scatted anywhere between 1 to N = 65536 for n = 7. > > (Q1) In my case, I can?t at the moment insert 1 on the diagonal because during assembly I am using ADD_VALUES as a node can be common to many elements. So I added 0.0 to ALL diagonals. After global assembly, I find that the linear solver converges. > > (Q2) After adding 0.0 to all diagonal. I able to solve using either > ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "jacobi"); CHKERRQ(ierr); > or > ierr = PetscOptionsSetValue(NULL," pc_type", "jacobi"); CHKERRQ(ierr); > I was able to solve using hypre as well. > > Do I need to use -pc_type redistribute or not? Because I am able to solve without it as well. > > No you do not need redistribute, but for large problems with many empty rows using a solver inside redistribute will be faster than just using that solver directly on the much larger (mostly empty) system. > > > > > (Q3) I am sorry, if I sound like a broken record player. On each level I request allocation for A[N][N] > > Not sure what you mean by this? Call MatMPIAIJSetPreallocation(... N,NULL); where N is the number of columns in the matrix? > > If so, yes this causes a huge malloc() by PETSc when it allocates the matrix. It is not scalable. Do you have a small upper bound on the number of nonzeros in a row, say 9 or 27? Then use that instead of N, not perfect but much better than N. > > Barry > > > > > > > as the indexes can be scatted anywhere between 1 to N but most are ?inactive rows?. Is -pc_type redistribute the way to go for me to save on memory? Though I request A[N][N] allocation, and not all rows are active - I wonder if I am wasting a huge amount of memory? > > Kind regards, > Karthik. > > > > > From: Barry Smith > > Date: Monday, 6 February 2023 at 22:42 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Sorry was not clear MatZero*. I just meant MatZeroRows() or MatZeroRowsColumns() > > > > > On Feb 6, 2023, at 4:45 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > No problem. I don?t completely follow. > > (Q1) I have used MATMPIAJI but not sure what is MatZero* (star) and what it does? And its relevance to my problem. > > (Q2) Since I am creating a MATMPIAJI system? what would be the best way to insert 0.0 in to ALL diagonals (both active and inactive rows) to begin with? > > Yes, just have each MPI process loop over its rows and put zero on the diagonal (actually, you could put a 1 if you want). Then have your code use AMReX to > put all its values in, I am assuming the code uses INSERT_VALUES so it will always overwrite the value you put in initially (hence putting in 1 initially will be fine; the advantage of 1 is if you do not use PCREDISTIBUTE the matrix is fully defined and so any solver will work. 
If you know the inactive rows you can just put the diagonal on those since AMReX will fill up the rest of the rows, but it is harmless to put values on all diagonal entries. Do NOT call MatAssemblyBegin/End between filling the diagonal entries and having AMReX put in its values. > > (Q3) If I have to insert 0.0 only into diagonals of ?inactive? rows after I have put values into the matrix would be an effort. Unless there is a straight forward to do it in PETSc. > > (Q4) For my problem do I need to use PCREDISTIBUTE or any linear solve would eliminate those rows? > > Well no solver will really make sense if you have "inactive" rows, that is rows with nothing in them except PCREDISTIBUTE. > > When PETSc was written we didn't understand having lots of completely empty rows was a use case so much of the functionality does not work in that case. > > > > > > > Best, > Karthik. > > From: Barry Smith > > Date: Monday, 6 February 2023 at 20:18 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Sorry, I had a mistake in my thinking, PCREDISTRIBUTE supports completely empty rows but MatZero* does not. > > When you put values into the matrix you will need to insert a 0.0 on the diagonal of each "inactive" row; all of this will be eliminated during the linear solve process. It would be a major project to change the MatZero* functions to handle empty rows. > > Barry > > > > > > On Feb 4, 2023, at 12:06 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you very much for offering to debug. > > I built PETSc along with AMReX, so I tried to extract the PETSc code alone which would reproduce the same error on the smallest sized problem possible. > > I have attached three files: > > petsc_amrex_error_redistribute.txt ? The error message from amrex/petsc interface, but THE linear system solves and converges to a solution. > > problem.c ? A simple stand-alone petsc code, which produces almost the same error message. > > petsc_ error_redistribute.txt ? The error message from problem.c but strangely it does NOT solve ? I am not sure why? > > Please use problem.c to debug the issue. > > Kind regards, > Karthik. > > > From: Barry Smith > > Date: Saturday, 4 February 2023 at 00:22 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > If you can help me reproduce the problem with a simple code I can debug the problem and fix it. > > Barry > > > > > > > On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > I updated the main branch to the below commit but the same problem persists. > > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000 > > > From: Barry Smith > > Date: Friday, 3 February 2023 at 18:51 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it but now we allow the row to initially have no entries and still be zeroed. > > Barry > > > On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you. 
The entire error output was an attachment in my previous email. I am pasting here for your reference. > > > > [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-options_left (no value) > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 > [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 > [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre > [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 > [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 > [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 > Residual norms for redistribute_ solve. 
> 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 > 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 > 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 > 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 > 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 > 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 > 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 > 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 > 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 > 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 > 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 > Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 > KSP Object: (redistribute_) 1 MPI process > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (redistribute_) 1 MPI process > type: jacobi > type DIAGONAL > linear system matrix = precond matrix: > Mat Object: 1 MPI process > type: mpiaij > rows=48896, cols=48896 > total: nonzeros=307976, allocated nonzeros=307976 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > End of program > solve time 0.564714744 seconds > Starting max value is: 0 > Min value of level 0 is: 0 > Interpolated min value is: 741.978761 > Unused ParmParse Variables: > [TOP]::model.type(nvals = 1) :: [3] > [TOP]::ref_ratio(nvals = 1) :: [2] > > AMReX (22.10-20-g3082028e4287) finalized > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > > > Best, > Karthik. > > From: Barry Smith > > Date: Friday, 3 February 2023 at 17:41 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > We need all the error output for the errors you got below to understand why the errors are happening. > > > > > > > > On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Hello Barry, > > I would like to better understand pc_type redistribute usage. > > I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. 
When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. > > Indeed the linear solve will be more efficient if you use the redistribute solver. > > But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). > > > > > > > > > Question (1) > > > If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? > > See above > > > > > > > > > Question (2) > > > I tried using pc_type redistribute for a two level system. > I have attached the output only for (level 1) > The solution converges to right solution but still petsc outputs some error messages. > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-options_left (no value) > > But the there were no unused options > > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > > I cannot explain this > > > > > > > > Question (2) > > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > > What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? > > > Question (3) > > > > > > > > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > > I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. > > Thank you for your help. > > Kind regards, > Karthik. > > > > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 18:50 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. > > Barry > > > > > > > > > > On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you Barry. This is great! > > I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using > MatZeroRowsColumnsIS(A, isout, 1, x, b); > > While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? > > Kind regards, > Karthik. 
> > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 16:04 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute > > > It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. > > Barry > > > > > > > > > > > On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: > > Hello, > > I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. > > For instance say N=7 > > 0 0 0 0 0 0 0 > 0 1 -1 0 0 0 0 > 0 -1 2 0 0 0 -1 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 -1 0 0 0 1 > > I would like to reduce it to a 3x3 > > 1 -1 0 > -1 2 -1 > 0 -1 1 > > I do know the size N. > > Q1) How do I do it? > Q2) Is it better to eliminate them as it would save a lot of memory? > Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? > > Thank you. > > Karthik. > This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cesarillovlc at gmail.com Fri Feb 10 15:32:23 2023 From: cesarillovlc at gmail.com (Ces VLC) Date: Fri, 10 Feb 2023 22:32:23 +0100 Subject: [petsc-users] GPUs and the float-double dilemma In-Reply-To: <53CB96CE-0233-4124-BA6C-86AD88C910CF@petsc.dev> References: <53CB96CE-0233-4124-BA6C-86AD88C910CF@petsc.dev> Message-ID: El El vie, 10 feb 2023 a las 21:44, Barry Smith escribi?: > > What is the use case you are looking for that cannot be achieved by > just distributing a single precision application? If the user is happy when > they happen to have GPUs to use single precision everywhere, then why would > they need double precision if they happen not to have a GPU? > > Are you just using KSP or also SNES, TS etc? Thanks for your replies. The use case is structural analysis (so, sparse symmetrical matrix, and minimum degree reorder tends to work fine in CPU (for GPU I?ll need to check the best performing scenarios). Obviously, this use case requires double precision. But single precision might be fine enough for faster low quality runs if the user happens to have a GPU that accelerates float and not double (I have a 12GB Pascal Titan, it accelerates float, not double). Kind regards, C?sar -------------- next part -------------- An HTML attachment was scrubbed... 
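For readers following the float/double thread: the workable route today is still two separate PETSc builds, chosen when the application itself is compiled, since two installs cannot be linked into one executable (they export the same symbol names, as noted above). A sketch of what the two configurations could look like; the prefixes and option lists are illustrative assumptions, not a developer recommendation:

  # double-precision CPU build
  ./configure --prefix=$HOME/petsc-double --with-precision=double --with-debugging=0

  # single-precision build with the CUDA back end enabled
  ./configure --prefix=$HOME/petsc-single --with-precision=single --with-debugging=0 --with-cuda

The application is then built once against each install, which gives the two-executable setup already discussed, with the usability drawback César points out.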
URL: 

From karthikeyan.chockalingam at stfc.ac.uk Fri Feb 10 16:32:38 2023
From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI)
Date: Fri, 10 Feb 2023 22:32:38 +0000
Subject: [petsc-users] Eliminating rows and columns which are zeros
In-Reply-To: 
References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> <20AF4E62-7E22-4B99-8DC4-600C79F78D96@petsc.dev> <1D7C0055-12B7-4775-B71C-EB4C94D096D9@petsc.dev> <28738507-0571-4B5E-BA4E-1AFDD892860D@petsc.dev>
Message-ID: 

Yes, but d_nnz_vec and o_nnz_vec are not of local (per-rank) size but of global size N. Below is an alternative approach, making both d_nnz_vec and o_nnz_vec PETSc MPI Vecs instead of C++ containers. Is this okay?

Vec d_nnz_vec, o_nnz_vec;
PetscInt *d_nnz, *o_nnz, *indices;
PetscScalar *d_vals, *o_vals;
PetscInt max_d_nz = 10, max_o_nz = 10;

MatSetType(A, MATAIJ);
MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);

PetscInt rstart, rend;
MatGetOwnershipRange(A, &rstart, &rend);
PetscInt local_size = rend - rstart;

VecCreateMPI(PETSC_COMM_WORLD, local_size, N, &d_nnz_vec);
VecDuplicate(d_nnz_vec, &o_nnz_vec);

// populate d_nnz_vec and o_nnz_vec here (not shown)

PetscMalloc(local_size * sizeof(PetscInt), &d_nnz);
PetscMalloc(local_size * sizeof(PetscInt), &o_nnz);
PetscMalloc(local_size * sizeof(PetscInt), &indices);
PetscMalloc(local_size * sizeof(PetscScalar), &d_vals);
PetscMalloc(local_size * sizeof(PetscScalar), &o_vals);

for (int i = 0; i < local_size; i++) indices[i] = i + rstart;

// VecGetValues returns PetscScalar, so read into scalar buffers and convert to PetscInt
VecGetValues(d_nnz_vec, local_size, indices, d_vals);
VecGetValues(o_nnz_vec, local_size, indices, o_vals);
for (int i = 0; i < local_size; i++) {
  d_nnz[i] = (PetscInt)PetscRealPart(d_vals[i]);
  o_nnz[i] = (PetscInt)PetscRealPart(o_vals[i]);
}

MatMPIAIJSetPreallocation(A, max_d_nz, d_nnz, max_o_nz, o_nnz);

PetscFree(d_nnz);
PetscFree(o_nnz);
PetscFree(indices);
PetscFree(d_vals);
PetscFree(o_vals);

Kind regards,
Karthik.

From: Barry Smith
Date: Friday, 10 February 2023 at 20:53
To: Chockalingam, Karthikeyan (STFC,DL,HC)
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Eliminating rows and columns which are zeros

On Feb 10, 2023, at 3:34 PM, Karthikeyan Chockalingam - STFC UKRI wrote:

Thank you for your response.

"But the original big matrix with lots of extra reserved space still exists (your A)." - I would like to save memory on rows that are "inactive" by making use of d_nnz and o_nnz.

(Q1) When is a row considered "inactive" and dropped by -pc_type redistribute during the solve? Is it when the entire row is zero?

When there is only a diagonal entry or no entries.

(Q2) If the answer to (Q1) is "yes", am I right in thinking it would be beneficial to have 0.0 on the diagonal of the "inactive" rows instead of 1.0?

Doesn't matter, you can have any value on the diagonal.

Currently, I am setting 0.0 to the diagonal element of the "inactive" rows. I propose the following, please let me know if I am heading in the right direction.

(Q3) I am planning to create two C++ vectors, d_nnz_vec and o_nnz_vec, of size N. Is d_nnz_vec[inactive_row_index] = 0 and o_nnz_vec[inactive_row_index] = 0, or d_nnz_vec[inactive_row_index] = 1 and o_nnz_vec[inactive_row_index] = 0? d_nnz_vec[active_row_index] = 10 and o_nnz_vec[active_row_index] = 10

Yes, this seems like the correct scheme, it will not allocate unneeded extra space for inactive rows, exactly the one you want.

(Q4) Is the below correct?

PetscInt * d_nnz = NULL;
PetscInt * o_nnz = NULL;
PetscInt max_d_nz = 10;
PetscInt max_o_nz = 10;

MatSetType(A, MATAIJ);
MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);

PetscInt rstart, rend;
MatGetOwnershipRange(A, &rstart, &rend);

PetscInt local_size = rend - rstart;

PetscMalloc(local_size * sizeof(PetscInt), &d_nnz);
PetscMalloc(local_size * sizeof(PetscInt), &o_nnz);

for (int i = 0; i < local_size; i++) {
  d_nnz[i] = d_nnz_vec[i + rstart];
  o_nnz[i] = o_nnz_vec[i + rstart];
}

Yes, but I think the + rstart is wrong.
It is not needed because the d_nnz and o_nnz are just the size on that MPI rank. MatMPIAIJSetPreallocation(A,max_d_nz,d_nnz,max_o_nz, o_nnz); PetscFree(d_nnz); PetscFree(o_nnz); Kind regards, Karthik. From: Barry Smith > Date: Wednesday, 8 February 2023 at 21:10 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros On Feb 8, 2023, at 11:19 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: No, I am not calling MatMPIAIJSetPreallocation(... N,NULL); Here is what I do: PetscInt d_nz = 10; PetscInt o_nz = 10; ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); ierr = MatSetType(A, MATMPIAIJ); CHKERRQ(ierr); ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr); ierr = MatMPIAIJSetPreallocation(A, d_nz, NULL, o_nz, NULL); CHKERRQ(ierr); (Q1) As I am setting the size of A to be N x N via ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr); d_nz and o_nz determine the memory reserved for A. So if you use for example 10 that means 10 entries are reserved for every row, even inactive ones. and pre-allocation is done for ALL rows I would like to understand if the ?inactive rows? are NOT contributing to memory (while using ?redistribute?)? Redistribute will drop all the inactive rows from the computation of the solve, it generates a smaller matrix missing all those rows and columns. But the original big matrix with lots of extra reserved space still exists (your A). (Q2) I tried solving using hypre within redistribute and system converges to a solution. Is below correct way to use hypre within redistribute? ierr = PetscOptionsSetValue(NULL,"-ksp_type", "preonly"); ierr = PetscOptionsSetValue(NULL,"-pc_type", "redistribute"); ierr = PetscOptionsSetValue(NULL,"-redistribute_ksp_type", "cg"); ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "hypre"); ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_hypre_type", "boomeramg"); Correct. You can run with -ksp_view and it will provide all the information about how the solves are layered. Many thanks, Karthik. From: Barry Smith > Date: Tuesday, 7 February 2023 at 19:52 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros On Feb 7, 2023, at 1:20 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Barry for your detailed response. I would like to shed some light into what I try to accomplish using PETSc and AMReX. Please see the attachment adaptive mesh image (and ignore the mesh-order legend for now). The number of elements on each level is a geometric progression. N - Number elements on each level indexed by ?n? n - Adaptive mesh level index (starting from 1) a - Number of elements on the base mesh = 16 r = 4 (each element is divided into four on the next higher level of refinement) N = a r^(n-1) The final solution u, is the super imposition of solutions from all levels (here we have a total of 7 levels). u = u^(1) + u^(2) + ? + u^(7) Hence I have seven system matrix and solution vector pairs, one for each level. On each level the element index vary from 1 to N. But on each level NOT all elements are ?active?. As you can see from the attached image not all elements are active (a lot of white hollow spaces). So the ?active? indexes can be scatted anywhere between 1 to N = 65536 for n = 7. 
(Q1) In my case, I can?t at the moment insert 1 on the diagonal because during assembly I am using ADD_VALUES as a node can be common to many elements. So I added 0.0 to ALL diagonals. After global assembly, I find that the linear solver converges. (Q2) After adding 0.0 to all diagonal. I able to solve using either ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "jacobi"); CHKERRQ(ierr); or ierr = PetscOptionsSetValue(NULL," pc_type", "jacobi"); CHKERRQ(ierr); I was able to solve using hypre as well. Do I need to use -pc_type redistribute or not? Because I am able to solve without it as well. No you do not need redistribute, but for large problems with many empty rows using a solver inside redistribute will be faster than just using that solver directly on the much larger (mostly empty) system. (Q3) I am sorry, if I sound like a broken record player. On each level I request allocation for A[N][N] Not sure what you mean by this? Call MatMPIAIJSetPreallocation(... N,NULL); where N is the number of columns in the matrix? If so, yes this causes a huge malloc() by PETSc when it allocates the matrix. It is not scalable. Do you have a small upper bound on the number of nonzeros in a row, say 9 or 27? Then use that instead of N, not perfect but much better than N. Barry as the indexes can be scatted anywhere between 1 to N but most are ?inactive rows?. Is -pc_type redistribute the way to go for me to save on memory? Though I request A[N][N] allocation, and not all rows are active - I wonder if I am wasting a huge amount of memory? Kind regards, Karthik. From: Barry Smith > Date: Monday, 6 February 2023 at 22:42 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Sorry was not clear MatZero*. I just meant MatZeroRows() or MatZeroRowsColumns() On Feb 6, 2023, at 4:45 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: No problem. I don?t completely follow. (Q1) I have used MATMPIAJI but not sure what is MatZero* (star) and what it does? And its relevance to my problem. (Q2) Since I am creating a MATMPIAJI system? what would be the best way to insert 0.0 in to ALL diagonals (both active and inactive rows) to begin with? Yes, just have each MPI process loop over its rows and put zero on the diagonal (actually, you could put a 1 if you want). Then have your code use AMReX to put all its values in, I am assuming the code uses INSERT_VALUES so it will always overwrite the value you put in initially (hence putting in 1 initially will be fine; the advantage of 1 is if you do not use PCREDISTIBUTE the matrix is fully defined and so any solver will work. If you know the inactive rows you can just put the diagonal on those since AMReX will fill up the rest of the rows, but it is harmless to put values on all diagonal entries. Do NOT call MatAssemblyBegin/End between filling the diagonal entries and having AMReX put in its values. (Q3) If I have to insert 0.0 only into diagonals of ?inactive? rows after I have put values into the matrix would be an effort. Unless there is a straight forward to do it in PETSc. (Q4) For my problem do I need to use PCREDISTIBUTE or any linear solve would eliminate those rows? Well no solver will really make sense if you have "inactive" rows, that is rows with nothing in them except PCREDISTIBUTE. When PETSc was written we didn't understand having lots of completely empty rows was a use case so much of the functionality does not work in that case. Best, Karthik. 
From: Barry Smith > Date: Monday, 6 February 2023 at 20:18 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Sorry, I had a mistake in my thinking, PCREDISTRIBUTE supports completely empty rows but MatZero* does not. When you put values into the matrix you will need to insert a 0.0 on the diagonal of each "inactive" row; all of this will be eliminated during the linear solve process. It would be a major project to change the MatZero* functions to handle empty rows. Barry On Feb 4, 2023, at 12:06 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you very much for offering to debug. I built PETSc along with AMReX, so I tried to extract the PETSc code alone which would reproduce the same error on the smallest sized problem possible. I have attached three files: petsc_amrex_error_redistribute.txt ? The error message from amrex/petsc interface, but THE linear system solves and converges to a solution. problem.c ? A simple stand-alone petsc code, which produces almost the same error message. petsc_ error_redistribute.txt ? The error message from problem.c but strangely it does NOT solve ? I am not sure why? Please use problem.c to debug the issue. Kind regards, Karthik. From: Barry Smith > Date: Saturday, 4 February 2023 at 00:22 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you can help me reproduce the problem with a simple code I can debug the problem and fix it. Barry On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: I updated the main branch to the below commit but the same problem persists. [0]PETSC ERROR: Petsc Development GIT revision: v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000 From: Barry Smith > Date: Friday, 3 February 2023 at 18:51 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it but now we allow the row to initially have no entries and still be zeroed. Barry On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 Residual norms for redistribute_ solve. 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 KSP Object: (redistribute_) 1 MPI process type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (redistribute_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: mpiaij rows=48896, cols=48896 total: nonzeros=307976, allocated nonzeros=307976 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines End of program solve time 0.564714744 seconds Starting max value is: 0 Min value of level 0 is: 0 Interpolated min value is: 741.978761 Unused ParmParse Variables: [TOP]::model.type(nvals = 1) :: [3] [TOP]::ref_ratio(nvals = 1) :: [2] AMReX (22.10-20-g3082028e4287) finalized #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. Program ended with exit code: 0 Best, Karthik. From: Barry Smith > Date: Friday, 3 February 2023 at 17:41 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros We need all the error output for the errors you got below to understand why the errors are happening. On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Hello Barry, I would like to better understand pc_type redistribute usage. I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. Indeed the linear solve will be more efficient if you use the redistribute solver. But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). Question (1) If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? See above Question (2) I tried using pc_type redistribute for a two level system. I have attached the output only for (level 1) The solution converges to right solution but still petsc outputs some error messages. [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) But the there were no unused options #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. 
Program ended with exit code: 0 I cannot explain this Question (2) [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? Question (3) [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. Thank you for your help. Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 18:50 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. Barry On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Barry. This is great! I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using MatZeroRowsColumnsIS(A, isout, 1, x, b); While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 16:04 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. Barry On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: Hello, I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. For instance say N=7 0 0 0 0 0 0 0 0 1 -1 0 0 0 0 0 -1 2 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 1 I would like to reduce it to a 3x3 1 -1 0 -1 2 -1 0 -1 1 I do know the size N. Q1) How do I do it? Q2) Is it better to eliminate them as it would save a lot of memory? Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? Thank you. Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -------------- next part -------------- An HTML attachment was scrubbed... 
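As a compact illustration of the boundary-condition and ordering points raised above, a hedged C sketch: isbc is a hypothetical IS listing the Dirichlet rows, A, b and x are assumed to be assembled, and the solve itself runs with -pc_type redistribute exactly as in the option table above. The point is that x keeps the original global numbering after the solve, so no re-indexing is needed.

PetscErrorCode ierr;
PetscInt       rstart, rend, row;
PetscScalar    val;
ierr = MatZeroRowsColumnsIS(A, isbc, 1.0, x, b); CHKERRQ(ierr);
/* ... KSPSolve(ksp, b, x), run with -pc_type redistribute as in the option table above ... */
ierr = VecGetOwnershipRange(x, &rstart, &rend); CHKERRQ(ierr);
row  = rstart;                                         /* any locally owned index, original numbering */
ierr = VecGetValues(x, 1, &row, &val); CHKERRQ(ierr);  /* indexed exactly as before the solve */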
URL: From jed at jedbrown.org Fri Feb 10 17:16:49 2023 From: jed at jedbrown.org (Jed Brown) Date: Fri, 10 Feb 2023 16:16:49 -0700 Subject: [petsc-users] GPUs and the float-double dilemma In-Reply-To: References: <53CB96CE-0233-4124-BA6C-86AD88C910CF@petsc.dev> Message-ID: <87v8k9xg9q.fsf@jedbrown.org> Ces VLC writes: > El El vie, 10 feb 2023 a las 21:44, Barry Smith escribi?: > >> >> What is the use case you are looking for that cannot be achieved by >> just distributing a single precision application? If the user is happy when >> they happen to have GPUs to use single precision everywhere, then why would >> they need double precision if they happen not to have a GPU? >> >> Are you just using KSP or also SNES, TS etc? > > > Thanks for your replies. The use case is structural analysis (so, sparse > symmetrical matrix, and minimum degree reorder tends to work fine in CPU > (for GPU I?ll need to check the best performing scenarios). Sparse direct solvers are rather lacking on GPUs. You may be able to use an algebraic multigrid on GPUs. If you're using an iterative solver, you'll be limited by memory bandwidth, not flops, so double -> float is at best an 8/12 improvement. You may be interested in this work on structural mechanics for GPUs. https://arxiv.org/abs/2204.01722 > Obviously, this use case requires double precision. But single precision > might be fine enough for faster low quality runs if the user happens to > have a GPU that accelerates float and not double (I have a 12GB Pascal > Titan, it accelerates float, not double). > > Kind regards, > > C?sar From bsmith at petsc.dev Fri Feb 10 18:44:57 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 10 Feb 2023 19:44:57 -0500 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> <20AF4E62-7E22-4B99-8DC4-600C79F78D96@petsc.dev> <1D7C0055-12B7-4775-B71C-EB4C94D096D9@petsc.dev> <28738507-0571-4B5E-BA4E-1AFDD892860D@petsc.dev> Message-ID: <64187791-B57B-4DFE-8B87-52CB8F0DEC86@petsc.dev> > On Feb 10, 2023, at 5:32 PM, Karthikeyan Chockalingam - STFC UKRI wrote: > > Yes, but d_nnz_vec and o_nnz_vec are not of size MPI rank but of size N. They are each of size local_size not size N. So your old code was fine with some changes, like > for (int i = 0; i < local_size; i++) { > d_nnz[i] = 10 if active row else 1; > o_nnz[i] = 10 if active row else 0; > } That is all you need to define these two arrays. but you don't need the > > Below is an alternative approach, making both d_nnz_vec and o_nnz_vec PETSc mpi vecs instead of C++ containers. > > Is this okay? > > MatSetType(A, MATAIJ); > MatSetSizes(A, PETSC_DECIDE,PETSC_DECIDE,N,N); > > PetscInt rstart, rend; > MatGetOwnershipRange(A,&rstart,&rend); > > PetscInt local_size = rend ? rstart; > > VecCreateMPI(PETSC_COMM_WORLD, local_size, N, &d_nnz_vec) > VecDuplicate(d_nnz_vec, o_nnz_vec) > > // populate d_nnz_vec and o_nnz_vec but not shown here! > > PetscMalloc(local_size * sizeof(PetscInt), &d_nnz); > PetscMalloc(local_size * sizeof(PetscInt), &o_nnz); > PetscMalloc(local_size * sizeof(PetscInt), &indices); > > > for (int i = 0; i < local_size; i++) > indices[i]=i+rstart; > > VecGetValues(d_nnz_vec, local_size, indices, d_nzz); > VecGetValues(o_nnz_vec, local_size, indices, o_nzz); > > > MatMPIAIJSetPreallocation(A, max_d_nz, d_nnz, max_o_nz, o_nnz); > > PetscFree(d_nnz); > PetscFree(o_nnz); > > > Kind regards, > Karthik. 
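The "d_nnz[i] = 10 if active row else 1" lines above are pseudocode; a concrete C sketch might look like the following, where row_is_active() is a hypothetical helper the application would supply, and A has already had MatSetType()/MatSetSizes() called on it as in the code quoted above. Note that d_nnz and o_nnz are indexed 0..local_size-1, i.e. they are just the size of the rows owned by this MPI rank.

PetscErrorCode ierr;
PetscInt       rstart, rend, local_size, i;
PetscInt       *d_nnz, *o_nnz;
ierr = MatGetOwnershipRange(A, &rstart, &rend); CHKERRQ(ierr);
local_size = rend - rstart;
ierr = PetscMalloc1(local_size, &d_nnz); CHKERRQ(ierr);
ierr = PetscMalloc1(local_size, &o_nnz); CHKERRQ(ierr);
for (i = 0; i < local_size; i++) {
  if (row_is_active(rstart + i)) { d_nnz[i] = 10; o_nnz[i] = 10; }  /* active rows: room for ~10 entries */
  else                           { d_nnz[i] = 1;  o_nnz[i] = 0;  }  /* inactive rows: diagonal only */
}
ierr = MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz); CHKERRQ(ierr);
ierr = PetscFree(d_nnz); CHKERRQ(ierr);
ierr = PetscFree(o_nnz); CHKERRQ(ierr);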
> > > From: Barry Smith > > Date: Friday, 10 February 2023 at 20:53 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > > > On Feb 10, 2023, at 3:34 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you for your response. > > ?But the original big matrix with lots of extra reserved space still exists (your A).? - I would like to save memory on rows that are ?inactive? by making use of d_nnz and o_nnz. > > (Q1) When is a row considered ?inactive? and dropped by -pc_type redistribute during the solve? Is it when the entire row is zero? > > When there is only a diagonal entry or no entries. > > > (Q2) If the answer to (Q1) is ?yes?. Am I right in thinking it would be beneficial to make have 0.0 on the diagonal of the ?inactive? rows instead of ?1.0?? > > Doesn't matter, you can have any value on the diagonal. > > > Currently, I am setting 0.0 to the diagonal element of the ?inactive? rows. > I am propose the following, please let me know if I am heading in the right direction. > > (Q3) ) I am planning to create two C++ vectors: d_nnz_vec and o_nnz_vec of N. > Is d_nnz_vec[inactive_row_index] = 0 and o_nnz_vec[inactive_row_index] = 0 or > d_nnz_vec[inactive_row_index] = 1 and Is o_nnz_vec[inactive_row_index] = 0? > > d_nnz_vec[active_row_index] = 10 and o_nnz_vec[active_row_index] = 10 > > Yes, this seems like the correct scheme, it will not allocate unneeded extra space for inactive rows, exactly the one you want. > > > (Q4) Is the below correct? > > PetscInt * d_nnz = NULL; > PetscInt * o_nnz = NULL; > PetscInt max_d_nz = 10; > PetscInt max_o_nz = 10; > > > MatSetType(A, MATAIJ); > MatSetSizes(A, PETSC_DECIDE,PETSC_DECIDE,N,N); > > PetscInt rstart, rend; > MatGetOwnershipRange(A,&rstart,&rend); > > PetscInt local_size = rend ? rstart; > > > PetscMalloc(local_size * sizeof(PetscInt), &d_nnz); > PetscMalloc(local_size * sizeof(PetscInt), &o_nnz); > > for (int i = 0; i < local_size; i++) { > d_nnz[i] = d_nnz_vec[i + rstart]; > o_nnz[i] = o_nnz_vec[i + rstart]; > } > > > Yes, but I think the + rstart is wrong. It is not needed because the d_nnz and o_nnz are just the size on that MPI rank. > > > > MatMPIAIJSetPreallocation(A,max_d_nz,d_nnz,max_o_nz, o_nnz); > > PetscFree(d_nnz); > PetscFree(o_nnz); > > > > Kind regards, > Karthik. > > > > From: Barry Smith > > Date: Wednesday, 8 February 2023 at 21:10 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > > > > On Feb 8, 2023, at 11:19 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > No, I am not calling MatMPIAIJSetPreallocation(... N,NULL); > Here is what I do: > > PetscInt d_nz = 10; > PetscInt o_nz = 10; > > ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); > ierr = MatSetType(A, MATMPIAIJ); CHKERRQ(ierr); > ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr); > ierr = MatMPIAIJSetPreallocation(A, d_nz, NULL, o_nz, NULL); CHKERRQ(ierr); > > (Q1) > As I am setting the size of A to be N x N via > > ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr); > > d_nz and o_nz determine the memory reserved for A. So if you use for example 10 that means 10 entries are reserved for every row, even inactive ones. > > > > > and pre-allocation is done for ALL rows I would like to understand if the ?inactive rows? are NOT contributing to memory (while using ?redistribute?)? 
> > Redistribute will drop all the inactive rows from the computation of the solve, it generates a smaller matrix missing all those rows and columns. But the original big matrix with lots of extra reserved space still exists (your A). > > > > (Q2) > > I tried solving using hypre within redistribute and system converges to a solution. Is below correct way to use hypre within redistribute? > > ierr = PetscOptionsSetValue(NULL,"-ksp_type", "preonly"); > ierr = PetscOptionsSetValue(NULL,"-pc_type", "redistribute"); > ierr = PetscOptionsSetValue(NULL,"-redistribute_ksp_type", "cg"); > ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "hypre"); > ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_hypre_type", "boomeramg"); > > Correct. You can run with -ksp_view and it will provide all the information about how the solves are layered. > > > > > Many thanks, > > Karthik. > > From: Barry Smith > > Date: Tuesday, 7 February 2023 at 19:52 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > > > > On Feb 7, 2023, at 1:20 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you Barry for your detailed response. > > I would like to shed some light into what I try to accomplish using PETSc and AMReX. Please see the attachment adaptive mesh image (and ignore the mesh-order legend for now). > > The number of elements on each level is a geometric progression. > N - Number elements on each level indexed by ?n? > n - Adaptive mesh level index (starting from 1) > a - Number of elements on the base mesh = 16 > r = 4 (each element is divided into four on the next higher level of refinement) > > N = a r^(n-1) > > The final solution u, is the super imposition of solutions from all levels (here we have a total of 7 levels). > > u = u^(1) + u^(2) + ? + u^(7) > > Hence I have seven system matrix and solution vector pairs, one for each level. > > On each level the element index vary from 1 to N. But on each level NOT all elements are ?active?. > As you can see from the attached image not all elements are active (a lot of white hollow spaces). So the ?active? indexes can be scatted anywhere between 1 to N = 65536 for n = 7. > > (Q1) In my case, I can?t at the moment insert 1 on the diagonal because during assembly I am using ADD_VALUES as a node can be common to many elements. So I added 0.0 to ALL diagonals. After global assembly, I find that the linear solver converges. > > (Q2) After adding 0.0 to all diagonal. I able to solve using either > ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "jacobi"); CHKERRQ(ierr); > or > ierr = PetscOptionsSetValue(NULL," pc_type", "jacobi"); CHKERRQ(ierr); > I was able to solve using hypre as well. > > Do I need to use -pc_type redistribute or not? Because I am able to solve without it as well. > > No you do not need redistribute, but for large problems with many empty rows using a solver inside redistribute will be faster than just using that solver directly on the much larger (mostly empty) system. > > > > > (Q3) I am sorry, if I sound like a broken record player. On each level I request allocation for A[N][N] > > Not sure what you mean by this? Call MatMPIAIJSetPreallocation(... N,NULL); where N is the number of columns in the matrix? > > If so, yes this causes a huge malloc() by PETSc when it allocates the matrix. It is not scalable. Do you have a small upper bound on the number of nonzeros in a row, say 9 or 27? 
Then use that instead of N, not perfect but much better than N. > > Barry > > > > > > > as the indexes can be scatted anywhere between 1 to N but most are ?inactive rows?. Is -pc_type redistribute the way to go for me to save on memory? Though I request A[N][N] allocation, and not all rows are active - I wonder if I am wasting a huge amount of memory? > > Kind regards, > Karthik. > > > > > From: Barry Smith > > Date: Monday, 6 February 2023 at 22:42 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Sorry was not clear MatZero*. I just meant MatZeroRows() or MatZeroRowsColumns() > > > > > On Feb 6, 2023, at 4:45 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > No problem. I don?t completely follow. > > (Q1) I have used MATMPIAJI but not sure what is MatZero* (star) and what it does? And its relevance to my problem. > > (Q2) Since I am creating a MATMPIAJI system? what would be the best way to insert 0.0 in to ALL diagonals (both active and inactive rows) to begin with? > > Yes, just have each MPI process loop over its rows and put zero on the diagonal (actually, you could put a 1 if you want). Then have your code use AMReX to > put all its values in, I am assuming the code uses INSERT_VALUES so it will always overwrite the value you put in initially (hence putting in 1 initially will be fine; the advantage of 1 is if you do not use PCREDISTIBUTE the matrix is fully defined and so any solver will work. If you know the inactive rows you can just put the diagonal on those since AMReX will fill up the rest of the rows, but it is harmless to put values on all diagonal entries. Do NOT call MatAssemblyBegin/End between filling the diagonal entries and having AMReX put in its values. > > (Q3) If I have to insert 0.0 only into diagonals of ?inactive? rows after I have put values into the matrix would be an effort. Unless there is a straight forward to do it in PETSc. > > (Q4) For my problem do I need to use PCREDISTIBUTE or any linear solve would eliminate those rows? > > Well no solver will really make sense if you have "inactive" rows, that is rows with nothing in them except PCREDISTIBUTE. > > When PETSc was written we didn't understand having lots of completely empty rows was a use case so much of the functionality does not work in that case. > > > > > > > Best, > Karthik. > > From: Barry Smith > > Date: Monday, 6 February 2023 at 20:18 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Sorry, I had a mistake in my thinking, PCREDISTRIBUTE supports completely empty rows but MatZero* does not. > > When you put values into the matrix you will need to insert a 0.0 on the diagonal of each "inactive" row; all of this will be eliminated during the linear solve process. It would be a major project to change the MatZero* functions to handle empty rows. > > Barry > > > > > > > On Feb 4, 2023, at 12:06 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you very much for offering to debug. > > I built PETSc along with AMReX, so I tried to extract the PETSc code alone which would reproduce the same error on the smallest sized problem possible. > > I have attached three files: > > petsc_amrex_error_redistribute.txt ? The error message from amrex/petsc interface, but THE linear system solves and converges to a solution. > > problem.c ? 
A simple stand-alone petsc code, which produces almost the same error message. > > petsc_ error_redistribute.txt ? The error message from problem.c but strangely it does NOT solve ? I am not sure why? > > Please use problem.c to debug the issue. > > Kind regards, > Karthik. > > > From: Barry Smith > > Date: Saturday, 4 February 2023 at 00:22 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > If you can help me reproduce the problem with a simple code I can debug the problem and fix it. > > Barry > > > > > > > > On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > I updated the main branch to the below commit but the same problem persists. > > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000 > > > From: Barry Smith > > Date: Friday, 3 February 2023 at 18:51 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it but now we allow the row to initially have no entries and still be zeroed. > > Barry > > > On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. > > > > [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-options_left (no value) > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 > [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 > [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre > [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 > [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 > [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 > Residual norms for redistribute_ solve. 
> 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 > 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 > 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 > 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 > 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 > 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 > 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 > 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 > 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 > 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 > 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 > Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 > KSP Object: (redistribute_) 1 MPI process > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (redistribute_) 1 MPI process > type: jacobi > type DIAGONAL > linear system matrix = precond matrix: > Mat Object: 1 MPI process > type: mpiaij > rows=48896, cols=48896 > total: nonzeros=307976, allocated nonzeros=307976 > total number of mallocs used during MatSetValues calls=0 > not using I-node (on process 0) routines > End of program > solve time 0.564714744 seconds > Starting max value is: 0 > Min value of level 0 is: 0 > Interpolated min value is: 741.978761 > Unused ParmParse Variables: > [TOP]::model.type(nvals = 1) :: [3] > [TOP]::ref_ratio(nvals = 1) :: [2] > > AMReX (22.10-20-g3082028e4287) finalized > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > > > Best, > Karthik. > > From: Barry Smith > > Date: Friday, 3 February 2023 at 17:41 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > We need all the error output for the errors you got below to understand why the errors are happening. > > > > > > > > > On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Hello Barry, > > I would like to better understand pc_type redistribute usage. > > I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. 
When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. > > Indeed the linear solve will be more efficient if you use the redistribute solver. > > But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). > > > > > > > > > > Question (1) > > > If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? > > See above > > > > > > > > > > Question (2) > > > I tried using pc_type redistribute for a two level system. > I have attached the output only for (level 1) > The solution converges to right solution but still petsc outputs some error messages. > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-options_left (no value) > > But the there were no unused options > > #PETSc Option Table entries: > -ksp_type preonly > -options_left > -pc_type redistribute > -redistribute_ksp_converged_reason > -redistribute_ksp_monitor_true_residual > -redistribute_ksp_type cg > -redistribute_ksp_view > -redistribute_pc_type jacobi > #End of PETSc Option Table entries > There are no unused options. > Program ended with exit code: 0 > > I cannot explain this > > > > > > > > > Question (2) > > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > > What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? > > > Question (3) > > > > > > > > > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > > I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. > > Thank you for your help. > > Kind regards, > Karthik. > > > > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 18:50 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. > > Barry > > > > > > > > > > > On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thank you Barry. This is great! > > I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using > MatZeroRowsColumnsIS(A, isout, 1, x, b); > > While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? > > Kind regards, > Karthik. 
> > From: Barry Smith > > Date: Tuesday, 10 January 2023 at 16:04 > To: Chockalingam, Karthikeyan (STFC,DL,HC) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros > > > https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute > > > It does everything for you. Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. > > Barry > > > > > > > > > > > > On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: > > Hello, > > I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. > > For instance say N=7 > > 0 0 0 0 0 0 0 > 0 1 -1 0 0 0 0 > 0 -1 2 0 0 0 -1 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 0 0 0 0 0 > 0 0 -1 0 0 0 1 > > I would like to reduce it to a 3x3 > > 1 -1 0 > -1 2 -1 > 0 -1 1 > > I do know the size N. > > Q1) How do I do it? > Q2) Is it better to eliminate them as it would save a lot of memory? > Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? > > Thank you. > > Karthik. > This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mi.mike1021 at gmail.com Sun Feb 12 17:15:21 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Sun, 12 Feb 2023 17:15:21 -0600 Subject: [petsc-users] PetscViewer with 64bit Message-ID: Dear PETSc team, I am a user of PETSc with Fortran. My code uses DMPlex to handle dm object. To print out output variable and mesh connectivity, I use VecView() by defining PetscSection on that dm and borrow a vector. The type of the viewer is set to PETSCVIEWERVTK. With 32bit indices, the above work flow has no issue. However, if PETSc is configured with 64bit indices, my output .vtu file has an error if I open the file with visualization tools, such as Paraview or Tecplot, saying that: "Cannot read cell connectivity from Cells in piece 0 because the "offsets" array is not monotonically increasing or starts with a value other than 0." If I open the .vtu file from terminal, I can see such a line: ... ... I expected "DataArray type="Int64", since the PETSc has 64bit indices. Could I get recommendations that I need to check to resolve the issue? Thanks, Mike -------------- next part -------------- An HTML attachment was scrubbed... 
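For reference, a small C sketch of the .vtu output path being described (the original code is Fortran, so this is only the C equivalent as I understand it): dm is assumed to be a DMPlex with a PetscSection attached and sol a global vector obtained from it, while "output.vtu" and the field name "solution" are placeholders. PetscViewerVTKOpen() selects the PETSCVIEWERVTK type and, as far as I know, picks the .vtu writer from the file extension.

PetscErrorCode ierr;
PetscViewer    viewer;
ierr = PetscObjectSetName((PetscObject)sol, "solution"); CHKERRQ(ierr);  /* field name written into the .vtu */
ierr = PetscViewerVTKOpen(PETSC_COMM_WORLD, "output.vtu", FILE_MODE_WRITE, &viewer); CHKERRQ(ierr);
ierr = VecView(sol, viewer); CHKERRQ(ierr);
ierr = PetscViewerDestroy(&viewer); CHKERRQ(ierr);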
URL: From knepley at gmail.com Sun Feb 12 17:28:24 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 12 Feb 2023 18:28:24 -0500 Subject: [petsc-users] PetscViewer with 64bit In-Reply-To: References: Message-ID: On Sun, Feb 12, 2023 at 6:15 PM Mike Michell wrote: > Dear PETSc team, > > I am a user of PETSc with Fortran. My code uses DMPlex to handle dm > object. To print out output variable and mesh connectivity, I use VecView() > by defining PetscSection on that dm and borrow a vector. The type of the > viewer is set to PETSCVIEWERVTK. > > With 32bit indices, the above work flow has no issue. However, if PETSc is > configured with 64bit indices, my output .vtu file has an error if I open > the file with visualization tools, such as Paraview or Tecplot, saying > that: > "Cannot read cell connectivity from Cells in piece 0 because the "offsets" > array is not monotonically increasing or starts with a value other than 0." > > If I open the .vtu file from terminal, I can see such a line: > ... > format="appended" offset="580860" /> > ... > > I expected "DataArray type="Int64", since the PETSc has 64bit indices. > Could I get recommendations that I need to check to resolve the issue? > This is probably a bug. We will look at it. Jed, I saw that Int32 is hardcoded in plexvtu.c, but sizeof(PetscInt) is used to calculate the offset, which looks inconsistent. Can you take a look? Thanks, Matt > Thanks, > Mike > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mi.mike1021 at gmail.com Sun Feb 12 17:38:14 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Sun, 12 Feb 2023 17:38:14 -0600 Subject: [petsc-users] PetscViewer with 64bit In-Reply-To: References: Message-ID: Thanks for the comments. To be precise on the question, the entire part of the header of the .vtu file is attached: Thanks, Mike > On Sun, Feb 12, 2023 at 6:15 PM Mike Michell > wrote: > >> Dear PETSc team, >> >> I am a user of PETSc with Fortran. My code uses DMPlex to handle dm >> object. To print out output variable and mesh connectivity, I use VecView() >> by defining PetscSection on that dm and borrow a vector. The type of the >> viewer is set to PETSCVIEWERVTK. >> >> With 32bit indices, the above work flow has no issue. However, if PETSc >> is configured with 64bit indices, my output .vtu file has an error if I >> open the file with visualization tools, such as Paraview or Tecplot, saying >> that: >> "Cannot read cell connectivity from Cells in piece 0 because the >> "offsets" array is not monotonically increasing or starts with a value >> other than 0." >> >> If I open the .vtu file from terminal, I can see such a line: >> ... >> > format="appended" offset="580860" /> >> ... >> >> I expected "DataArray type="Int64", since the PETSc has 64bit indices. >> Could I get recommendations that I need to check to resolve the issue? >> > > This is probably a bug. We will look at it. > > Jed, I saw that Int32 is hardcoded in plexvtu.c, but sizeof(PetscInt) is > used to calculate the offset, which looks inconsistent. Can you take a look? > > Thanks, > > Matt > > >> Thanks, >> Mike >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthikeyan.chockalingam at stfc.ac.uk Mon Feb 13 05:24:12 2023 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Mon, 13 Feb 2023 11:24:12 +0000 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: <64187791-B57B-4DFE-8B87-52CB8F0DEC86@petsc.dev> References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> <20AF4E62-7E22-4B99-8DC4-600C79F78D96@petsc.dev> <1D7C0055-12B7-4775-B71C-EB4C94D096D9@petsc.dev> <28738507-0571-4B5E-BA4E-1AFDD892860D@petsc.dev> <64187791-B57B-4DFE-8B87-52CB8F0DEC86@petsc.dev> Message-ID: Thank you. I have it working now. (Q1) When both d_nz and d_nnz are specified ? would petsc would pick up d_nnz over d_nz? Likewise for o_nz and o_nnz? ierr = MatMPIAIJSetPreallocation(A,d_nz, d_nnz, o_nz, o_nnz); (Q2) Finally I would like to assign appropriate memory for ?active rows? as well. I know the max non-zero elements in a row is 9? Is there a rule-of-thumb what d_nz and o_nz be for active rows - when run parallelly? Best, Karthik. From: Barry Smith Date: Saturday, 11 February 2023 at 00:45 To: Chockalingam, Karthikeyan (STFC,DL,HC) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Eliminating rows and columns which are zeros On Feb 10, 2023, at 5:32 PM, Karthikeyan Chockalingam - STFC UKRI wrote: Yes, but d_nnz_vec and o_nnz_vec are not of size MPI rank but of size N. They are each of size local_size not size N. So your old code was fine with some changes, like for (int i = 0; i < local_size; i++) { d_nnz[i] = 10 if active row else 1; o_nnz[i] = 10 if active row else 0; } That is all you need to define these two arrays. but you don't need the Below is an alternative approach, making both d_nnz_vec and o_nnz_vec PETSc mpi vecs instead of C++ containers. Is this okay? MatSetType(A, MATAIJ); MatSetSizes(A, PETSC_DECIDE,PETSC_DECIDE,N,N); PetscInt rstart, rend; MatGetOwnershipRange(A,&rstart,&rend); PetscInt local_size = rend ? rstart; VecCreateMPI(PETSC_COMM_WORLD, local_size, N, &d_nnz_vec) VecDuplicate(d_nnz_vec, o_nnz_vec) // populate d_nnz_vec and o_nnz_vec but not shown here! PetscMalloc(local_size * sizeof(PetscInt), &d_nnz); PetscMalloc(local_size * sizeof(PetscInt), &o_nnz); PetscMalloc(local_size * sizeof(PetscInt), &indices); for (int i = 0; i < local_size; i++) indices[i]=i+rstart; VecGetValues(d_nnz_vec, local_size, indices, d_nzz); VecGetValues(o_nnz_vec, local_size, indices, o_nzz); MatMPIAIJSetPreallocation(A, max_d_nz, d_nnz, max_o_nz, o_nnz); PetscFree(d_nnz); PetscFree(o_nnz); Kind regards, Karthik. From: Barry Smith > Date: Friday, 10 February 2023 at 20:53 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros On Feb 10, 2023, at 3:34 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you for your response. ?But the original big matrix with lots of extra reserved space still exists (your A).? - I would like to save memory on rows that are ?inactive? by making use of d_nnz and o_nnz. (Q1) When is a row considered ?inactive? and dropped by -pc_type redistribute during the solve? Is it when the entire row is zero? When there is only a diagonal entry or no entries. (Q2) If the answer to (Q1) is ?yes?. 
Am I right in thinking it would be beneficial to make have 0.0 on the diagonal of the ?inactive? rows instead of ?1.0?? Doesn't matter, you can have any value on the diagonal. Currently, I am setting 0.0 to the diagonal element of the ?inactive? rows. I am propose the following, please let me know if I am heading in the right direction. (Q3) ) I am planning to create two C++ vectors: d_nnz_vec and o_nnz_vec of N. Is d_nnz_vec[inactive_row_index] = 0 and o_nnz_vec[inactive_row_index] = 0 or d_nnz_vec[inactive_row_index] = 1 and Is o_nnz_vec[inactive_row_index] = 0? d_nnz_vec[active_row_index] = 10 and o_nnz_vec[active_row_index] = 10 Yes, this seems like the correct scheme, it will not allocate unneeded extra space for inactive rows, exactly the one you want. (Q4) Is the below correct? PetscInt * d_nnz = NULL; PetscInt * o_nnz = NULL; PetscInt max_d_nz = 10; PetscInt max_o_nz = 10; MatSetType(A, MATAIJ); MatSetSizes(A, PETSC_DECIDE,PETSC_DECIDE,N,N); PetscInt rstart, rend; MatGetOwnershipRange(A,&rstart,&rend); PetscInt local_size = rend ? rstart; PetscMalloc(local_size * sizeof(PetscInt), &d_nnz); PetscMalloc(local_size * sizeof(PetscInt), &o_nnz); for (int i = 0; i < local_size; i++) { d_nnz[i] = d_nnz_vec[i + rstart]; o_nnz[i] = o_nnz_vec[i + rstart]; } Yes, but I think the + rstart is wrong. It is not needed because the d_nnz and o_nnz are just the size on that MPI rank. MatMPIAIJSetPreallocation(A,max_d_nz,d_nnz,max_o_nz, o_nnz); PetscFree(d_nnz); PetscFree(o_nnz); Kind regards, Karthik. From: Barry Smith > Date: Wednesday, 8 February 2023 at 21:10 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros On Feb 8, 2023, at 11:19 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: No, I am not calling MatMPIAIJSetPreallocation(... N,NULL); Here is what I do: PetscInt d_nz = 10; PetscInt o_nz = 10; ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); ierr = MatSetType(A, MATMPIAIJ); CHKERRQ(ierr); ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr); ierr = MatMPIAIJSetPreallocation(A, d_nz, NULL, o_nz, NULL); CHKERRQ(ierr); (Q1) As I am setting the size of A to be N x N via ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ(ierr); d_nz and o_nz determine the memory reserved for A. So if you use for example 10 that means 10 entries are reserved for every row, even inactive ones. and pre-allocation is done for ALL rows I would like to understand if the ?inactive rows? are NOT contributing to memory (while using ?redistribute?)? Redistribute will drop all the inactive rows from the computation of the solve, it generates a smaller matrix missing all those rows and columns. But the original big matrix with lots of extra reserved space still exists (your A). (Q2) I tried solving using hypre within redistribute and system converges to a solution. Is below correct way to use hypre within redistribute? ierr = PetscOptionsSetValue(NULL,"-ksp_type", "preonly"); ierr = PetscOptionsSetValue(NULL,"-pc_type", "redistribute"); ierr = PetscOptionsSetValue(NULL,"-redistribute_ksp_type", "cg"); ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "hypre"); ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_hypre_type", "boomeramg"); Correct. You can run with -ksp_view and it will provide all the information about how the solves are layered. Many thanks, Karthik. 
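On (Q1) near the top of this message: as far as I understand MatMPIAIJSetPreallocation, the scalar d_nz/o_nz arguments are ignored whenever the corresponding d_nnz/o_nnz arrays are non-NULL, so a call like the one sketched below is governed entirely by the arrays (A, d_nnz and o_nnz as set up earlier in the thread).

/* d_nnz/o_nnz take precedence; to my understanding the 10s are ignored here */
ierr = MatMPIAIJSetPreallocation(A, 10, d_nnz, 10, o_nnz); CHKERRQ(ierr);
/* which should reserve the same storage as */
/* ierr = MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz); CHKERRQ(ierr); */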
From: Barry Smith > Date: Tuesday, 7 February 2023 at 19:52 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros On Feb 7, 2023, at 1:20 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Barry for your detailed response. I would like to shed some light into what I try to accomplish using PETSc and AMReX. Please see the attachment adaptive mesh image (and ignore the mesh-order legend for now). The number of elements on each level is a geometric progression. N - Number elements on each level indexed by ?n? n - Adaptive mesh level index (starting from 1) a - Number of elements on the base mesh = 16 r = 4 (each element is divided into four on the next higher level of refinement) N = a r^(n-1) The final solution u, is the super imposition of solutions from all levels (here we have a total of 7 levels). u = u^(1) + u^(2) + ? + u^(7) Hence I have seven system matrix and solution vector pairs, one for each level. On each level the element index vary from 1 to N. But on each level NOT all elements are ?active?. As you can see from the attached image not all elements are active (a lot of white hollow spaces). So the ?active? indexes can be scatted anywhere between 1 to N = 65536 for n = 7. (Q1) In my case, I can?t at the moment insert 1 on the diagonal because during assembly I am using ADD_VALUES as a node can be common to many elements. So I added 0.0 to ALL diagonals. After global assembly, I find that the linear solver converges. (Q2) After adding 0.0 to all diagonal. I able to solve using either ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "jacobi"); CHKERRQ(ierr); or ierr = PetscOptionsSetValue(NULL," pc_type", "jacobi"); CHKERRQ(ierr); I was able to solve using hypre as well. Do I need to use -pc_type redistribute or not? Because I am able to solve without it as well. No you do not need redistribute, but for large problems with many empty rows using a solver inside redistribute will be faster than just using that solver directly on the much larger (mostly empty) system. (Q3) I am sorry, if I sound like a broken record player. On each level I request allocation for A[N][N] Not sure what you mean by this? Call MatMPIAIJSetPreallocation(... N,NULL); where N is the number of columns in the matrix? If so, yes this causes a huge malloc() by PETSc when it allocates the matrix. It is not scalable. Do you have a small upper bound on the number of nonzeros in a row, say 9 or 27? Then use that instead of N, not perfect but much better than N. Barry as the indexes can be scatted anywhere between 1 to N but most are ?inactive rows?. Is -pc_type redistribute the way to go for me to save on memory? Though I request A[N][N] allocation, and not all rows are active - I wonder if I am wasting a huge amount of memory? Kind regards, Karthik. From: Barry Smith > Date: Monday, 6 February 2023 at 22:42 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Sorry was not clear MatZero*. I just meant MatZeroRows() or MatZeroRowsColumns() On Feb 6, 2023, at 4:45 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: No problem. I don?t completely follow. (Q1) I have used MATMPIAJI but not sure what is MatZero* (star) and what it does? And its relevance to my problem. (Q2) Since I am creating a MATMPIAJI system? 
what would be the best way to insert 0.0 in to ALL diagonals (both active and inactive rows) to begin with? Yes, just have each MPI process loop over its rows and put zero on the diagonal (actually, you could put a 1 if you want). Then have your code use AMReX to put all its values in, I am assuming the code uses INSERT_VALUES so it will always overwrite the value you put in initially (hence putting in 1 initially will be fine; the advantage of 1 is if you do not use PCREDISTIBUTE the matrix is fully defined and so any solver will work. If you know the inactive rows you can just put the diagonal on those since AMReX will fill up the rest of the rows, but it is harmless to put values on all diagonal entries. Do NOT call MatAssemblyBegin/End between filling the diagonal entries and having AMReX put in its values. (Q3) If I have to insert 0.0 only into diagonals of ?inactive? rows after I have put values into the matrix would be an effort. Unless there is a straight forward to do it in PETSc. (Q4) For my problem do I need to use PCREDISTIBUTE or any linear solve would eliminate those rows? Well no solver will really make sense if you have "inactive" rows, that is rows with nothing in them except PCREDISTIBUTE. When PETSc was written we didn't understand having lots of completely empty rows was a use case so much of the functionality does not work in that case. Best, Karthik. From: Barry Smith > Date: Monday, 6 February 2023 at 20:18 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Sorry, I had a mistake in my thinking, PCREDISTRIBUTE supports completely empty rows but MatZero* does not. When you put values into the matrix you will need to insert a 0.0 on the diagonal of each "inactive" row; all of this will be eliminated during the linear solve process. It would be a major project to change the MatZero* functions to handle empty rows. Barry On Feb 4, 2023, at 12:06 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you very much for offering to debug. I built PETSc along with AMReX, so I tried to extract the PETSc code alone which would reproduce the same error on the smallest sized problem possible. I have attached three files: petsc_amrex_error_redistribute.txt ? The error message from amrex/petsc interface, but THE linear system solves and converges to a solution. problem.c ? A simple stand-alone petsc code, which produces almost the same error message. petsc_ error_redistribute.txt ? The error message from problem.c but strangely it does NOT solve ? I am not sure why? Please use problem.c to debug the issue. Kind regards, Karthik. From: Barry Smith > Date: Saturday, 4 February 2023 at 00:22 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you can help me reproduce the problem with a simple code I can debug the problem and fix it. Barry On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: I updated the main branch to the below commit but the same problem persists. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000 From: Barry Smith > Date: Friday, 3 February 2023 at 18:51 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros If you switch to use the main branch of petsc https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you will not have the problem below (previously we required that a row exist before we zeroed it but now we allow the row to initially have no entries and still be zeroed. Barry On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you. The entire error output was an attachment in my previous email. I am pasting here for your reference. [1;31m[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda GIT Date: 2022-10-30 11:03:25 -0500 [0]PETSC ERROR: /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 [0]PETSC ERROR: Configure options --with-debugging=0 --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 [0]PETSC ERROR: #2 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 [0]PETSC ERROR: #4 MatZeroRowsColumns() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 [0]PETSC ERROR: #6 localAssembly() at /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 Residual norms for redistribute_ solve. 
0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 KSP Object: (redistribute_) 1 MPI process type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (redistribute_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: mpiaij rows=48896, cols=48896 total: nonzeros=307976, allocated nonzeros=307976 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines End of program solve time 0.564714744 seconds Starting max value is: 0 Min value of level 0 is: 0 Interpolated min value is: 741.978761 Unused ParmParse Variables: [TOP]::model.type(nvals = 1) :: [3] [TOP]::ref_ratio(nvals = 1) :: [2] AMReX (22.10-20-g3082028e4287) finalized #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. Program ended with exit code: 0 Best, Karthik. From: Barry Smith > Date: Friday, 3 February 2023 at 17:41 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros We need all the error output for the errors you got below to understand why the errors are happening. On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Hello Barry, I would like to better understand pc_type redistribute usage. I am plan to use pc_type redistribute in the context of adaptive mesh refinement on a structured grid in 2D. My base mesh (level 0) is indexed from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to 4(N-1) elements. When I construct system matrix A on (level 1); I probably only use 20% of 4(N-1) elements, however the indexes are scattered in the range of 0 to 4(N-1). 
That leaves 80% of the rows and columns of the system matrix A on (level 1) to be zero. From your earlier response, I believe this would be a use case for petsc_type redistribute. Indeed the linear solve will be more efficient if you use the redistribute solver. But I don't understand your plan. With adaptive refinement I would just create the two matrices, one for the initial grid on which you solve the system, this will be a smaller matrix and then create a new larger matrix for the refined grid (and discard the previous matrix). Question (1) If N is really large, I would have to allocate memory of size 4(N-1) for the system matrix A on (level 1). How does pc_type redistribute help? Because, I did end up allocating memory for a large system, where most of the rows and columns are zeros. Is most of the allotted memory not wasted? Is this the correct usage? See above Question (2) I tried using pc_type redistribute for a two level system. I have attached the output only for (level 1) The solution converges to right solution but still petsc outputs some error messages. [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-options_left (no value) But the there were no unused options #PETSc Option Table entries: -ksp_type preonly -options_left -pc_type redistribute -redistribute_ksp_converged_reason -redistribute_ksp_monitor_true_residual -redistribute_ksp_type cg -redistribute_ksp_view -redistribute_pc_type jacobi #End of PETSc Option Table entries There are no unused options. Program ended with exit code: 0 I cannot explain this Question (2) [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) What does this error message imply? Given I only use 20% of 4(N-1) indexes, I can imagine most of the diagonal entrees are zero. Is my understanding correct? Question (3) [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. I don?t follow why I get this error message as the linear system converges to the right solution. Thank you for your help. Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 18:50 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros Yes, after the solve the x will contain correct values for ALL the locations including the (zeroed out rows). You use case is exactly what redistribute it for. Barry On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Barry. This is great! I plan to solve using ?-pc_type redistribute? after applying the Dirichlet bc using MatZeroRowsColumnsIS(A, isout, 1, x, b); While I retrieve the solution data from x (after the solve) ? can I index them using the original ordering (if I may say that)? Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 10 January 2023 at 16:04 To: Chockalingam, Karthikeyan (STFC,DL,HC) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Eliminating rows and columns which are zeros https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute -pc_type redistribute It does everything for you. 
Note that if the right hand side for any of the "zero" rows is nonzero then the system is inconsistent and the system does not have a solution. Barry On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via petsc-users > wrote: Hello, I am assembling a MATIJ of size N, where a very large number of rows (and corresponding columns), are zeros. I would like to potentially eliminate them before the solve. For instance say N=7 0 0 0 0 0 0 0 0 1 -1 0 0 0 0 0 -1 2 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 1 I would like to reduce it to a 3x3 1 -1 0 -1 2 -1 0 -1 1 I do know the size N. Q1) How do I do it? Q2) Is it better to eliminate them as it would save a lot of memory? Q3) At the moment, I don?t know which rows (and columns) have the zero entries but with some effort I probably can find them. Should I know which rows (and columns) I am eliminating? Thank you. Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 13 05:40:58 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 13 Feb 2023 06:40:58 -0500 Subject: [petsc-users] Eliminating rows and columns which are zeros In-Reply-To: References: <0CD7067A-7470-426A-A8A0-A313DAE01116@petsc.dev> <35478A02-D37B-44F9-83C7-DDBEAEEDEEEB@petsc.dev> <20AF4E62-7E22-4B99-8DC4-600C79F78D96@petsc.dev> <1D7C0055-12B7-4775-B71C-EB4C94D096D9@petsc.dev> <28738507-0571-4B5E-BA4E-1AFDD892860D@petsc.dev> <64187791-B57B-4DFE-8B87-52CB8F0DEC86@petsc.dev> Message-ID: On Mon, Feb 13, 2023 at 6:24 AM Karthikeyan Chockalingam - STFC UKRI via petsc-users wrote: > Thank you. I have it working now. > > > > (Q1) When both d_nz and d_nnz are specified ? would petsc would pick up > d_nnz over d_nz? Likewise for o_nz and o_nnz? > > ierr = MatMPIAIJSetPreallocation(A,d_nz, d_nnz, o_nz, o_nnz); > Yes, it will pick d_nnz. > (Q2) Finally I would like to assign appropriate memory for ?active rows? > as well. I know the max non-zero elements in a row is 9? Is there a > rule-of-thumb what d_nz and o_nz be for active rows - when run parallelly? > No, it would depend on your partitioning. Thanks, Matt > Best, > > Karthik. > > > > > > *From: *Barry Smith > *Date: *Saturday, 11 February 2023 at 00:45 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > > > On Feb 10, 2023, at 5:32 PM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Yes, but d_nnz_vec and o_nnz_vec are not of size MPI rank but of size N. > > > > They are each of size local_size not size N. 
So your old code was fine > with some changes, like > > > > > > for (int i = 0; i < local_size; i++) { > > d_nnz[i] = 10 if active row else 1; > > o_nnz[i] = 10 if active row else 0; > > } > > > > That is all you need to define these two arrays. > > > > > > but you don't need the > > > > Below is an alternative approach, making both d_nnz_vec and o_nnz_vec > PETSc mpi vecs instead of C++ containers. > > > > Is this okay? > > > > MatSetType(A, MATAIJ); > > MatSetSizes(A, PETSC_DECIDE,PETSC_DECIDE,N,N); > > > > PetscInt rstart, rend; > > MatGetOwnershipRange(A,&rstart,&rend); > > > > PetscInt local_size = rend ? rstart; > > > > VecCreateMPI(PETSC_COMM_WORLD, local_size, N, &d_nnz_vec) > > VecDuplicate(d_nnz_vec, o_nnz_vec) > > > > // populate d_nnz_vec and o_nnz_vec but not shown here! > > > > PetscMalloc(local_size * sizeof(PetscInt), &d_nnz); > > PetscMalloc(local_size * sizeof(PetscInt), &o_nnz); > > PetscMalloc(local_size * sizeof(PetscInt), &indices); > > > > > > for (int i = 0; i < local_size; i++) > > indices[i]=i+rstart; > > > > VecGetValues(d_nnz_vec, local_size, indices, d_nzz); > > VecGetValues(o_nnz_vec, local_size, indices, o_nzz); > > > > > > MatMPIAIJSetPreallocation(A, max_d_nz, d_nnz, max_o_nz, o_nnz); > > > > PetscFree(d_nnz); > > PetscFree(o_nnz); > > > > > > Kind regards, > > Karthik. > > > > > > *From: *Barry Smith > *Date: *Friday, 10 February 2023 at 20:53 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > > > > On Feb 10, 2023, at 3:34 PM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Thank you for your response. > > > > ?But the original big matrix with lots of extra reserved space still > exists (your A).? - I would like to save memory on rows that are > ?inactive? by making use of d_nnz and o_nnz. > > > > (Q1) When is a row considered ?inactive? and dropped by -pc_type > redistribute during the solve? Is it when the entire row is zero? > > > > When there is only a diagonal entry or no entries. > > > > > (Q2) If the answer to (Q1) is ?yes?. Am I right in thinking it would be > beneficial to make have 0.0 on the diagonal of the ?inactive? rows instead > of ?1.0?? > > > > Doesn't matter, you can have any value on the diagonal. > > > > > Currently, I am setting 0.0 to the diagonal element of the ?inactive? rows. > > I am propose the following, please let me know if I am heading in the > right direction. > > > > (Q3) ) I am planning to create two C++ vectors: d_nnz_vec and o_nnz_vec of > N. > > Is d_nnz_vec[inactive_row_index] = 0 and o_nnz_vec[inactive_row_index] = 0 > or > > d_nnz_vec[inactive_row_index] = 1 and Is o_nnz_vec[inactive_row_index] = 0? > > > > d_nnz_vec[active_row_index] = 10 and o_nnz_vec[active_row_index] = 10 > > > > Yes, this seems like the correct scheme, it will not allocate unneeded > extra space for inactive rows, exactly the one you want. > > > > > (Q4) Is the below correct? > > > > PetscInt * d_nnz = NULL; > > PetscInt * o_nnz = NULL; > > PetscInt max_d_nz = 10; > > PetscInt max_o_nz = 10; > > > > > > MatSetType(A, MATAIJ); > > MatSetSizes(A, PETSC_DECIDE,PETSC_DECIDE,N,N); > > > > PetscInt rstart, rend; > > MatGetOwnershipRange(A,&rstart,&rend); > > > > PetscInt local_size = rend ? 
rstart; > > > > > > PetscMalloc(local_size * sizeof(PetscInt), &d_nnz); > > PetscMalloc(local_size * sizeof(PetscInt), &o_nnz); > > > > for (int i = 0; i < local_size; i++) { > > d_nnz[i] = d_nnz_vec[i + rstart]; > > o_nnz[i] = o_nnz_vec[i + rstart]; > > } > > > > > > Yes, but I think the + rstart is wrong. It is not needed because the > d_nnz and o_nnz are just the size on that MPI rank. > > > > > > > MatMPIAIJSetPreallocation(A,max_d_nz,d_nnz,max_o_nz, o_nnz); > > > > PetscFree(d_nnz); > > PetscFree(o_nnz); > > > > > > > > Kind regards, > > Karthik. > > > > > > > > *From: *Barry Smith > *Date: *Wednesday, 8 February 2023 at 21:10 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > > > On Feb 8, 2023, at 11:19 AM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > No, I am not calling MatMPIAIJSetPreallocation(... N,NULL); > > Here is what I do: > > > > PetscInt d_nz = 10; > > PetscInt o_nz = 10; > > > > ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); > > ierr = MatSetType(A, MATMPIAIJ); CHKERRQ(ierr); > > ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ > (ierr); > > ierr = MatMPIAIJSetPreallocation(A, d_nz, *NULL*, o_nz, *NULL*); > CHKERRQ(ierr); > > > > (Q1) > > As I am setting the size of A to be N x N via > > > > ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N); CHKERRQ > (ierr); > > > > d_nz and o_nz determine the memory reserved for A. So if you use for > example 10 that means 10 entries are reserved for every row, even inactive > ones. > > > > > > and pre-allocation is done for ALL rows I would like to understand if the > ?inactive rows? are NOT contributing to memory (while using ?redistribute?)? > > > > Redistribute will drop all the inactive rows from the computation of the > solve, it generates a smaller matrix missing all those rows and columns. > But the original big matrix with lots of extra reserved space still exists > (your A). > > > > (Q2) > > > > I tried solving using hypre within redistribute and system converges to a > solution. Is below correct way to use hypre within redistribute? > > > > ierr = PetscOptionsSetValue(*NULL*,"-ksp_type", "preonly"); > > ierr = PetscOptionsSetValue(*NULL*,"-pc_type", "redistribute"); > > ierr = PetscOptionsSetValue(*NULL*,"-redistribute_ksp_type", "cg"); > > ierr = PetscOptionsSetValue(*NULL*,"-redistribute_pc_type", "hypre"); > > ierr = PetscOptionsSetValue(*NULL*,"-redistribute_pc_hypre_type", > "boomeramg"); > > > > Correct. You can run with -ksp_view and it will provide all the > information about how the solves are layered. > > > > > > Many thanks, > > > > Karthik. > > > > *From: *Barry Smith > *Date: *Tuesday, 7 February 2023 at 19:52 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > > > On Feb 7, 2023, at 1:20 PM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Thank you Barry for your detailed response. > > > > I would like to shed some light into what I try to accomplish using PETSc > and AMReX. Please see the *attachment adaptive mesh image* (and ignore > the mesh-order legend for now). > > > > The number of elements on each level is a geometric progression. 
> > N - Number elements on each level indexed by ?n? > > n - Adaptive mesh level index (starting from 1) > > a - Number of elements on the base mesh = 16 > > r = 4 (each element is divided into four on the next higher level of > refinement) > > > > N = a r^(n-1) > > > > The final solution u, is the super imposition of solutions from all levels > (here we have a total of 7 levels). > > > > u = u^(1) + u^(2) + ? + u^(7) > > > > Hence I have seven system matrix and solution vector pairs, one for each > level. > > > > On each level the element index vary from 1 to N. But on each level NOT > all elements are ?active?. > > As you can see from the attached image not all elements are active (*a > lot of white hollow spaces*). So the ?active? indexes can be scatted > anywhere between 1 to N = 65536 for n = 7. > > > > (Q1) In my case, I can?t at the moment insert 1 on the diagonal because > during assembly I am using ADD_VALUES as a node can be common to many > elements. So I added 0.0 to ALL diagonals. After global assembly, I find > that the linear solver converges. > > > > (Q2) After adding 0.0 to all diagonal. I able to solve using either > > ierr = PetscOptionsSetValue(NULL,"-redistribute_pc_type", "jacobi"); > CHKERRQ(ierr); > > or > > ierr = PetscOptionsSetValue(NULL," pc_type", "jacobi"); CHKERRQ(ierr); > > I was able to solve using hypre as well. > > > > Do I need to use -pc_type redistribute or not? Because I am able to solve > without it as well. > > > > No you do not need redistribute, but for large problems with many empty > rows using a solver inside redistribute will be faster than just using that > solver directly on the much larger (mostly empty) system. > > > > > > (Q3) I am sorry, if I sound like a broken record player. On each level I > request allocation for A[N][N] > > > > Not sure what you mean by this? Call MatMPIAIJSetPreallocation(... > N,NULL); where N is the number of columns in the matrix? > > > > If so, yes this causes a huge malloc() by PETSc when it allocates the > matrix. It is not scalable. Do you have a small upper bound on the number > of nonzeros in a row, say 9 or 27? Then use that instead of N, not perfect > but much better than N. > > > > Barry > > > > > > > > > > as the indexes can be scatted anywhere between 1 to N but most are > ?inactive rows?. Is -pc_type *redistribute* the way to go for me to save > on memory? Though I request A[N][N] allocation, and not all rows are active > - I wonder if I am wasting a huge amount of memory? > > > > Kind regards, > > Karthik. > > > > > > > > > > *From: *Barry Smith > *Date: *Monday, 6 February 2023 at 22:42 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > Sorry was not clear MatZero*. I just meant MatZeroRows() or > MatZeroRowsColumns() > > > > On Feb 6, 2023, at 4:45 PM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > No problem. I don?t completely follow. > > > > (Q1) I have used MATMPIAJI but not sure what is MatZero* (star) and what > it does? And its relevance to my problem. > > > > (Q2) Since I am creating a MATMPIAJI system? what would be the best way to > insert 0.0 in to ALL diagonals (both active and inactive rows) to begin > with? > > > > Yes, just have each MPI process loop over its rows and put zero on the > diagonal (actually, you could put a 1 if you want). 
Then have your code use > AMReX to > > put all its values in, I am assuming the code uses INSERT_VALUES so it > will always overwrite the value you put in initially (hence putting in 1 > initially will be fine; the advantage of 1 is if you do not > use PCREDISTIBUTE the matrix is fully defined and so any solver will work. > If you know the inactive rows you can just put the diagonal on those since > AMReX will fill up the rest of the rows, but it is harmless to put values > on all diagonal entries. Do NOT call MatAssemblyBegin/End between filling > the diagonal entries and having AMReX put in its values. > > > > (Q3) If I have to insert 0.0 only into diagonals of ?inactive? rows after > I have put values into the matrix would be an effort. Unless there is a > straight forward to do it in PETSc. > > > > (Q4) For my problem do I need to use PCREDISTIBUTE or any linear solve > would eliminate those rows? > > > > Well no solver will really make sense if you have "inactive" rows, that > is rows with nothing in them except PCREDISTIBUTE. > > > > When PETSc was written we didn't understand having lots of completely > empty rows was a use case so much of the functionality does not work in > that case. > > > > > > > > Best, > > Karthik. > > > > *From: *Barry Smith > *Date: *Monday, 6 February 2023 at 20:18 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > Sorry, I had a mistake in my thinking, PCREDISTRIBUTE supports > completely empty rows but MatZero* does not. > > > > When you put values into the matrix you will need to insert a 0.0 on the > diagonal of each "inactive" row; all of this will be eliminated during the > linear solve process. It would be a major project to change the MatZero* > functions to handle empty rows. > > > > Barry > > > > > > > > > On Feb 4, 2023, at 12:06 PM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Thank you very much for offering to debug. > > > > I built PETSc along with AMReX, so I tried to extract the PETSc code alone > which would reproduce the same error on the smallest sized problem possible. > > > > I have attached three files: > > > > petsc_amrex_error_redistribute.txt ? The error message from amrex/petsc > interface, but THE linear system solves and converges to a solution. > > > > problem.c ? A simple stand-alone petsc code, which produces almost the > same error message. > > > > petsc_ error_redistribute.txt ? The error message from problem.c but > strangely it does NOT solve ? I am not sure why? > > > > Please use problem.c to debug the issue. > > > > Kind regards, > > Karthik. > > > > > > *From: *Barry Smith > *Date: *Saturday, 4 February 2023 at 00:22 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > If you can help me reproduce the problem with a simple code I can debug > the problem and fix it. > > > > Barry > > > > > > > > > > On Feb 3, 2023, at 6:42 PM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > I updated the main branch to the below commit but the same problem > persists. 
> > > > *[0]PETSC ERROR: Petsc Development GIT revision: > v3.18.4-529-g995ec06f92 GIT Date: 2023-02-03 18:41:48 +0000* > > > > > > *From: *Barry Smith > *Date: *Friday, 3 February 2023 at 18:51 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > If you switch to use the main branch of petsc > https://petsc.org/release/install/download/#advanced-obtain-petsc-development-version-with-git you > will not have the problem below (previously we required that a row exist > before we zeroed it but now we allow the row to initially have no entries > and still be zeroed. > > > > Barry > > > > > > On Feb 3, 2023, at 1:04 PM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Thank you. The entire error output was an attachment in my previous email. > I am pasting here for your reference. > > > > > > > > [1;31m[0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could > be the program crashed before they were used or a spelling mistake, etc! > > [0]PETSC ERROR: Option left: name:-options_left (no value) > > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.1-127-ga207d08eda > GIT Date: 2022-10-30 11:03:25 -0500 > > [0]PETSC ERROR: > /Users/karthikeyan.chockalingam/AMReX/amrFEM/build/Debug/amrFEM on a named > HC20210312 by karthikeyan.chockalingam Fri Feb 3 11:10:01 2023 > > [0]PETSC ERROR: Configure options --with-debugging=0 > --prefix=/Users/karthikeyan.chockalingam/AMReX/petsc > --download-fblaslapack=yes --download-scalapack=yes --download-mumps=yes > --with-hypre-dir=/Users/karthikeyan.chockalingam/AMReX/hypre/src/hypre > > [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/seq/aij.c:2218 > > [0]PETSC ERROR: #2 MatZeroRowsColumns() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > > [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/impls/aij/mpi/mpiaij.c:879 > > [0]PETSC ERROR: #4 MatZeroRowsColumns() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6085 > > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > > [0]PETSC ERROR: #6 localAssembly() at > /Users/karthikeyan.chockalingam/AMReX/amrFEM/src/FENodalPoisson.cpp:435 > > Residual norms for redistribute_ solve. 
> > 0 KSP preconditioned resid norm 5.182603110407e+00 true resid norm > 1.382027496109e+01 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP preconditioned resid norm 1.862430383976e+00 true resid norm > 4.966481023937e+00 ||r(i)||/||b|| 3.593619546588e-01 > > 2 KSP preconditioned resid norm 2.132803507689e-01 true resid norm > 5.687476020503e-01 ||r(i)||/||b|| 4.115313216645e-02 > > 3 KSP preconditioned resid norm 5.499797533437e-02 true resid norm > 1.466612675583e-01 ||r(i)||/||b|| 1.061203687852e-02 > > 4 KSP preconditioned resid norm 2.829814271435e-02 true resid norm > 7.546171390493e-02 ||r(i)||/||b|| 5.460217985345e-03 > > 5 KSP preconditioned resid norm 7.431048995318e-03 true resid norm > 1.981613065418e-02 ||r(i)||/||b|| 1.433844891652e-03 > > 6 KSP preconditioned resid norm 3.182040728972e-03 true resid norm > 8.485441943932e-03 ||r(i)||/||b|| 6.139850305312e-04 > > 7 KSP preconditioned resid norm 1.030867020459e-03 true resid norm > 2.748978721225e-03 ||r(i)||/||b|| 1.989091193167e-04 > > 8 KSP preconditioned resid norm 4.469429300003e-04 true resid norm > 1.191847813335e-03 ||r(i)||/||b|| 8.623908111021e-05 > > 9 KSP preconditioned resid norm 1.237303313796e-04 true resid norm > 3.299475503456e-04 ||r(i)||/||b|| 2.387416685085e-05 > > 10 KSP preconditioned resid norm 5.822094326756e-05 true resid norm > 1.552558487134e-04 ||r(i)||/||b|| 1.123391894522e-05 > > 11 KSP preconditioned resid norm 1.735776150969e-05 true resid norm > 4.628736402585e-05 ||r(i)||/||b|| 3.349236115503e-06 > > Linear redistribute_ solve converged due to CONVERGED_RTOL iterations 11 > > KSP Object: (redistribute_) 1 MPI process > > type: cg > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: (redistribute_) 1 MPI process > > type: jacobi > > type DIAGONAL > > linear system matrix = precond matrix: > > Mat Object: 1 MPI process > > type: mpiaij > > rows=48896, cols=48896 > > total: nonzeros=307976, allocated nonzeros=307976 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node (on process 0) routines > > End of program > > solve time 0.564714744 seconds > > Starting max value is: 0 > > Min value of level 0 is: 0 > > Interpolated min value is: 741.978761 > > Unused ParmParse Variables: > > [TOP]::model.type(nvals = 1) :: [3] > > [TOP]::ref_ratio(nvals = 1) :: [2] > > > > AMReX (22.10-20-g3082028e4287) finalized > > #PETSc Option Table entries: > > -ksp_type preonly > > -options_left > > -pc_type redistribute > > -redistribute_ksp_converged_reason > > -redistribute_ksp_monitor_true_residual > > -redistribute_ksp_type cg > > -redistribute_ksp_view > > -redistribute_pc_type jacobi > > #End of PETSc Option Table entries > > There are no unused options. > > Program ended with exit code: 0 > > > > > > Best, > > Karthik. > > > > *From: *Barry Smith > *Date: *Friday, 3 February 2023 at 17:41 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > We need all the error output for the errors you got below to understand > why the errors are happening. > > > > > > > > > > On Feb 3, 2023, at 11:41 AM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Hello Barry, > > > > I would like to better understand pc_type redistribute usage. 
> > > > I am plan to use pc_type *redistribute* in the context of adaptive mesh > refinement on a structured grid in 2D. My base mesh (level 0) is indexed > from 0 to N-1 elements and refined mesh (level 1) is indexed from 0 to > 4(N-1) elements. When I construct system matrix A on (level 1); I probably > only use 20% of 4(N-1) elements, however the indexes are scattered in the > range of 0 to 4(N-1). That leaves 80% of the rows and columns of the system > matrix A on (level 1) to be zero. From your earlier response, I believe > this would be a use case for petsc_type redistribute. > > > > Indeed the linear solve will be more efficient if you use the > redistribute solver. > > > > But I don't understand your plan. With adaptive refinement I would just > create the two matrices, one for the initial grid on which you solve the > system, this will be a smaller matrix and then create a new larger matrix > for the refined grid (and discard the previous matrix). > > > > > > > > > > > > Question (1) > > > > > > If N is really large, I would have to allocate memory of size 4(N-1) for > the system matrix A on (level 1). How does pc_type redistribute help? > Because, I did end up allocating memory for a large system, where most of > the rows and columns are zeros. Is most of the allotted memory not wasted? > *Is this the correct usage?* > > > > See above > > > > > > > > > > > > Question (2) > > > > > > I tried using pc_type redistribute for a two level system. > > I have *attached* the output only for (level 1) > > The solution converges to right solution but still petsc outputs some > error messages. > > > > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could > be the program crashed before they were used or a spelling mistake, etc! > > [0]PETSC ERROR: Option left: name:-options_left (no value) > > > > But the there were no unused options > > > > #PETSc Option Table entries: > > -ksp_type preonly > > -options_left > > -pc_type redistribute > > -redistribute_ksp_converged_reason > > -redistribute_ksp_monitor_true_residual > > -redistribute_ksp_type cg > > -redistribute_ksp_view > > -redistribute_pc_type jacobi > > #End of PETSc Option Table entries > > *There are no unused options.* > > Program ended with exit code: 0 > > > > I cannot explain this > > > > > > > > > Question (2) > > > > [0;39m[0;49m[0]PETSC ERROR: Object is in wrong state > > [0]PETSC ERROR: Matrix is missing diagonal entry in row 0 (65792) > > > > What does this error message imply? Given I only use 20% of 4(N-1) > indexes, I can imagine most of the diagonal entrees are zero. *Is my > understanding correct?* > > > > > > Question (3) > > > > > > > > > [0]PETSC ERROR: #5 MatZeroRowsColumnsIS() at > /Users/karthikeyan.chockalingam/AMReX/SRC_PKG/petsc/src/mat/interface/matrix.c:6124 > > > > I am using MatZeroRowsColumnsIS to set the homogenous Dirichelet boundary. > I don?t follow why I get this error message as the linear system converges > to the right solution. > > > > Thank you for your help. > > > > Kind regards, > > Karthik. > > > > > > > > *From: *Barry Smith > *Date: *Tuesday, 10 January 2023 at 18:50 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > Yes, after the solve the x will contain correct values for ALL the > locations including the (zeroed out rows). You use case is exactly what > redistribute it for. 
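A short sketch of that workflow, assuming the matrix A, the vectors x and b, and the IS of Dirichlet rows (called isbc below, a placeholder name) have already been assembled by the application:

  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  /* zero the Dirichlet rows/columns, put 1 on their diagonal, and adjust b
     using the boundary values already stored in x */
  ierr = MatZeroRowsColumnsIS(A, isbc, 1.0, x, b); CHKERRQ(ierr);

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPPREONLY); CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
  ierr = PCSetType(pc, PCREDISTRIBUTE); CHKERRQ(ierr);
  /* the inner solver comes from the options database, e.g.
     -redistribute_ksp_type cg -redistribute_pc_type jacobi */
  ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);

  /* x keeps the original size and ordering; entries for the eliminated rows are
     filled in, so it can be indexed exactly as before the elimination */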
> > > > Barry > > > > > > > > > > > > > On Jan 10, 2023, at 11:25 AM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Thank you Barry. This is great! > > > > I plan to solve using ?-pc_type redistribute? after applying the Dirichlet > bc using > > MatZeroRowsColumnsIS(A, isout, 1, x, b); > > > > While I retrieve the solution data from x (after the solve) ? can I index > them using the original ordering (if I may say that)? > > > > Kind regards, > > Karthik. > > > > *From: *Barry Smith > *Date: *Tuesday, 10 January 2023 at 16:04 > *To: *Chockalingam, Karthikeyan (STFC,DL,HC) < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *petsc-users at mcs.anl.gov > *Subject: *Re: [petsc-users] Eliminating rows and columns which are zeros > > > > > https://petsc.org/release/docs/manualpages/PC/PCREDISTRIBUTE/#pcredistribute > -pc_type redistribute > > > > > > It does everything for you. Note that if the right hand side for any of > the "zero" rows is nonzero then the system is inconsistent and the system > does not have a solution. > > > > Barry > > > > > > > > > > > > > > On Jan 10, 2023, at 10:30 AM, Karthikeyan Chockalingam - STFC UKRI via > petsc-users wrote: > > > > Hello, > > > > I am assembling a MATIJ of size N, where a very large number of rows (and > corresponding columns), are zeros. I would like to potentially eliminate > them before the solve. > > > > For instance say N=7 > > > > 0 0 0 0 0 0 0 > > 0 1 -1 0 0 0 0 > > 0 -1 2 0 0 0 -1 > > 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 > > 0 0 -1 0 0 0 1 > > > > I would like to reduce it to a 3x3 > > > > 1 -1 0 > > -1 2 -1 > > 0 -1 1 > > > > I do know the size N. > > > > Q1) How do I do it? > > Q2) Is it better to eliminate them as it would save a lot of memory? > > Q3) At the moment, I don?t know which rows (and columns) have the zero > entries but with some effort I probably can find them. Should I know which > rows (and columns) I am eliminating? > > > > Thank you. > > > > Karthik. > > This email and any attachments are intended solely for the use of the > named recipients. If you are not the intended recipient you must not use, > disclose, copy or distribute this email or any of its attachments and > should notify the sender immediately and delete this email from your > system. UK Research and Innovation (UKRI) has taken every reasonable > precaution to minimise risk of this email or any attachments containing > viruses or malware but the recipient should carry out its own virus and > malware checks before opening the attachments. UKRI does not accept any > liability for any losses or damages which the recipient may sustain due to > presence of any viruses. > > > > > > > > > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From margherita.guido at epfl.ch Mon Feb 13 03:47:43 2023 From: margherita.guido at epfl.ch (Guido Margherita) Date: Mon, 13 Feb 2023 09:47:43 +0000 Subject: [petsc-users] MatMatMul inefficient Message-ID: <5853254D-731E-459C-9E0E-6488DD1607DA@epfl.ch> Hi all, I realised that performing a matrix-matrix multiplication using the function MatMatMult it is not at all computationally efficient with respect to performing N times a matrix-vector multiplication with MatMul, being N the number of columns of the second matrix in the product. When I multiply I matrix A 46816 x 46816 to a matrix Q 46816 x 6, the MatMatMul function is indeed 6 times more expensive than 6 times a call to MatMul, when performed sequentially (0.04056 s vs 0.0062 s ). When the same code is run in parallel the gap grows even more, being10 times more expensive. Is there an explanation for it? t1 = MPI_Wtime() call MatMatMult(A,Q,MAT_INITIAL_MATRIX, PETSC_DEFAULT_REAL, AQ, ierr ) t2 = MPI_Wtime() t_MatMatMul = t2-t1 t_MatMul=0.0 do j = 0, m-1 call MatGetColumnVector(Q, q_vec, j,ierr) t1 = MPI_Wtime() call MatMult(A, q_vec, aq_vec, ierr) t2 = MPI_Wtime() t_MatMul = t_MatMul + t2-t1 end do Thank you, Margherita Guido From knepley at gmail.com Mon Feb 13 08:27:56 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 13 Feb 2023 09:27:56 -0500 Subject: [petsc-users] MatMatMul inefficient In-Reply-To: <5853254D-731E-459C-9E0E-6488DD1607DA@epfl.ch> References: <5853254D-731E-459C-9E0E-6488DD1607DA@epfl.ch> Message-ID: On Mon, Feb 13, 2023 at 9:21 AM Guido Margherita via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi all, > > I realised that performing a matrix-matrix multiplication using the > function MatMatMult it is not at all computationally efficient with respect > to performing N times a matrix-vector multiplication with MatMul, being N > the number of columns of the second matrix in the product. > When I multiply I matrix A 46816 x 46816 to a matrix Q 46816 x 6, the > MatMatMul function is indeed 6 times more expensive than 6 times a call to > MatMul, when performed sequentially (0.04056 s vs 0.0062 s ). When the > same code is run in parallel the gap grows even more, being10 times more > expensive. > Is there an explanation for it? > So we can reproduce this, what kind of matrix is A? I am assuming that Q is dense. Thanks, Matt > > t1 = MPI_Wtime() > call MatMatMult(A,Q,MAT_INITIAL_MATRIX, PETSC_DEFAULT_REAL, AQ, ierr ) > t2 = MPI_Wtime() > t_MatMatMul = t2-t1 > > t_MatMul=0.0 > do j = 0, m-1 > call MatGetColumnVector(Q, q_vec, j,ierr) > > t1 = MPI_Wtime() > call MatMult(A, q_vec, aq_vec, ierr) > t2 = MPI_Wtime() > > t_MatMul = t_MatMul + t2-t1 > end do > > Thank you, > Margherita Guido > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From margherita.guido at epfl.ch Mon Feb 13 08:38:40 2023 From: margherita.guido at epfl.ch (Guido Margherita) Date: Mon, 13 Feb 2023 14:38:40 +0000 Subject: [petsc-users] MatMatMul inefficient In-Reply-To: References: <5853254D-731E-459C-9E0E-6488DD1607DA@epfl.ch> Message-ID: <94FC9342-6963-45CB-AC76-63253C9F62F4@epfl.ch> A is a sparse MATSEQAIJ, Q is dense. 
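A minimal C timing skeleton for such a standalone reproducer, assuming A (MATAIJ) and Q (MATDENSE) are already assembled; the variable names and the column count m are placeholders.

  Mat            AQ;
  Vec            q, aq;
  PetscInt       j, m = 6;              /* number of columns of Q */
  PetscLogDouble t0, t1, t_matmat, t_matvec = 0.0;
  PetscErrorCode ierr;

  ierr = PetscTime(&t0); CHKERRQ(ierr);
  ierr = MatMatMult(A, Q, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &AQ); CHKERRQ(ierr);
  ierr = PetscTime(&t1); CHKERRQ(ierr);
  t_matmat = t1 - t0;   /* MAT_INITIAL_MATRIX includes the symbolic setup; timing a
                           second call with MAT_REUSE_MATRIX isolates the numeric product */

  ierr = MatCreateVecs(A, &q, &aq); CHKERRQ(ierr);
  for (j = 0; j < m; j++) {
    ierr = MatGetColumnVector(Q, q, j); CHKERRQ(ierr);   /* copies column j of Q, outside the timer */
    ierr = PetscTime(&t0); CHKERRQ(ierr);
    ierr = MatMult(A, q, aq); CHKERRQ(ierr);
    ierr = PetscTime(&t1); CHKERRQ(ierr);
    t_matvec += t1 - t0;
  }
  /* running with -log_view gives counts and flop rates for both kernels as a cross-check */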
Thanks, Margherita > Il giorno 13 feb 2023, alle ore 3:27 PM, knepley at gmail.com ha scritto: > > On Mon, Feb 13, 2023 at 9:21 AM Guido Margherita via petsc-users wrote: > Hi all, > > I realised that performing a matrix-matrix multiplication using the function MatMatMult it is not at all computationally efficient with respect to performing N times a matrix-vector multiplication with MatMul, being N the number of columns of the second matrix in the product. > When I multiply I matrix A 46816 x 46816 to a matrix Q 46816 x 6, the MatMatMul function is indeed 6 times more expensive than 6 times a call to MatMul, when performed sequentially (0.04056 s vs 0.0062 s ). When the same code is run in parallel the gap grows even more, being10 times more expensive. > Is there an explanation for it? > > So we can reproduce this, what kind of matrix is A? I am assuming that Q is dense. > > Thanks, > > Matt > > > t1 = MPI_Wtime() > call MatMatMult(A,Q,MAT_INITIAL_MATRIX, PETSC_DEFAULT_REAL, AQ, ierr ) > t2 = MPI_Wtime() > t_MatMatMul = t2-t1 > > t_MatMul=0.0 > do j = 0, m-1 > call MatGetColumnVector(Q, q_vec, j,ierr) > > t1 = MPI_Wtime() > call MatMult(A, q_vec, aq_vec, ierr) > t2 = MPI_Wtime() > > t_MatMul = t_MatMul + t2-t1 > end do > > Thank you, > Margherita Guido > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From pierre at joliv.et Mon Feb 13 08:51:41 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Mon, 13 Feb 2023 15:51:41 +0100 Subject: [petsc-users] MatMatMul inefficient In-Reply-To: <94FC9342-6963-45CB-AC76-63253C9F62F4@epfl.ch> References: <94FC9342-6963-45CB-AC76-63253C9F62F4@epfl.ch> Message-ID: <05E5037D-4AC5-4309-A76C-39180DEAF5E1@joliv.et> Could you please share a reproducer? What you are seeing is not typical of the performance of such a kernel, both from a theoretical or a practical (see fig. 2 of https://joliv.et/article.pdf) point of view. Thanks, Pierre > On 13 Feb 2023, at 3:38 PM, Guido Margherita via petsc-users wrote: > ?A is a sparse MATSEQAIJ, Q is dense. > > Thanks, > Margherita > >> Il giorno 13 feb 2023, alle ore 3:27 PM, knepley at gmail.com ha scritto: >> >> On Mon, Feb 13, 2023 at 9:21 AM Guido Margherita via petsc-users wrote: >> Hi all, >> >> I realised that performing a matrix-matrix multiplication using the function MatMatMult it is not at all computationally efficient with respect to performing N times a matrix-vector multiplication with MatMul, being N the number of columns of the second matrix in the product. >> When I multiply I matrix A 46816 x 46816 to a matrix Q 46816 x 6, the MatMatMul function is indeed 6 times more expensive than 6 times a call to MatMul, when performed sequentially (0.04056 s vs 0.0062 s ). When the same code is run in parallel the gap grows even more, being10 times more expensive. >> Is there an explanation for it? >> >> So we can reproduce this, what kind of matrix is A? I am assuming that Q is dense. 
>> >> Thanks, >> >> Matt >> >> >> t1 = MPI_Wtime() >> call MatMatMult(A,Q,MAT_INITIAL_MATRIX, PETSC_DEFAULT_REAL, AQ, ierr ) >> t2 = MPI_Wtime() >> t_MatMatMul = t2-t1 >> >> t_MatMul=0.0 >> do j = 0, m-1 >> call MatGetColumnVector(Q, q_vec, j,ierr) >> >> t1 = MPI_Wtime() >> call MatMult(A, q_vec, aq_vec, ierr) >> t2 = MPI_Wtime() >> >> t_MatMul = t_MatMul + t2-t1 >> end do >> >> Thank you, >> Margherita Guido >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.bernigaud at onera.fr Tue Feb 14 03:33:54 2023 From: pierre.bernigaud at onera.fr (Pierre Bernigaud) Date: Tue, 14 Feb 2023 10:33:54 +0100 Subject: [petsc-users] MatFDColoringSetUp with Periodic BC Message-ID: <919234c3017db5d21098774229cbb583@onera.fr> Dear all, I hope this email finds you well. We are currently working on a solver which is employing DMDA with SNES. The jacobian is computed via FDColoring, ie: call DMDACreate1D(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED, NY, NC, NGc, PETSC_NULL_INTEGER, dmF, ierr) ! ----- Some steps ... call DMCreateColoring(dmF, IS_COLORING_GLOBAL, iscoloring, ierr) call MatFDColoringCreate(Jac,iscoloring, matfdcoloring, ierr) call MatFDColoringSetFunction(matfdcoloring, FormFunction, CTX, ierr) call MatFDColoringSetUp(Jac ,iscoloring,matfdcoloring, ierr) call SNESSetJacobian(snes, Jac, Jac, SNESComputeJacobianDefaultColor, matfdcoloring, ierr) Everything is running smoothly. Recently, we modified the boundary conditions such as to use periodic BC: call DMDACreate1D(PETSC_COMM_WORLD, DM_BOUNDARY_PERIODIC, NY, NC, NGc, PETSC_NULL_INTEGER, dmF, ierr) We then encountered frequent crashes when calling MatFDColoringSetUp, depending on the number of cells NY. After looking for an solution, I found this old thread: https://lists.mcs.anl.gov/pipermail/petsc-users/2013-May/017449.html It appears that when using periodic BC, FDColoring can only be used if the number of cells is divisible by 2*NGc+1. Even though this is only a slight annoyance, I was wondering if you were working on this matter / if you had a quick fix at hand? At any rate, I think it would be nice if a warning was displayed in the FDColoring documentation? Respectfully, Pierre Bernigaud -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbarnafi at cmm.uchile.cl Tue Feb 14 07:36:34 2023 From: nbarnafi at cmm.uchile.cl (Nicolas Barnafi) Date: Tue, 14 Feb 2023 14:36:34 +0100 Subject: [petsc-users] Problem setting Fieldsplit fields In-Reply-To: References: Message-ID: <15fd4b93-6e77-b9ef-95f6-1f3c1ed45162@cmm.uchile.cl> Hello Matt, After some discussions elsewhere (thanks @LMitchell!), we found out that the problem is that the fields are setup with PCSetIS, without an attached DM, which does not support this kind of nesting fields. I would like to add this feature, meaning that during the setup of the preconditioner there should be an additional routine when there is no dm that reads _%i_fields and sets the corresponding fields to the sub PCs, in some order. My plan would be to do so at the PCSetUp_FieldSplit level. The idea is that whenever some IS fields are set such as 'a' and 'b', it should be possible to rearrange them with '-pc_fieldsplit_0_fields a,b' , or at least support this with numbered fields. How do you see it? 
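In the meantime, one way to emulate the grouping purely on the user side, without a DM, is to merge the ISs before registering the splits. In the sketch below is_a, is_b, is_c and pc are placeholders, and is_a and is_b are assumed to be disjoint:

  IS             is_ab, islist[2];
  PetscErrorCode ierr;

  islist[0] = is_a;                        /* field "a" */
  islist[1] = is_b;                        /* field "b" */
  ierr = ISConcatenate(PETSC_COMM_WORLD, 2, islist, &is_ab); CHKERRQ(ierr);
  ierr = ISSort(is_ab); CHKERRQ(ierr);     /* keep the merged indices ordered */

  ierr = PCSetType(pc, PCFIELDSPLIT); CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "ab", is_ab); CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "c",  is_c); CHKERRQ(ierr);

The named splits then pick up options through their prefixes, e.g. -fieldsplit_ab_pc_type ilu, which gives some of the flexibility of -pc_fieldsplit_0_fields until the feature itself is in place.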
Best, NB On 06/02/23 17:57, Matthew Knepley wrote: > On Mon, Feb 6, 2023 at 11:45 AM Nicolas Barnafi > wrote: > > Thank you Matt, > > Again, at the bottom of this message you will find the -info > output. I don't see any output related to the fields, > > > If the splits?were done automatically, you would see an info message > from here: > > https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/pc/impls/fieldsplit/fieldsplit.c#L1595 > > Thus it must be setup here > > https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/pc/impls/fieldsplit/fieldsplit.c#L380 > > There are info statements, but you do not see them, I do not see a way > around using a small example > to understand how you are setting up the system, since it is working > as expected in the PETSc examples. > > ? Thanks, > > ? ? ? Matt > > Best > > > ------ -info > > [0] PetscDetermineInitialFPTrap(): Floating point trapping > is on by default 13 > [0] PetscDeviceInitializeTypeFromOptions_Private(): > PetscDeviceType host available, initializing > [0] PetscDeviceInitializeTypeFromOptions_Private(): > PetscDevice host initialized, default device id 0, view FALSE, > init type lazy > [0] PetscDeviceInitializeTypeFromOptions_Private(): > PetscDeviceType cuda not available > [0] PetscDeviceInitializeTypeFromOptions_Private(): > PetscDeviceType hip not available > [0] PetscDeviceInitializeTypeFromOptions_Private(): > PetscDeviceType sycl not available > [0] PetscInitialize_Common(): PETSc successfully started: > number of processors = 1 > [0] PetscGetHostName(): Rejecting domainname, likely is NIS > nico-santech.(none) > [0] PetscInitialize_Common(): Running on machine: nico-santech > [0] SlepcInitialize(): SLEPc successfully started > [0] PetscCommDuplicate(): Duplicating a communicator > 94770066936960 94770087780768 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to > PETSc communicator embedded in a user MPI_Comm 94770087780768 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm > 94770066936960 is being unlinked from inner PETSc comm 94770087780768 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in > an MPI_Comm 94770087780768 > [0] PetscCommDuplicate(): Duplicating a communicator > 94770066936960 94770087780768 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to > PETSc communicator embedded in a user MPI_Comm 94770087780768 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm > 94770066936960 is being unlinked from inner PETSc comm 94770087780768 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in > an MPI_Comm 94770087780768 > [0] PetscCommDuplicate(): Duplicating a communicator > 94770066936960 94770087780768 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to > PETSc communicator embedded in a user MPI_Comm 94770087780768 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm > 94770066936960 is being unlinked from inner PETSc comm 94770087780768 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in > an MPI_Comm 94770087780768 > [0] PetscCommDuplicate(): Duplicating a communicator > 94770066936960 94770087780768 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to > PETSc communicator embedded in a user MPI_Comm 94770087780768 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm > 
94770066936960 is being unlinked from inner PETSc comm 94770087780768 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in > an MPI_Comm 94770087780768 > [0] PetscCommDuplicate(): Duplicating a communicator > 94770066936960 94770087780768 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to > PETSc communicator embedded in a user MPI_Comm 94770087780768 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm > 94770066936960 is being unlinked from inner PETSc comm 94770087780768 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in > an MPI_Comm 94770087780768 > [0] PetscCommDuplicate(): Duplicating a communicator > 94770066936960 94770087780768 max tags = 2147483647 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066936960 94770087780768 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1219 X 1219; > storage space: 0 unneeded,26443 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during > MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 150 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 1219) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 1160 nodes out of 1219 > rows. Not using Inode routines > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066936960 94770087780768 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066936960 94770087780768 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066936960 94770087780768 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066936960 94770087780768 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066936960 94770087780768 > [0] PetscGetHostName(): Rejecting domainname, likely is NIS > nico-santech.(none) > [0] PCSetUp(): Setting up PC for first time > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 615 X 615; storage > space: 0 unneeded,9213 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during > MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 117 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 615) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 561 nodes out of 615 rows. > Not using Inode routines > [0] PetscCommDuplicate(): Duplicating a communicator > 94770066934048 94770110251424 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to > PETSc communicator embedded in a user MPI_Comm 94770110251424 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm > 94770066934048 is being unlinked from inner PETSc comm 94770110251424 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770110251424 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in > an MPI_Comm 94770110251424 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 64 X 64; storage > space: 0 unneeded,0 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during > MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 64)/(num_localrows 64) > 0.6. Use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 13 nodes of 64. Limit used: > 5. 
Using Inode routines > [0] PetscCommDuplicate(): Duplicating a communicator > 94770066934048 94770100861088 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to > PETSc communicator embedded in a user MPI_Comm 94770100861088 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm > 94770066934048 is being unlinked from inner PETSc comm 94770100861088 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770100861088 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in > an MPI_Comm 94770100861088 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 240 X 240; storage > space: 0 unneeded,2140 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during > MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 11 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 240) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 235 nodes out of 240 rows. > Not using Inode routines > [0] PetscCommDuplicate(): Duplicating a communicator > 94770066934048 94770100861088 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to > PETSc communicator embedded in a user MPI_Comm 94770100861088 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm > 94770066934048 is being unlinked from inner PETSc comm 94770100861088 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770100861088 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in > an MPI_Comm 94770100861088 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 300 X 300; storage > space: 0 unneeded,2292 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during > MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 33 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 300) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 300 nodes out of 300 rows. > Not using Inode routines > [0] PetscCommDuplicate(): Duplicating a communicator > 94770066934048 94770100861088 max tags = 2147483647 > [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to > PETSc communicator embedded in a user MPI_Comm 94770100861088 > [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm > 94770066934048 is being unlinked from inner PETSc comm 94770100861088 > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770100861088 > [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in > an MPI_Comm 94770100861088 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 615 X 1219; > storage space: 0 unneeded,11202 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during > MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 150 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 615) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 561 nodes out of 615 rows. > Not using Inode routines > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 64 X 1219; storage > space: 0 unneeded,288 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during > MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 64) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 64 nodes out of 64 rows. 
> Not using Inode routines > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 240 X 1219; > storage space: 0 unneeded,8800 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during > MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 78 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 240) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 235 nodes out of 240 rows. > Not using Inode routines > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 300 X 1219; > storage space: 0 unneeded,6153 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during > MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 89 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 300) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 300 nodes out of 300 rows. > Not using Inode routines > [0] PetscCommDuplicate(): Duplicating a communicator > 94770066934048 94770100861088 max tags = 2147483647 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066934048 94770100861088 > ? 0 KSP Residual norm 2.541418258630e+01 > [0] KSPConvergedDefault(): user has provided nonzero initial > guess, computing 2-norm of RHS > [0] PCSetUp(): Leaving PC with identical preconditioner since > operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since > operator is unchanged > [0] PCSetUp(): Setting up PC for first time > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066934048 94770100861088 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066934048 94770100861088 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066934048 94770100861088 > [0] PCSetUp(): Leaving PC with identical preconditioner since > operator is unchanged > [0] PCSetUp(): Setting up PC for first time > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066934048 94770100861088 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066934048 94770100861088 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066934048 94770100861088 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 94770066934048 947701008610882 > > > On 03/02/23 21:11, Matthew Knepley wrote: >> On Fri, Feb 3, 2023 at 3:03 PM Nicolas Barnafi >> wrote: >> >> > There are a number of common errors: >> > >> >? ? 1) Your PC has a prefix >> > >> >? ? 2) You have not called KSPSetFromOptions() here >> > >> > Can you send the -ksp_view output? >> >> The PC at least has no prefix. I had to set ksp_rtol to 1 to >> get through >> the solution process, you will find both the petsc_rc and the >> ksp_view >> at the bottom of this message. >> >> Options are indeed being set from the options file, but there >> must be >> something missing at a certain level. Thanks for looking into >> this. >> >> >> Okay, the next step is to pass >> >> ? -info >> >> and send the output. This will tell us how the default splits >> were done. If that >> is not conclusive, we will have?to use the debugger. >> >> ? Thanks, >> >> ? ? ?Matt >> >> Best >> >> ---- petsc_rc file >> >> -ksp_monitor >> -ksp_type gmres >> -ksp_view >> -mat_type aij >> -ksp_norm_type unpreconditioned >> -ksp_atol 1e-14 >> -ksp_rtol 1 >> -pc_type fieldsplit >> -pc_fieldsplit_type multiplicative >> >> ---- ksp_view >> >> KSP Object: 1 MPI process >> ? ?type: gmres >> ? ? 
?restart=500, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> ? ? ?happy breakdown tolerance 1e-30 >> ? ?maximum iterations=10000, nonzero initial guess >> ? ?tolerances:? relative=1., absolute=1e-14, divergence=10000. >> ? ?right preconditioning >> ? ?using UNPRECONDITIONED norm type for convergence test >> PC Object: 1 MPI process >> ? ?type: fieldsplit >> ? ? ?FieldSplit with MULTIPLICATIVE composition: total splits = 4 >> ? ? ?Solver info for each split is in the following KSP objects: >> ? ?Split number 0 Defined by IS >> ? ?KSP Object: (fieldsplit_0_) 1 MPI process >> ? ? ?type: preonly >> ? ? ?maximum iterations=10000, initial guess is zero >> ? ? ?tolerances:? relative=1e-05, absolute=1e-50, >> divergence=10000. >> ? ? ?left preconditioning >> ? ? ?using DEFAULT norm type for convergence test >> ? ?PC Object: (fieldsplit_0_) 1 MPI process >> ? ? ?type: ilu >> ? ? ?PC has not been set up so information may be incomplete >> ? ? ? ?out-of-place factorization >> ? ? ? ?0 levels of fill >> ? ? ? ?tolerance for zero pivot 2.22045e-14 >> ? ? ? ?matrix ordering: natural >> ? ? ? ?matrix solver type: petsc >> ? ? ? ?matrix not yet factored; no additional information >> available >> ? ? ?linear system matrix = precond matrix: >> ? ? ?Mat Object: (fieldsplit_0_) 1 MPI process >> ? ? ? ?type: seqaij >> ? ? ? ?rows=615, cols=615 >> ? ? ? ?total: nonzeros=9213, allocated nonzeros=9213 >> ? ? ? ?total number of mallocs used during MatSetValues calls=0 >> ? ? ? ? ?not using I-node routines >> ? ?Split number 1 Defined by IS >> ? ?KSP Object: (fieldsplit_1_) 1 MPI process >> ? ? ?type: preonly >> ? ? ?maximum iterations=10000, initial guess is zero >> ? ? ?tolerances:? relative=1e-05, absolute=1e-50, >> divergence=10000. >> ? ? ?left preconditioning >> ? ? ?using DEFAULT norm type for convergence test >> ? ?PC Object: (fieldsplit_1_) 1 MPI process >> ? ? ?type: ilu >> ? ? ?PC has not been set up so information may be incomplete >> ? ? ? ?out-of-place factorization >> ? ? ? ?0 levels of fill >> ? ? ? ?tolerance for zero pivot 2.22045e-14 >> ? ? ? ?matrix ordering: natural >> ? ? ? ?matrix solver type: petsc >> ? ? ? ?matrix not yet factored; no additional information >> available >> ? ? ?linear system matrix = precond matrix: >> ? ? ?Mat Object: (fieldsplit_1_) 1 MPI process >> ? ? ? ?type: seqaij >> ? ? ? ?rows=64, cols=64 >> ? ? ? ?total: nonzeros=0, allocated nonzeros=0 >> ? ? ? ?total number of mallocs used during MatSetValues calls=0 >> ? ? ? ? ?using I-node routines: found 13 nodes, limit used is 5 >> ? ?Split number 2 Defined by IS >> ? ?KSP Object: (fieldsplit_2_) 1 MPI process >> ? ? ?type: preonly >> ? ? ?maximum iterations=10000, initial guess is zero >> ? ? ?tolerances:? relative=1e-05, absolute=1e-50, >> divergence=10000. >> ? ? ?left preconditioning >> ? ? ?using DEFAULT norm type for convergence test >> ? ?PC Object: (fieldsplit_2_) 1 MPI process >> ? ? ?type: ilu >> ? ? ?PC has not been set up so information may be incomplete >> ? ? ? ?out-of-place factorization >> ? ? ? ?0 levels of fill >> ? ? ? ?tolerance for zero pivot 2.22045e-14 >> ? ? ? ?matrix ordering: natural >> ? ? ? ?matrix solver type: petsc >> ? ? ? ?matrix not yet factored; no additional information >> available >> ? ? ?linear system matrix = precond matrix: >> ? ? ?Mat Object: (fieldsplit_2_) 1 MPI process >> ? ? ? ?type: seqaij >> ? ? ? ?rows=240, cols=240 >> ? ? ? ?total: nonzeros=2140, allocated nonzeros=2140 >> ? ? ? 
?total number of mallocs used during MatSetValues calls=0 >> ? ? ? ? ?not using I-node routines >> ? ?Split number 3 Defined by IS >> ? ?KSP Object: (fieldsplit_3_) 1 MPI process >> ? ? ?type: preonly >> ? ? ?maximum iterations=10000, initial guess is zero >> ? ? ?tolerances:? relative=1e-05, absolute=1e-50, >> divergence=10000. >> ? ? ?left preconditioning >> ? ? ?using DEFAULT norm type for convergence test >> ? ?PC Object: (fieldsplit_3_) 1 MPI process >> ? ? ?type: ilu >> ? ? ?PC has not been set up so information may be incomplete >> ? ? ? ?out-of-place factorization >> ? ? ? ?0 levels of fill >> ? ? ? ?tolerance for zero pivot 2.22045e-14 >> ? ? ? ?matrix ordering: natural >> ? ? ? ?matrix solver type: petsc >> ? ? ? ?matrix not yet factored; no additional information >> available >> ? ? ?linear system matrix = precond matrix: >> ? ? ?Mat Object: (fieldsplit_3_) 1 MPI process >> ? ? ? ?type: seqaij >> ? ? ? ?rows=300, cols=300 >> ? ? ? ?total: nonzeros=2292, allocated nonzeros=2292 >> ? ? ? ?total number of mallocs used during MatSetValues calls=0 >> ? ? ? ? ?not using I-node routines >> ? ?linear system matrix = precond matrix: >> ? ?Mat Object: 1 MPI process >> ? ? ?type: seqaij >> ? ? ?rows=1219, cols=1219 >> ? ? ?total: nonzeros=26443, allocated nonzeros=26443 >> ? ? ?total number of mallocs used during MatSetValues calls=0 >> ? ? ? ?not using I-node routines >> ? ? ? ? ? ? ? solving time: 0.00449609 >> ? ? ? ? ? ? ? ? iterations: 0 >> ? ? ? ? ? ?estimated error: 25.4142 >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel.salazar at corintis.com Tue Feb 14 08:12:16 2023 From: miguel.salazar at corintis.com (Miguel Angel Salazar de Troya) Date: Tue, 14 Feb 2023 15:12:16 +0100 Subject: [petsc-users] Segregated solvers in PETSc Message-ID: Hello, I am solving the Navier-Stokes equation and an advection-diffusion equation to model the temperature. They are fully coupled because the viscosity is temperature dependent. I plan to solve the fully-coupled problem with a segregated approach: I first solve the Navier-Stokes equation for a fixed temperature and feed the velocity to the thermal equation, then use that temperature back in the Navier-Stokes equation to solve for the velocity again until I reach convergence. If I assemble the residual and jacobian for the fully coupled system with the proper fields section for the fieldsplit preconditioner (I am using Firedrake), is there a way to tell PETSc to solve the problem with a segregated approach? Thanks, Miguel -------------- next part -------------- An HTML attachment was scrubbed... 
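The segregated strategy described in this question is essentially an outer Picard (fixed-point) iteration wrapped around two separate solves. A minimal sketch of such a loop, assuming two independently configured solvers snes_flow and snes_temp, vectors u and T, and a user-supplied UpdateViscosity() routine (all of these names are illustrative placeholders, not taken from any PETSc example):

  /* Outer Picard loop for a segregated flow/temperature solve.
     snes_flow, snes_temp, u, T and UpdateViscosity() are assumed to be
     created and configured elsewhere; the names are placeholders. */
  Vec       T_old;
  PetscReal dnorm, tnorm;
  PetscInt  max_picard_its = 50;
  PetscCall(VecDuplicate(T, &T_old));
  for (PetscInt it = 0; it < max_picard_its; ++it) {
    PetscCall(VecCopy(T, T_old));
    PetscCall(UpdateViscosity(T));              /* user routine: mu = mu(T) */
    PetscCall(SNESSolve(snes_flow, NULL, u));   /* Navier-Stokes with frozen T */
    PetscCall(SNESSolve(snes_temp, NULL, T));   /* temperature with frozen u */
    PetscCall(VecAXPY(T_old, -1.0, T));         /* T_old <- T_old - T */
    PetscCall(VecNorm(T_old, NORM_2, &dnorm));
    PetscCall(VecNorm(T, NORM_2, &tnorm));
    if (dnorm <= 1e-8 * tnorm) break;           /* simple relative stopping test */
  }
  PetscCall(VecDestroy(&T_old));

Whether an outer loop like this or a fully coupled solve converges faster depends on how strongly the temperature-dependent viscosity couples the two equations, so it is worth comparing both.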
URL: From knepley at gmail.com Tue Feb 14 08:51:26 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 14 Feb 2023 09:51:26 -0500 Subject: [petsc-users] Segregated solvers in PETSc In-Reply-To: References: Message-ID: On Tue, Feb 14, 2023 at 9:25 AM Miguel Angel Salazar de Troya < miguel.salazar at corintis.com> wrote: > Hello, > > I am solving the Navier-Stokes equation and an advection-diffusion > equation to model the temperature. They are fully coupled because the > viscosity is temperature dependent. I plan to solve the fully-coupled > problem with a segregated approach: I first solve the Navier-Stokes > equation for a fixed temperature and feed the velocity to the thermal > equation, then use that temperature back in the Navier-Stokes equation to > solve for the velocity again until I reach convergence. If I assemble the > residual and jacobian for the fully coupled system with the proper fields > section for the fieldsplit preconditioner (I am using Firedrake), is there > a way to tell PETSc to solve the problem with a segregated approach? > For a linear problem, this is easy using PCFIELDSPLIT. Thus you could use this to solve the Newton system for your problem. Doing this for a nonlinear problem is still harder because the top-level PETSc interface does not have a way to assemble subsets of the nonlinear problem. If you use a DMPlex to express the problem _and_ its callbacks to express the physics, then we can split the nonlinear problem into pieces. I am working with the Firedrake folks to develop a general interface for this. Hopefully we finish this year. Right now, I think the easiest way to do the nonlinear thing is to write separate UFL for the two problems and control the loop yourself, or to just use the linear variation. Thanks, Matt > Thanks, > Miguel > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 14 08:55:24 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 14 Feb 2023 09:55:24 -0500 Subject: [petsc-users] Problem setting Fieldsplit fields In-Reply-To: <15fd4b93-6e77-b9ef-95f6-1f3c1ed45162@cmm.uchile.cl> References: <15fd4b93-6e77-b9ef-95f6-1f3c1ed45162@cmm.uchile.cl> Message-ID: On Tue, Feb 14, 2023 at 8:36 AM Nicolas Barnafi wrote: > Hello Matt, > > After some discussions elsewhere (thanks @LMitchell!), we found out that > the problem is that the fields are setup with PCSetIS, without an attached > DM, which does not support this kind of nesting fields. > > I would like to add this feature, meaning that during the setup of the > preconditioner there should be an additional routine when there is no dm > that reads _%i_fields and sets the corresponding fields to the sub PCs, in > some order. > > My plan would be to do so at the PCSetUp_FieldSplit level. The idea is > that whenever some IS fields are set such as 'a' and 'b', it should be > possible to rearrange them with '-pc_fieldsplit_0_fields a,b' , or at least > support this with numbered fields. > > How do you see it? > Just to clarify, if you call SetIS() 3 times, and then give -pc_fieldsplit_0_fields 0,2 then we should reduce the number of fields to two by calling ISConcatenate() on the first and last ISes? I think this should not be hard. 
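Until that option handling exists inside PCFieldSplit itself, the same reduction can be done by hand before the splits are handed to the PC. A minimal sketch, assuming the application already holds three index sets is0, is1, is2 and the preconditioner pc (names are illustrative only):

  /* Merge the first and last index sets into a single split and keep the
     middle one; is0, is1, is2 and pc are assumed to exist already. */
  IS merged, pair[2];
  pair[0] = is0;
  pair[1] = is2;
  PetscCall(ISConcatenate(PetscObjectComm((PetscObject)pc), 2, pair, &merged));
  PetscCall(PCFieldSplitSetIS(pc, "0", merged)); /* combined split */
  PetscCall(PCFieldSplitSetIS(pc, "1", is1));    /* remaining split */
  PetscCall(ISDestroy(&merged));

The ordering of the unknowns inside the merged split follows the concatenation order, which is something to keep in mind if the sub-preconditioner is sensitive to it.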
It will work exactly as it does on the DM case, except the ISes will come from the PC, not the DM. One complication is that you will have to hold the new ISes until the end, and then set them. Thanks, Matt > Best, > NB > > On 06/02/23 17:57, Matthew Knepley wrote: > > On Mon, Feb 6, 2023 at 11:45 AM Nicolas Barnafi > wrote: > >> Thank you Matt, >> >> Again, at the bottom of this message you will find the -info output. I >> don't see any output related to the fields, >> > > If the splits were done automatically, you would see an info message from > here: > > > https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/pc/impls/fieldsplit/fieldsplit.c#L1595 > > Thus it must be setup here > > > https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/pc/impls/fieldsplit/fieldsplit.c#L380 > > There are info statements, but you do not see them, I do not see a way > around using a small example > to understand how you are setting up the system, since it is working as > expected in the PETSc examples. > > Thanks, > > Matt > > >> Best >> >> >> ------ -info >> >> [0] PetscDetermineInitialFPTrap(): Floating point trapping is on by >> default 13 >> [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType >> host available, initializing >> [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice >> host initialized, default device id 0, view FALSE, init type lazy >> [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType >> cuda not available >> [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType >> hip not available >> [0] PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType >> sycl not available >> [0] PetscInitialize_Common(): PETSc successfully started: number of >> processors = 1 >> [0] PetscGetHostName(): Rejecting domainname, likely is NIS >> nico-santech.(none) >> [0] PetscInitialize_Common(): Running on machine: nico-santech >> [0] SlepcInitialize(): SLEPc successfully started >> [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 >> 94770087780768 max tags = 2147483647 >> [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc >> communicator embedded in a user MPI_Comm 94770087780768 >> [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066936960 >> is being unlinked from inner PETSc comm 94770087780768 >> [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 >> [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an >> MPI_Comm 94770087780768 >> [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 >> 94770087780768 max tags = 2147483647 >> [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc >> communicator embedded in a user MPI_Comm 94770087780768 >> [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066936960 >> is being unlinked from inner PETSc comm 94770087780768 >> [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 >> [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an >> MPI_Comm 94770087780768 >> [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 >> 94770087780768 max tags = 2147483647 >> [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc >> communicator embedded in a user MPI_Comm 94770087780768 >> [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066936960 >> is being unlinked from inner PETSc comm 94770087780768 >> [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 >> [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an >> MPI_Comm 94770087780768 >> [0] 
PetscCommDuplicate(): Duplicating a communicator 94770066936960 >> 94770087780768 max tags = 2147483647 >> [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc >> communicator embedded in a user MPI_Comm 94770087780768 >> [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066936960 >> is being unlinked from inner PETSc comm 94770087780768 >> [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 >> [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an >> MPI_Comm 94770087780768 >> [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 >> 94770087780768 max tags = 2147483647 >> [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc >> communicator embedded in a user MPI_Comm 94770087780768 >> [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066936960 >> is being unlinked from inner PETSc comm 94770087780768 >> [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770087780768 >> [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an >> MPI_Comm 94770087780768 >> [0] PetscCommDuplicate(): Duplicating a communicator 94770066936960 >> 94770087780768 max tags = 2147483647 >> [0] PetscCommDuplicate(): Using internal PETSc communicator >> 94770066936960 94770087780768 >> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1219 X 1219; storage >> space: 0 unneeded,26443 used >> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during >> MatSetValues() is 0 >> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 150 >> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >> 0)/(num_localrows 1219) < 0.6. Do not use CompressedRow routines. >> [0] MatSeqAIJCheckInode(): Found 1160 nodes out of 1219 rows. Not >> using Inode routines >> [0] PetscCommDuplicate(): Using internal PETSc communicator >> 94770066936960 94770087780768 >> [0] PetscCommDuplicate(): Using internal PETSc communicator >> 94770066936960 94770087780768 >> [0] PetscCommDuplicate(): Using internal PETSc communicator >> 94770066936960 94770087780768 >> [0] PetscCommDuplicate(): Using internal PETSc communicator >> 94770066936960 94770087780768 >> [0] PetscCommDuplicate(): Using internal PETSc communicator >> 94770066936960 94770087780768 >> [0] PetscGetHostName(): Rejecting domainname, likely is NIS >> nico-santech.(none) >> [0] PCSetUp(): Setting up PC for first time >> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 615 X 615; storage space: >> 0 unneeded,9213 used >> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during >> MatSetValues() is 0 >> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 117 >> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >> 0)/(num_localrows 615) < 0.6. Do not use CompressedRow routines. >> [0] MatSeqAIJCheckInode(): Found 561 nodes out of 615 rows. 
Not >> using Inode routines >> [0] PetscCommDuplicate(): Duplicating a communicator 94770066934048 >> 94770110251424 max tags = 2147483647 >> [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc >> communicator embedded in a user MPI_Comm 94770110251424 >> [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066934048 >> is being unlinked from inner PETSc comm 94770110251424 >> [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770110251424 >> [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an >> MPI_Comm 94770110251424 >> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 64 X 64; storage space: 0 >> unneeded,0 used >> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during >> MatSetValues() is 0 >> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0 >> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >> 64)/(num_localrows 64) > 0.6. Use CompressedRow routines. >> [0] MatSeqAIJCheckInode(): Found 13 nodes of 64. Limit used: 5. >> Using Inode routines >> [0] PetscCommDuplicate(): Duplicating a communicator 94770066934048 >> 94770100861088 max tags = 2147483647 >> [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc >> communicator embedded in a user MPI_Comm 94770100861088 >> [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066934048 >> is being unlinked from inner PETSc comm 94770100861088 >> [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770100861088 >> [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an >> MPI_Comm 94770100861088 >> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 240 X 240; storage space: >> 0 unneeded,2140 used >> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during >> MatSetValues() is 0 >> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 11 >> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >> 0)/(num_localrows 240) < 0.6. Do not use CompressedRow routines. >> [0] MatSeqAIJCheckInode(): Found 235 nodes out of 240 rows. Not >> using Inode routines >> [0] PetscCommDuplicate(): Duplicating a communicator 94770066934048 >> 94770100861088 max tags = 2147483647 >> [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc >> communicator embedded in a user MPI_Comm 94770100861088 >> [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066934048 >> is being unlinked from inner PETSc comm 94770100861088 >> [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770100861088 >> [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an >> MPI_Comm 94770100861088 >> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 300 X 300; storage space: >> 0 unneeded,2292 used >> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during >> MatSetValues() is 0 >> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 33 >> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >> 0)/(num_localrows 300) < 0.6. Do not use CompressedRow routines. >> [0] MatSeqAIJCheckInode(): Found 300 nodes out of 300 rows. 
Not >> using Inode routines >> [0] PetscCommDuplicate(): Duplicating a communicator 94770066934048 >> 94770100861088 max tags = 2147483647 >> [0] Petsc_OuterComm_Attr_Delete_Fn(): Removing reference to PETSc >> communicator embedded in a user MPI_Comm 94770100861088 >> [0] Petsc_InnerComm_Attr_Delete_Fn(): User MPI_Comm 94770066934048 >> is being unlinked from inner PETSc comm 94770100861088 >> [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 94770100861088 >> [0] Petsc_Counter_Attr_Delete_Fn(): Deleting counter data in an >> MPI_Comm 94770100861088 >> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 615 X 1219; storage >> space: 0 unneeded,11202 used >> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during >> MatSetValues() is 0 >> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 150 >> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >> 0)/(num_localrows 615) < 0.6. Do not use CompressedRow routines. >> [0] MatSeqAIJCheckInode(): Found 561 nodes out of 615 rows. Not >> using Inode routines >> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 64 X 1219; storage space: >> 0 unneeded,288 used >> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during >> MatSetValues() is 0 >> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 >> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >> 0)/(num_localrows 64) < 0.6. Do not use CompressedRow routines. >> [0] MatSeqAIJCheckInode(): Found 64 nodes out of 64 rows. Not using >> Inode routines >> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 240 X 1219; storage >> space: 0 unneeded,8800 used >> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during >> MatSetValues() is 0 >> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 78 >> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >> 0)/(num_localrows 240) < 0.6. Do not use CompressedRow routines. >> [0] MatSeqAIJCheckInode(): Found 235 nodes out of 240 rows. Not >> using Inode routines >> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 300 X 1219; storage >> space: 0 unneeded,6153 used >> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during >> MatSetValues() is 0 >> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 89 >> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >> 0)/(num_localrows 300) < 0.6. Do not use CompressedRow routines. >> [0] MatSeqAIJCheckInode(): Found 300 nodes out of 300 rows. 
Not >> using Inode routines >> [0] PetscCommDuplicate(): Duplicating a communicator 94770066934048 >> 94770100861088 max tags = 2147483647 >> [0] PetscCommDuplicate(): Using internal PETSc communicator >> 94770066934048 94770100861088 >> 0 KSP Residual norm 2.541418258630e+01 >> [0] KSPConvergedDefault(): user has provided nonzero initial guess, >> computing 2-norm of RHS >> [0] PCSetUp(): Leaving PC with identical preconditioner since >> operator is unchanged >> [0] PCSetUp(): Leaving PC with identical preconditioner since >> operator is unchanged >> [0] PCSetUp(): Setting up PC for first time >> [0] PetscCommDuplicate(): Using internal PETSc communicator >> 94770066934048 94770100861088 >> [0] PetscCommDuplicate(): Using internal PETSc communicator >> 94770066934048 94770100861088 >> [0] PetscCommDuplicate(): Using internal PETSc communicator >> 94770066934048 94770100861088 >> [0] PCSetUp(): Leaving PC with identical preconditioner since >> operator is unchanged >> [0] PCSetUp(): Setting up PC for first time >> [0] PetscCommDuplicate(): Using internal PETSc communicator >> 94770066934048 94770100861088 >> [0] PetscCommDuplicate(): Using internal PETSc communicator >> 94770066934048 94770100861088 >> [0] PetscCommDuplicate(): Using internal PETSc communicator >> 94770066934048 94770100861088 >> [0] PetscCommDuplicate(): Using internal PETSc communicator >> 94770066934048 947701008610882 >> >> >> On 03/02/23 21:11, Matthew Knepley wrote: >> >> On Fri, Feb 3, 2023 at 3:03 PM Nicolas Barnafi >> wrote: >> >>> > There are a number of common errors: >>> > >>> > 1) Your PC has a prefix >>> > >>> > 2) You have not called KSPSetFromOptions() here >>> > >>> > Can you send the -ksp_view output? >>> >>> The PC at least has no prefix. I had to set ksp_rtol to 1 to get through >>> the solution process, you will find both the petsc_rc and the ksp_view >>> at the bottom of this message. >>> >>> Options are indeed being set from the options file, but there must be >>> something missing at a certain level. Thanks for looking into this. >>> >> >> Okay, the next step is to pass >> >> -info >> >> and send the output. This will tell us how the default splits were done. >> If that >> is not conclusive, we will have to use the debugger. >> >> Thanks, >> >> Matt >> >> >>> Best >>> >>> ---- petsc_rc file >>> >>> -ksp_monitor >>> -ksp_type gmres >>> -ksp_view >>> -mat_type aij >>> -ksp_norm_type unpreconditioned >>> -ksp_atol 1e-14 >>> -ksp_rtol 1 >>> -pc_type fieldsplit >>> -pc_fieldsplit_type multiplicative >>> >>> ---- ksp_view >>> >>> KSP Object: 1 MPI process >>> type: gmres >>> restart=500, using Classical (unmodified) Gram-Schmidt >>> Orthogonalization with no iterative refinement >>> happy breakdown tolerance 1e-30 >>> maximum iterations=10000, nonzero initial guess >>> tolerances: relative=1., absolute=1e-14, divergence=10000. >>> right preconditioning >>> using UNPRECONDITIONED norm type for convergence test >>> PC Object: 1 MPI process >>> type: fieldsplit >>> FieldSplit with MULTIPLICATIVE composition: total splits = 4 >>> Solver info for each split is in the following KSP objects: >>> Split number 0 Defined by IS >>> KSP Object: (fieldsplit_0_) 1 MPI process >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>>> left preconditioning >>> using DEFAULT norm type for convergence test >>> PC Object: (fieldsplit_0_) 1 MPI process >>> type: ilu >>> PC has not been set up so information may be incomplete >>> out-of-place factorization >>> 0 levels of fill >>> tolerance for zero pivot 2.22045e-14 >>> matrix ordering: natural >>> matrix solver type: petsc >>> matrix not yet factored; no additional information available >>> linear system matrix = precond matrix: >>> Mat Object: (fieldsplit_0_) 1 MPI process >>> type: seqaij >>> rows=615, cols=615 >>> total: nonzeros=9213, allocated nonzeros=9213 >>> total number of mallocs used during MatSetValues calls=0 >>> not using I-node routines >>> Split number 1 Defined by IS >>> KSP Object: (fieldsplit_1_) 1 MPI process >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>> left preconditioning >>> using DEFAULT norm type for convergence test >>> PC Object: (fieldsplit_1_) 1 MPI process >>> type: ilu >>> PC has not been set up so information may be incomplete >>> out-of-place factorization >>> 0 levels of fill >>> tolerance for zero pivot 2.22045e-14 >>> matrix ordering: natural >>> matrix solver type: petsc >>> matrix not yet factored; no additional information available >>> linear system matrix = precond matrix: >>> Mat Object: (fieldsplit_1_) 1 MPI process >>> type: seqaij >>> rows=64, cols=64 >>> total: nonzeros=0, allocated nonzeros=0 >>> total number of mallocs used during MatSetValues calls=0 >>> using I-node routines: found 13 nodes, limit used is 5 >>> Split number 2 Defined by IS >>> KSP Object: (fieldsplit_2_) 1 MPI process >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>> left preconditioning >>> using DEFAULT norm type for convergence test >>> PC Object: (fieldsplit_2_) 1 MPI process >>> type: ilu >>> PC has not been set up so information may be incomplete >>> out-of-place factorization >>> 0 levels of fill >>> tolerance for zero pivot 2.22045e-14 >>> matrix ordering: natural >>> matrix solver type: petsc >>> matrix not yet factored; no additional information available >>> linear system matrix = precond matrix: >>> Mat Object: (fieldsplit_2_) 1 MPI process >>> type: seqaij >>> rows=240, cols=240 >>> total: nonzeros=2140, allocated nonzeros=2140 >>> total number of mallocs used during MatSetValues calls=0 >>> not using I-node routines >>> Split number 3 Defined by IS >>> KSP Object: (fieldsplit_3_) 1 MPI process >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>>> left preconditioning >>> using DEFAULT norm type for convergence test >>> PC Object: (fieldsplit_3_) 1 MPI process >>> type: ilu >>> PC has not been set up so information may be incomplete >>> out-of-place factorization >>> 0 levels of fill >>> tolerance for zero pivot 2.22045e-14 >>> matrix ordering: natural >>> matrix solver type: petsc >>> matrix not yet factored; no additional information available >>> linear system matrix = precond matrix: >>> Mat Object: (fieldsplit_3_) 1 MPI process >>> type: seqaij >>> rows=300, cols=300 >>> total: nonzeros=2292, allocated nonzeros=2292 >>> total number of mallocs used during MatSetValues calls=0 >>> not using I-node routines >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI process >>> type: seqaij >>> rows=1219, cols=1219 >>> total: nonzeros=26443, allocated nonzeros=26443 >>> total number of mallocs used during MatSetValues calls=0 >>> not using I-node routines >>> solving time: 0.00449609 >>> iterations: 0 >>> estimated error: 25.4142 >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mi.mike1021 at gmail.com Tue Feb 14 10:44:54 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Tue, 14 Feb 2023 10:44:54 -0600 Subject: [petsc-users] PetscViewer with 64bit In-Reply-To: References: Message-ID: I was trying to modify the header flags from "Int32" to "Int64", but the problem was not resolved. Could I get any additional comments? Thanks, Mike > Thanks for the comments. > To be precise on the question, the entire part of the header of the .vtu > file is attached: > > > > > > > format="appended" offset="0" /> > > > format="appended" offset="116932" /> > format="appended" offset="372936" /> > format="appended" offset="404940" /> > > > format="appended" offset="408944" /> > > > NumberOfComponents="1" format="appended" offset="424948" /> > > > > > format="appended" offset="463928" /> > > > format="appended" offset="580860" /> > format="appended" offset="836864" /> > format="appended" offset="868868" /> > > > format="appended" offset="872872" /> > > > NumberOfComponents="1" format="appended" offset="888876" /> > > > > > > > Thanks, > Mike > > >> On Sun, Feb 12, 2023 at 6:15 PM Mike Michell >> wrote: >> >>> Dear PETSc team, >>> >>> I am a user of PETSc with Fortran. My code uses DMPlex to handle dm >>> object. To print out output variable and mesh connectivity, I use VecView() >>> by defining PetscSection on that dm and borrow a vector. The type of the >>> viewer is set to PETSCVIEWERVTK. >>> >>> With 32bit indices, the above work flow has no issue. 
However, if PETSc >>> is configured with 64bit indices, my output .vtu file has an error if I >>> open the file with visualization tools, such as Paraview or Tecplot, saying >>> that: >>> "Cannot read cell connectivity from Cells in piece 0 because the >>> "offsets" array is not monotonically increasing or starts with a value >>> other than 0." >>> >>> If I open the .vtu file from terminal, I can see such a line: >>> ... >>> >> format="appended" offset="580860" /> >>> ... >>> >>> I expected "DataArray type="Int64", since the PETSc has 64bit indices. >>> Could I get recommendations that I need to check to resolve the issue? >>> >> >> This is probably a bug. We will look at it. >> >> Jed, I saw that Int32 is hardcoded in plexvtu.c, but sizeof(PetscInt) is >> used to calculate the offset, which looks inconsistent. Can you take a look? >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Mike >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From edoardo.alinovi at gmail.com Tue Feb 14 12:43:08 2023 From: edoardo.alinovi at gmail.com (Edoardo alinovi) Date: Tue, 14 Feb 2023 19:43:08 +0100 Subject: [petsc-users] Help with fieldsplit performance In-Reply-To: References: Message-ID: Hi Matt, So I have done some research these days and I have found out that I might try to assemble the SIMPLE for Schur approximation (myS = A11 - A10 inv(DIAGFORM(A00)) A01). Reading papers around, I come up with a doubt, which I believe to be a very silly one but worth asking... Is the way the unknowns are packed in the matrix relevant for schur preconditioning? I was studying a bit ex70.c, there the block matrix is defined like: A = [A00 A10 A10 A11] Where A00 is the momentum equation matrix, A11 is the pressure equation matrix, while A01 and A10 are the matrices for the coupling terms (i.e. pressure gradient and continuity). The unknowns are x = [u1..uN v1...vN w1...wN p1...pN]^T In my case, I assemble the matrix cell by cell (FV method), and the result will be this one: [image: image.png] Then I split the fields giving index 0-1 for u and 2 for p. I guess Petsc is already doing the correct handling picking up the *a^33s* to assemble A11, but worth being 100% sure :) Thank you! -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 28154 bytes Desc: not available URL: From bsmith at petsc.dev Tue Feb 14 14:35:08 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 14 Feb 2023 15:35:08 -0500 Subject: [petsc-users] MatFDColoringSetUp with Periodic BC In-Reply-To: <919234c3017db5d21098774229cbb583@onera.fr> References: <919234c3017db5d21098774229cbb583@onera.fr> Message-ID: I have created an MR that documents this and moved the error checking to a more appropriate location https://gitlab.com/petsc/petsc/-/merge_requests/6070 > On Feb 14, 2023, at 4:33 AM, Pierre Bernigaud wrote: > > Dear all, > > I hope this email finds you well. > We are currently working on a solver which is employing DMDA with SNES. The jacobian is computed via FDColoring, ie: > > call DMDACreate1D(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED, NY, NC, NGc, PETSC_NULL_INTEGER, dmF, ierr) > > ! ----- Some steps ... 
> > call DMCreateColoring(dmF, IS_COLORING_GLOBAL, iscoloring, ierr) > call MatFDColoringCreate(Jac,iscoloring, matfdcoloring, ierr) > call MatFDColoringSetFunction(matfdcoloring, FormFunction, CTX, ierr) > call MatFDColoringSetUp(Jac ,iscoloring,matfdcoloring, ierr) > call SNESSetJacobian(snes, Jac, Jac, SNESComputeJacobianDefaultColor, matfdcoloring, ierr) > > Everything is running smoothly. > Recently, we modified the boundary conditions such as to use periodic BC: > > call DMDACreate1D(PETSC_COMM_WORLD, DM_BOUNDARY_PERIODIC, NY, NC, NGc, PETSC_NULL_INTEGER, dmF, ierr) > > We then encountered frequent crashes when calling MatFDColoringSetUp, depending on the number of cells NY. After looking for an solution, I found this old thread: https://lists.mcs.anl.gov/pipermail/petsc-users/2013-May/017449.html > It appears that when using periodic BC, FDColoring can only be used if the number of cells is divisible by 2*NGc+1. Even though this is only a slight annoyance, I was wondering if you were working on this matter / if you had a quick fix at hand? At any rate, I think it would be nice if a warning was displayed in the FDColoring documentation? > > Respectfully, > Pierre Bernigaud > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 14 16:32:07 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 14 Feb 2023 17:32:07 -0500 Subject: [petsc-users] PetscViewer with 64bit In-Reply-To: References: Message-ID: On Tue, Feb 14, 2023 at 11:45 AM Mike Michell wrote: > I was trying to modify the header flags from "Int32" to "Int64", but the > problem was not resolved. Could I get any additional comments? > The calculated offsets are not correct I think. Matt > Thanks, > Mike > > >> Thanks for the comments. >> To be precise on the question, the entire part of the header of the .vtu >> file is attached: >> >> >> >> >> >> >> > format="appended" offset="0" /> >> >> >> > NumberOfComponents="1" format="appended" offset="116932" /> >> > NumberOfComponents="1" format="appended" offset="372936" /> >> > NumberOfComponents="1" format="appended" offset="404940" /> >> >> >> > format="appended" offset="408944" /> >> >> >> > NumberOfComponents="1" format="appended" offset="424948" /> >> >> >> >> >> > format="appended" offset="463928" /> >> >> >> > NumberOfComponents="1" format="appended" offset="580860" /> >> > NumberOfComponents="1" format="appended" offset="836864" /> >> > NumberOfComponents="1" format="appended" offset="868868" /> >> >> >> > format="appended" offset="872872" /> >> >> >> > NumberOfComponents="1" format="appended" offset="888876" /> >> >> >> >> >> >> >> Thanks, >> Mike >> >> >>> On Sun, Feb 12, 2023 at 6:15 PM Mike Michell >>> wrote: >>> >>>> Dear PETSc team, >>>> >>>> I am a user of PETSc with Fortran. My code uses DMPlex to handle dm >>>> object. To print out output variable and mesh connectivity, I use VecView() >>>> by defining PetscSection on that dm and borrow a vector. The type of the >>>> viewer is set to PETSCVIEWERVTK. >>>> >>>> With 32bit indices, the above work flow has no issue. However, if PETSc >>>> is configured with 64bit indices, my output .vtu file has an error if I >>>> open the file with visualization tools, such as Paraview or Tecplot, saying >>>> that: >>>> "Cannot read cell connectivity from Cells in piece 0 because the >>>> "offsets" array is not monotonically increasing or starts with a value >>>> other than 0." >>>> >>>> If I open the .vtu file from terminal, I can see such a line: >>>> ... 
>>>> >>> format="appended" offset="580860" /> >>>> ... >>>> >>>> I expected "DataArray type="Int64", since the PETSc has 64bit indices. >>>> Could I get recommendations that I need to check to resolve the issue? >>>> >>> >>> This is probably a bug. We will look at it. >>> >>> Jed, I saw that Int32 is hardcoded in plexvtu.c, but sizeof(PetscInt) is >>> used to calculate the offset, which looks inconsistent. Can you take a look? >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> Mike >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mi.mike1021 at gmail.com Tue Feb 14 17:10:17 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Tue, 14 Feb 2023 17:10:17 -0600 Subject: [petsc-users] PetscViewer with 64bit In-Reply-To: References: Message-ID: Thanks for the note. I understood that PETSc calculates the offsets for me through "boffset" variable in plexvtu.c file. Please correct me if it is wrong. If plexvtu.c has a bug, it could be around "write file header" part in which the boffset is also computed. Is this correct? I am not using complex number. There are several mixed parts among "Int32, UInt8, PetscInt_FMT, PetscInt64_FMT" in writing the header. Which combination of those flags is correct for 64bit indices? I am gonna modify plexvtu.c file with "#if defined(PETSC_USE_64BIT_INDICES)" statement, but I do not know what is the correct form of the header flag for 64bit indices. It is also confusing to me: boffset += gpiece[r].ncells * sizeof(PetscInt) + sizeof(int); How is sizeof(PetscInt) different from sizeof(int)? Thanks, Mike > On Tue, Feb 14, 2023 at 11:45 AM Mike Michell > wrote: > >> I was trying to modify the header flags from "Int32" to "Int64", but the >> problem was not resolved. Could I get any additional comments? >> > > The calculated offsets are not correct I think. > > Matt > > >> Thanks, >> Mike >> >> >>> Thanks for the comments. >>> To be precise on the question, the entire part of the header of the .vtu >>> file is attached: >>> >>> >>> >>> >>> >>> >>> >> format="appended" offset="0" /> >>> >>> >>> >> NumberOfComponents="1" format="appended" offset="116932" /> >>> >> NumberOfComponents="1" format="appended" offset="372936" /> >>> >> NumberOfComponents="1" format="appended" offset="404940" /> >>> >>> >>> >> format="appended" offset="408944" /> >>> >>> >>> >> NumberOfComponents="1" format="appended" offset="424948" /> >>> >>> >>> >>> >>> >> format="appended" offset="463928" /> >>> >>> >>> >> NumberOfComponents="1" format="appended" offset="580860" /> >>> >> NumberOfComponents="1" format="appended" offset="836864" /> >>> >> NumberOfComponents="1" format="appended" offset="868868" /> >>> >>> >>> >> format="appended" offset="872872" /> >>> >>> >>> >> NumberOfComponents="1" format="appended" offset="888876" /> >>> >>> >>> >>> >>> >>> >>> Thanks, >>> Mike >>> >>> >>>> On Sun, Feb 12, 2023 at 6:15 PM Mike Michell >>>> wrote: >>>> >>>>> Dear PETSc team, >>>>> >>>>> I am a user of PETSc with Fortran. My code uses DMPlex to handle dm >>>>> object. 
To print out output variable and mesh connectivity, I use VecView() >>>>> by defining PetscSection on that dm and borrow a vector. The type of the >>>>> viewer is set to PETSCVIEWERVTK. >>>>> >>>>> With 32bit indices, the above work flow has no issue. However, if >>>>> PETSc is configured with 64bit indices, my output .vtu file has an error if >>>>> I open the file with visualization tools, such as Paraview or Tecplot, >>>>> saying that: >>>>> "Cannot read cell connectivity from Cells in piece 0 because the >>>>> "offsets" array is not monotonically increasing or starts with a value >>>>> other than 0." >>>>> >>>>> If I open the .vtu file from terminal, I can see such a line: >>>>> ... >>>>> >>>> format="appended" offset="580860" /> >>>>> ... >>>>> >>>>> I expected "DataArray type="Int64", since the PETSc has 64bit indices. >>>>> Could I get recommendations that I need to check to resolve the issue? >>>>> >>>> >>>> This is probably a bug. We will look at it. >>>> >>>> Jed, I saw that Int32 is hardcoded in plexvtu.c, but sizeof(PetscInt) >>>> is used to calculate the offset, which looks inconsistent. Can you take a >>>> look? >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks, >>>>> Mike >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Feb 14 19:17:16 2023 From: jed at jedbrown.org (Jed Brown) Date: Tue, 14 Feb 2023 18:17:16 -0700 Subject: [petsc-users] PetscViewer with 64bit In-Reply-To: References: Message-ID: <87bklvwwv7.fsf@jedbrown.org> Can you share a reproducer? I think I recall the format requiring certain things to be Int32. Mike Michell writes: > Thanks for the note. > I understood that PETSc calculates the offsets for me through "boffset" > variable in plexvtu.c file. Please correct me if it is wrong. > > If plexvtu.c has a bug, it could be around "write file header" part in > which the boffset is also computed. Is this correct? I am not using complex > number. > There are several mixed parts among "Int32, UInt8, PetscInt_FMT, > PetscInt64_FMT" in writing the header. > > Which combination of those flags is correct for 64bit indices? I am gonna > modify plexvtu.c file with "#if defined(PETSC_USE_64BIT_INDICES)" > statement, but I do not know what is the correct form of the header flag > for 64bit indices. > > It is also confusing to me: > boffset += gpiece[r].ncells * sizeof(PetscInt) + sizeof(int); > How is sizeof(PetscInt) different from sizeof(int)? > > Thanks, > Mike > > >> On Tue, Feb 14, 2023 at 11:45 AM Mike Michell >> wrote: >> >>> I was trying to modify the header flags from "Int32" to "Int64", but the >>> problem was not resolved. Could I get any additional comments? >>> >> >> The calculated offsets are not correct I think. >> >> Matt >> >> >>> Thanks, >>> Mike >>> >>> >>>> Thanks for the comments. 
>>>> To be precise on the question, the entire part of the header of the .vtu >>>> file is attached: >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> format="appended" offset="0" /> >>>> >>>> >>>> >>> NumberOfComponents="1" format="appended" offset="116932" /> >>>> >>> NumberOfComponents="1" format="appended" offset="372936" /> >>>> >>> NumberOfComponents="1" format="appended" offset="404940" /> >>>> >>>> >>>> >>> format="appended" offset="408944" /> >>>> >>>> >>>> >>> NumberOfComponents="1" format="appended" offset="424948" /> >>>> >>>> >>>> >>>> >>>> >>> format="appended" offset="463928" /> >>>> >>>> >>>> >>> NumberOfComponents="1" format="appended" offset="580860" /> >>>> >>> NumberOfComponents="1" format="appended" offset="836864" /> >>>> >>> NumberOfComponents="1" format="appended" offset="868868" /> >>>> >>>> >>>> >>> format="appended" offset="872872" /> >>>> >>>> >>>> >>> NumberOfComponents="1" format="appended" offset="888876" /> >>>> >>>> >>>> >>>> >>>> >>>> >>>> Thanks, >>>> Mike >>>> >>>> >>>>> On Sun, Feb 12, 2023 at 6:15 PM Mike Michell >>>>> wrote: >>>>> >>>>>> Dear PETSc team, >>>>>> >>>>>> I am a user of PETSc with Fortran. My code uses DMPlex to handle dm >>>>>> object. To print out output variable and mesh connectivity, I use VecView() >>>>>> by defining PetscSection on that dm and borrow a vector. The type of the >>>>>> viewer is set to PETSCVIEWERVTK. >>>>>> >>>>>> With 32bit indices, the above work flow has no issue. However, if >>>>>> PETSc is configured with 64bit indices, my output .vtu file has an error if >>>>>> I open the file with visualization tools, such as Paraview or Tecplot, >>>>>> saying that: >>>>>> "Cannot read cell connectivity from Cells in piece 0 because the >>>>>> "offsets" array is not monotonically increasing or starts with a value >>>>>> other than 0." >>>>>> >>>>>> If I open the .vtu file from terminal, I can see such a line: >>>>>> ... >>>>>> >>>>> format="appended" offset="580860" /> >>>>>> ... >>>>>> >>>>>> I expected "DataArray type="Int64", since the PETSc has 64bit indices. >>>>>> Could I get recommendations that I need to check to resolve the issue? >>>>>> >>>>> >>>>> This is probably a bug. We will look at it. >>>>> >>>>> Jed, I saw that Int32 is hardcoded in plexvtu.c, but sizeof(PetscInt) >>>>> is used to calculate the offset, which looks inconsistent. Can you take a >>>>> look? >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Thanks, >>>>>> Mike >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> From dave.mayhem23 at gmail.com Tue Feb 14 23:03:42 2023 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 14 Feb 2023 21:03:42 -0800 Subject: [petsc-users] PetscViewer with 64bit In-Reply-To: <87bklvwwv7.fsf@jedbrown.org> References: <87bklvwwv7.fsf@jedbrown.org> Message-ID: On Tue 14. Feb 2023 at 17:17, Jed Brown wrote: > Can you share a reproducer? I think I recall the format requiring certain > things to be Int32. By default, the byte offset used with the appended data format is UInt32. I believe that?s where the sizeof(int) is coming from. 
This default is annoying as it limits the total size of your appended data to be < 3 GB. That said, in the opening of the paraview file you can add this attribute header_type="UInt64" then the size of the offset is now UInt64 and now large files can be finally written. Cheers, Dave > > Mike Michell writes: > > > Thanks for the note. > > I understood that PETSc calculates the offsets for me through "boffset" > > variable in plexvtu.c file. Please correct me if it is wrong. > > > > If plexvtu.c has a bug, it could be around "write file header" part in > > which the boffset is also computed. Is this correct? I am not using > complex > > number. > > There are several mixed parts among "Int32, UInt8, PetscInt_FMT, > > PetscInt64_FMT" in writing the header. > > > > Which combination of those flags is correct for 64bit indices? I am gonna > > modify plexvtu.c file with "#if defined(PETSC_USE_64BIT_INDICES)" > > statement, but I do not know what is the correct form of the header flag > > for 64bit indices. > > > > It is also confusing to me: > > boffset += gpiece[r].ncells * sizeof(PetscInt) + sizeof(int); > > How is sizeof(PetscInt) different from sizeof(int)? > > > > Thanks, > > Mike > > > > > >> On Tue, Feb 14, 2023 at 11:45 AM Mike Michell > >> wrote: > >> > >>> I was trying to modify the header flags from "Int32" to "Int64", but > the > >>> problem was not resolved. Could I get any additional comments? > >>> > >> > >> The calculated offsets are not correct I think. > >> > >> Matt > >> > >> > >>> Thanks, > >>> Mike > >>> > >>> > >>>> Thanks for the comments. > >>>> To be precise on the question, the entire part of the header of the > .vtu > >>>> file is attached: > >>>> > >>>> > >>>> byte_order="LittleEndian"> > >>>> > >>>> > >>>> > >>>> NumberOfComponents="3" > >>>> format="appended" offset="0" /> > >>>> > >>>> > >>>> >>>> NumberOfComponents="1" format="appended" offset="116932" /> > >>>> >>>> NumberOfComponents="1" format="appended" offset="372936" /> > >>>> >>>> NumberOfComponents="1" format="appended" offset="404940" /> > >>>> > >>>> > >>>> >>>> format="appended" offset="408944" /> > >>>> > >>>> > >>>> >>>> NumberOfComponents="1" format="appended" offset="424948" /> > >>>> > >>>> > >>>> > >>>> > >>>> NumberOfComponents="3" > >>>> format="appended" offset="463928" /> > >>>> > >>>> > >>>> >>>> NumberOfComponents="1" format="appended" offset="580860" /> > >>>> >>>> NumberOfComponents="1" format="appended" offset="836864" /> > >>>> >>>> NumberOfComponents="1" format="appended" offset="868868" /> > >>>> > >>>> > >>>> >>>> format="appended" offset="872872" /> > >>>> > >>>> > >>>> >>>> NumberOfComponents="1" format="appended" offset="888876" /> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> Thanks, > >>>> Mike > >>>> > >>>> > >>>>> On Sun, Feb 12, 2023 at 6:15 PM Mike Michell > >>>>> wrote: > >>>>> > >>>>>> Dear PETSc team, > >>>>>> > >>>>>> I am a user of PETSc with Fortran. My code uses DMPlex to handle dm > >>>>>> object. To print out output variable and mesh connectivity, I use > VecView() > >>>>>> by defining PetscSection on that dm and borrow a vector. The type > of the > >>>>>> viewer is set to PETSCVIEWERVTK. > >>>>>> > >>>>>> With 32bit indices, the above work flow has no issue. 
However, if > >>>>>> PETSc is configured with 64bit indices, my output .vtu file has an > error if > >>>>>> I open the file with visualization tools, such as Paraview or > Tecplot, > >>>>>> saying that: > >>>>>> "Cannot read cell connectivity from Cells in piece 0 because the > >>>>>> "offsets" array is not monotonically increasing or starts with a > value > >>>>>> other than 0." > >>>>>> > >>>>>> If I open the .vtu file from terminal, I can see such a line: > >>>>>> ... > >>>>>> >>>>>> format="appended" offset="580860" /> > >>>>>> ... > >>>>>> > >>>>>> I expected "DataArray type="Int64", since the PETSc has 64bit > indices. > >>>>>> Could I get recommendations that I need to check to resolve the > issue? > >>>>>> > >>>>> > >>>>> This is probably a bug. We will look at it. > >>>>> > >>>>> Jed, I saw that Int32 is hardcoded in plexvtu.c, but sizeof(PetscInt) > >>>>> is used to calculate the offset, which looks inconsistent. Can you > take a > >>>>> look? > >>>>> > >>>>> Thanks, > >>>>> > >>>>> Matt > >>>>> > >>>>> > >>>>>> Thanks, > >>>>>> Mike > >>>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> What most experimenters take for granted before they begin their > >>>>> experiments is infinitely more interesting than any results to which > their > >>>>> experiments lead. > >>>>> -- Norbert Wiener > >>>>> > >>>>> https://www.cse.buffalo.edu/~knepley/ > >>>>> > >>>>> > >>>> > >> > >> -- > >> What most experimenters take for granted before they begin their > >> experiments is infinitely more interesting than any results to which > their > >> experiments lead. > >> -- Norbert Wiener > >> > >> https://www.cse.buffalo.edu/~knepley/ > >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Tue Feb 14 23:24:02 2023 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 14 Feb 2023 21:24:02 -0800 Subject: [petsc-users] PetscViewer with 64bit In-Reply-To: References: <87bklvwwv7.fsf@jedbrown.org> Message-ID: On Tue 14. Feb 2023 at 21:03, Dave May wrote: > > > On Tue 14. Feb 2023 at 17:17, Jed Brown wrote: > >> Can you share a reproducer? I think I recall the format requiring certain >> things to be Int32. > > > By default, the byte offset used with the appended data format is UInt32. > I believe that?s where the sizeof(int) is coming from. This default is > annoying as it limits the total size of your appended data to be < 3 GB. > Oops, I meant to type 4 GB That said, in the opening of the paraview file you can add this attribute > > header_type="UInt64" > > then the size of the offset is now UInt64 and now large files can be > finally written. > > > Cheers, > Dave > > > > >> >> Mike Michell writes: >> >> > Thanks for the note. >> > I understood that PETSc calculates the offsets for me through "boffset" >> > variable in plexvtu.c file. Please correct me if it is wrong. >> > >> > If plexvtu.c has a bug, it could be around "write file header" part in >> > which the boffset is also computed. Is this correct? I am not using >> complex >> > number. >> > There are several mixed parts among "Int32, UInt8, PetscInt_FMT, >> > PetscInt64_FMT" in writing the header. >> > >> > Which combination of those flags is correct for 64bit indices? I am >> gonna >> > modify plexvtu.c file with "#if defined(PETSC_USE_64BIT_INDICES)" >> > statement, but I do not know what is the correct form of the header flag >> > for 64bit indices. 
>> > >> > It is also confusing to me: >> > boffset += gpiece[r].ncells * sizeof(PetscInt) + sizeof(int); >> > How is sizeof(PetscInt) different from sizeof(int)? >> > >> > Thanks, >> > Mike >> > >> > >> >> On Tue, Feb 14, 2023 at 11:45 AM Mike Michell >> >> wrote: >> >> >> >>> I was trying to modify the header flags from "Int32" to "Int64", but >> the >> >>> problem was not resolved. Could I get any additional comments? >> >>> >> >> >> >> The calculated offsets are not correct I think. >> >> >> >> Matt >> >> >> >> >> >>> Thanks, >> >>> Mike >> >>> >> >>> >> >>>> Thanks for the comments. >> >>>> To be precise on the question, the entire part of the header of the >> .vtu >> >>>> file is attached: >> >>>> >> >>>> >> >>>> > byte_order="LittleEndian"> >> >>>> >> >>>> >> >>>> >> >>>> > NumberOfComponents="3" >> >>>> format="appended" offset="0" /> >> >>>> >> >>>> >> >>>> > >>>> NumberOfComponents="1" format="appended" offset="116932" /> >> >>>> > >>>> NumberOfComponents="1" format="appended" offset="372936" /> >> >>>> > >>>> NumberOfComponents="1" format="appended" offset="404940" /> >> >>>> >> >>>> >> >>>> > >>>> format="appended" offset="408944" /> >> >>>> >> >>>> >> >>>> > >>>> NumberOfComponents="1" format="appended" offset="424948" /> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> > NumberOfComponents="3" >> >>>> format="appended" offset="463928" /> >> >>>> >> >>>> >> >>>> > >>>> NumberOfComponents="1" format="appended" offset="580860" /> >> >>>> > >>>> NumberOfComponents="1" format="appended" offset="836864" /> >> >>>> > >>>> NumberOfComponents="1" format="appended" offset="868868" /> >> >>>> >> >>>> >> >>>> > >>>> format="appended" offset="872872" /> >> >>>> >> >>>> >> >>>> > >>>> NumberOfComponents="1" format="appended" offset="888876" /> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> Thanks, >> >>>> Mike >> >>>> >> >>>> >> >>>>> On Sun, Feb 12, 2023 at 6:15 PM Mike Michell > > >> >>>>> wrote: >> >>>>> >> >>>>>> Dear PETSc team, >> >>>>>> >> >>>>>> I am a user of PETSc with Fortran. My code uses DMPlex to handle dm >> >>>>>> object. To print out output variable and mesh connectivity, I use >> VecView() >> >>>>>> by defining PetscSection on that dm and borrow a vector. The type >> of the >> >>>>>> viewer is set to PETSCVIEWERVTK. >> >>>>>> >> >>>>>> With 32bit indices, the above work flow has no issue. However, if >> >>>>>> PETSc is configured with 64bit indices, my output .vtu file has an >> error if >> >>>>>> I open the file with visualization tools, such as Paraview or >> Tecplot, >> >>>>>> saying that: >> >>>>>> "Cannot read cell connectivity from Cells in piece 0 because the >> >>>>>> "offsets" array is not monotonically increasing or starts with a >> value >> >>>>>> other than 0." >> >>>>>> >> >>>>>> If I open the .vtu file from terminal, I can see such a line: >> >>>>>> ... >> >>>>>> > >>>>>> format="appended" offset="580860" /> >> >>>>>> ... >> >>>>>> >> >>>>>> I expected "DataArray type="Int64", since the PETSc has 64bit >> indices. >> >>>>>> Could I get recommendations that I need to check to resolve the >> issue? >> >>>>>> >> >>>>> >> >>>>> This is probably a bug. We will look at it. >> >>>>> >> >>>>> Jed, I saw that Int32 is hardcoded in plexvtu.c, but >> sizeof(PetscInt) >> >>>>> is used to calculate the offset, which looks inconsistent. Can you >> take a >> >>>>> look? 
>> >>>>> >> >>>>> Thanks, >> >>>>> >> >>>>> Matt >> >>>>> >> >>>>> >> >>>>>> Thanks, >> >>>>>> Mike >> >>>>>> >> >>>>> >> >>>>> >> >>>>> -- >> >>>>> What most experimenters take for granted before they begin their >> >>>>> experiments is infinitely more interesting than any results to >> which their >> >>>>> experiments lead. >> >>>>> -- Norbert Wiener >> >>>>> >> >>>>> https://www.cse.buffalo.edu/~knepley/ >> >>>>> >> >>>>> >> >>>> >> >> >> >> -- >> >> What most experimenters take for granted before they begin their >> >> experiments is infinitely more interesting than any results to which >> their >> >> experiments lead. >> >> -- Norbert Wiener >> >> >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Feb 14 23:27:03 2023 From: jed at jedbrown.org (Jed Brown) Date: Tue, 14 Feb 2023 22:27:03 -0700 Subject: [petsc-users] PetscViewer with 64bit In-Reply-To: References: <87bklvwwv7.fsf@jedbrown.org> Message-ID: <878rgzwlaw.fsf@jedbrown.org> Dave May writes: > On Tue 14. Feb 2023 at 17:17, Jed Brown wrote: > >> Can you share a reproducer? I think I recall the format requiring certain >> things to be Int32. > > > By default, the byte offset used with the appended data format is UInt32. I > believe that?s where the sizeof(int) is coming from. This default is > annoying as it limits the total size of your appended data to be < 3 GB. > That said, in the opening of the paraview file you can add this attribute > > header_type="UInt64" You mean in the header of the .vtu? Do you happen to have an example or pointers to docs describing this feature? Can we always do this? It isn't mentioned in these: https://vtk.org/wp-content/uploads/2015/04/file-formats.pdf (PDF was created in 2003) https://kitware.github.io/vtk-examples/site/VTKFileFormats/#xml-file-formats > then the size of the offset is now UInt64 and now large files can be > finally written. > > > Cheers, > Dave > > > > >> >> Mike Michell writes: >> >> > Thanks for the note. >> > I understood that PETSc calculates the offsets for me through "boffset" >> > variable in plexvtu.c file. Please correct me if it is wrong. >> > >> > If plexvtu.c has a bug, it could be around "write file header" part in >> > which the boffset is also computed. Is this correct? I am not using >> complex >> > number. >> > There are several mixed parts among "Int32, UInt8, PetscInt_FMT, >> > PetscInt64_FMT" in writing the header. >> > >> > Which combination of those flags is correct for 64bit indices? I am gonna >> > modify plexvtu.c file with "#if defined(PETSC_USE_64BIT_INDICES)" >> > statement, but I do not know what is the correct form of the header flag >> > for 64bit indices. >> > >> > It is also confusing to me: >> > boffset += gpiece[r].ncells * sizeof(PetscInt) + sizeof(int); >> > How is sizeof(PetscInt) different from sizeof(int)? >> > >> > Thanks, >> > Mike >> > >> > >> >> On Tue, Feb 14, 2023 at 11:45 AM Mike Michell >> >> wrote: >> >> >> >>> I was trying to modify the header flags from "Int32" to "Int64", but >> the >> >>> problem was not resolved. Could I get any additional comments? >> >>> >> >> >> >> The calculated offsets are not correct I think. >> >> >> >> Matt >> >> >> >> >> >>> Thanks, >> >>> Mike >> >>> >> >>> >> >>>> Thanks for the comments. 
>> >>>> To be precise on the question, the entire part of the header of the >> .vtu >> >>>> file is attached: >> >>>> >> >>>> >> >>>> > byte_order="LittleEndian"> >> >>>> >> >>>> >> >>>> >> >>>> > NumberOfComponents="3" >> >>>> format="appended" offset="0" /> >> >>>> >> >>>> >> >>>> > >>>> NumberOfComponents="1" format="appended" offset="116932" /> >> >>>> > >>>> NumberOfComponents="1" format="appended" offset="372936" /> >> >>>> > >>>> NumberOfComponents="1" format="appended" offset="404940" /> >> >>>> >> >>>> >> >>>> > >>>> format="appended" offset="408944" /> >> >>>> >> >>>> >> >>>> > >>>> NumberOfComponents="1" format="appended" offset="424948" /> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> > NumberOfComponents="3" >> >>>> format="appended" offset="463928" /> >> >>>> >> >>>> >> >>>> > >>>> NumberOfComponents="1" format="appended" offset="580860" /> >> >>>> > >>>> NumberOfComponents="1" format="appended" offset="836864" /> >> >>>> > >>>> NumberOfComponents="1" format="appended" offset="868868" /> >> >>>> >> >>>> >> >>>> > >>>> format="appended" offset="872872" /> >> >>>> >> >>>> >> >>>> > >>>> NumberOfComponents="1" format="appended" offset="888876" /> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> Thanks, >> >>>> Mike >> >>>> >> >>>> >> >>>>> On Sun, Feb 12, 2023 at 6:15 PM Mike Michell >> >>>>> wrote: >> >>>>> >> >>>>>> Dear PETSc team, >> >>>>>> >> >>>>>> I am a user of PETSc with Fortran. My code uses DMPlex to handle dm >> >>>>>> object. To print out output variable and mesh connectivity, I use >> VecView() >> >>>>>> by defining PetscSection on that dm and borrow a vector. The type >> of the >> >>>>>> viewer is set to PETSCVIEWERVTK. >> >>>>>> >> >>>>>> With 32bit indices, the above work flow has no issue. However, if >> >>>>>> PETSc is configured with 64bit indices, my output .vtu file has an >> error if >> >>>>>> I open the file with visualization tools, such as Paraview or >> Tecplot, >> >>>>>> saying that: >> >>>>>> "Cannot read cell connectivity from Cells in piece 0 because the >> >>>>>> "offsets" array is not monotonically increasing or starts with a >> value >> >>>>>> other than 0." >> >>>>>> >> >>>>>> If I open the .vtu file from terminal, I can see such a line: >> >>>>>> ... >> >>>>>> > >>>>>> format="appended" offset="580860" /> >> >>>>>> ... >> >>>>>> >> >>>>>> I expected "DataArray type="Int64", since the PETSc has 64bit >> indices. >> >>>>>> Could I get recommendations that I need to check to resolve the >> issue? >> >>>>>> >> >>>>> >> >>>>> This is probably a bug. We will look at it. >> >>>>> >> >>>>> Jed, I saw that Int32 is hardcoded in plexvtu.c, but sizeof(PetscInt) >> >>>>> is used to calculate the offset, which looks inconsistent. Can you >> take a >> >>>>> look? >> >>>>> >> >>>>> Thanks, >> >>>>> >> >>>>> Matt >> >>>>> >> >>>>> >> >>>>>> Thanks, >> >>>>>> Mike >> >>>>>> >> >>>>> >> >>>>> >> >>>>> -- >> >>>>> What most experimenters take for granted before they begin their >> >>>>> experiments is infinitely more interesting than any results to which >> their >> >>>>> experiments lead. >> >>>>> -- Norbert Wiener >> >>>>> >> >>>>> https://www.cse.buffalo.edu/~knepley/ >> >>>>> >> >>>>> >> >>>> >> >> >> >> -- >> >> What most experimenters take for granted before they begin their >> >> experiments is infinitely more interesting than any results to which >> their >> >> experiments lead. 
>> >> -- Norbert Wiener >> >> >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> >> From dave.mayhem23 at gmail.com Wed Feb 15 01:01:20 2023 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 14 Feb 2023 23:01:20 -0800 Subject: [petsc-users] PetscViewer with 64bit In-Reply-To: <878rgzwlaw.fsf@jedbrown.org> References: <87bklvwwv7.fsf@jedbrown.org> <878rgzwlaw.fsf@jedbrown.org> Message-ID: On Tue 14. Feb 2023 at 21:27, Jed Brown wrote: > Dave May writes: > > > On Tue 14. Feb 2023 at 17:17, Jed Brown wrote: > > > >> Can you share a reproducer? I think I recall the format requiring > certain > >> things to be Int32. > > > > > > By default, the byte offset used with the appended data format is > UInt32. I > > believe that?s where the sizeof(int) is coming from. This default is > > annoying as it limits the total size of your appended data to be < 3 GB. > > That said, in the opening of the paraview file you can add this attribute > > > > header_type="UInt64" > > You mean in the header of the .vtu? Yeah, within the open VTKFile tag. Like this < VTKFile type=?xxx?, byte_order="LittleEndian" header_type="UInt64" > Do you happen to have an example or pointers to docs describing this > feature? Example yes - will send it to you tomorrow. Docs? not really. Only stuff like this https://kitware.github.io/paraview-docs/latest/python/paraview.simple.XMLPStructuredGridWriter.html https://kitware.github.io/paraview-docs/v5.8.0/python/paraview.simple.XMLMultiBlockDataWriter.html All the writers seem to support it. Can we always do this? Yep! It isn't mentioned in these: > > https://vtk.org/wp-content/uploads/2015/04/file-formats.pdf (PDF was > created in 2003) > > https://kitware.github.io/vtk-examples/site/VTKFileFormats/#xml-file-formats > Yes I know. I?ve tied myself in knots for years because the of the assumption that the offset had to be an int. Credit for the discovery goes to Carsten Uphoff. He showed me this. Cheers, Dave > > then the size of the offset is now UInt64 and now large files can be > > finally written. > > > > > > Cheers, > > Dave > > > > > > > > > >> > >> Mike Michell writes: > >> > >> > Thanks for the note. > >> > I understood that PETSc calculates the offsets for me through > "boffset" > >> > variable in plexvtu.c file. Please correct me if it is wrong. > >> > > >> > If plexvtu.c has a bug, it could be around "write file header" part in > >> > which the boffset is also computed. Is this correct? I am not using > >> complex > >> > number. > >> > There are several mixed parts among "Int32, UInt8, PetscInt_FMT, > >> > PetscInt64_FMT" in writing the header. > >> > > >> > Which combination of those flags is correct for 64bit indices? I am > gonna > >> > modify plexvtu.c file with "#if defined(PETSC_USE_64BIT_INDICES)" > >> > statement, but I do not know what is the correct form of the header > flag > >> > for 64bit indices. > >> > > >> > It is also confusing to me: > >> > boffset += gpiece[r].ncells * sizeof(PetscInt) + sizeof(int); > >> > How is sizeof(PetscInt) different from sizeof(int)? > >> > > >> > Thanks, > >> > Mike > >> > > >> > > >> >> On Tue, Feb 14, 2023 at 11:45 AM Mike Michell > > >> >> wrote: > >> >> > >> >>> I was trying to modify the header flags from "Int32" to "Int64", but > >> the > >> >>> problem was not resolved. Could I get any additional comments? > >> >>> > >> >> > >> >> The calculated offsets are not correct I think. > >> >> > >> >> Matt > >> >> > >> >> > >> >>> Thanks, > >> >>> Mike > >> >>> > >> >>> > >> >>>> Thanks for the comments. 
> >> >>>> To be precise on the question, the entire part of the header of the > >> .vtu > >> >>>> file is attached: > >> >>>> > >> >>>> > >> >>>> >> byte_order="LittleEndian"> > >> >>>> > >> >>>> > >> >>>> > >> >>>> >> NumberOfComponents="3" > >> >>>> format="appended" offset="0" /> > >> >>>> > >> >>>> > >> >>>> >> >>>> NumberOfComponents="1" format="appended" offset="116932" /> > >> >>>> >> >>>> NumberOfComponents="1" format="appended" offset="372936" /> > >> >>>> >> >>>> NumberOfComponents="1" format="appended" offset="404940" /> > >> >>>> > >> >>>> > >> >>>> >> >>>> format="appended" offset="408944" /> > >> >>>> > >> >>>> > >> >>>> >> >>>> NumberOfComponents="1" format="appended" offset="424948" /> > >> >>>> > >> >>>> > >> >>>> > >> >>>> > >> >>>> >> NumberOfComponents="3" > >> >>>> format="appended" offset="463928" /> > >> >>>> > >> >>>> > >> >>>> >> >>>> NumberOfComponents="1" format="appended" offset="580860" /> > >> >>>> >> >>>> NumberOfComponents="1" format="appended" offset="836864" /> > >> >>>> >> >>>> NumberOfComponents="1" format="appended" offset="868868" /> > >> >>>> > >> >>>> > >> >>>> >> >>>> format="appended" offset="872872" /> > >> >>>> > >> >>>> > >> >>>> >> >>>> NumberOfComponents="1" format="appended" offset="888876" /> > >> >>>> > >> >>>> > >> >>>> > >> >>>> > >> >>>> > >> >>>> > >> >>>> Thanks, > >> >>>> Mike > >> >>>> > >> >>>> > >> >>>>> On Sun, Feb 12, 2023 at 6:15 PM Mike Michell < > mi.mike1021 at gmail.com> > >> >>>>> wrote: > >> >>>>> > >> >>>>>> Dear PETSc team, > >> >>>>>> > >> >>>>>> I am a user of PETSc with Fortran. My code uses DMPlex to handle > dm > >> >>>>>> object. To print out output variable and mesh connectivity, I use > >> VecView() > >> >>>>>> by defining PetscSection on that dm and borrow a vector. The type > >> of the > >> >>>>>> viewer is set to PETSCVIEWERVTK. > >> >>>>>> > >> >>>>>> With 32bit indices, the above work flow has no issue. However, if > >> >>>>>> PETSc is configured with 64bit indices, my output .vtu file has > an > >> error if > >> >>>>>> I open the file with visualization tools, such as Paraview or > >> Tecplot, > >> >>>>>> saying that: > >> >>>>>> "Cannot read cell connectivity from Cells in piece 0 because the > >> >>>>>> "offsets" array is not monotonically increasing or starts with a > >> value > >> >>>>>> other than 0." > >> >>>>>> > >> >>>>>> If I open the .vtu file from terminal, I can see such a line: > >> >>>>>> ... > >> >>>>>> NumberOfComponents="1" > >> >>>>>> format="appended" offset="580860" /> > >> >>>>>> ... > >> >>>>>> > >> >>>>>> I expected "DataArray type="Int64", since the PETSc has 64bit > >> indices. > >> >>>>>> Could I get recommendations that I need to check to resolve the > >> issue? > >> >>>>>> > >> >>>>> > >> >>>>> This is probably a bug. We will look at it. > >> >>>>> > >> >>>>> Jed, I saw that Int32 is hardcoded in plexvtu.c, but > sizeof(PetscInt) > >> >>>>> is used to calculate the offset, which looks inconsistent. Can you > >> take a > >> >>>>> look? > >> >>>>> > >> >>>>> Thanks, > >> >>>>> > >> >>>>> Matt > >> >>>>> > >> >>>>> > >> >>>>>> Thanks, > >> >>>>>> Mike > >> >>>>>> > >> >>>>> > >> >>>>> > >> >>>>> -- > >> >>>>> What most experimenters take for granted before they begin their > >> >>>>> experiments is infinitely more interesting than any results to > which > >> their > >> >>>>> experiments lead. 
> >> >>>>> -- Norbert Wiener > >> >>>>> > >> >>>>> https://www.cse.buffalo.edu/~knepley/ > >> >>>>> > >> >>>>> > >> >>>> > >> >> > >> >> -- > >> >> What most experimenters take for granted before they begin their > >> >> experiments is infinitely more interesting than any results to which > >> their > >> >> experiments lead. > >> >> -- Norbert Wiener > >> >> > >> >> https://www.cse.buffalo.edu/~knepley/ > >> >> > >> >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.bernigaud at onera.fr Wed Feb 15 03:56:40 2023 From: pierre.bernigaud at onera.fr (Bernigaud Pierre) Date: Wed, 15 Feb 2023 10:56:40 +0100 Subject: [petsc-users] MatFDColoringSetUp with Periodic BC In-Reply-To: References: <919234c3017db5d21098774229cbb583@onera.fr> Message-ID: <12b964c0-b402-107d-288b-df29fdfdd757@onera.fr> Barry, Thanks for this quick answer. Best, Pierre Le 14/02/2023 ? 21:35, Barry Smith a ?crit?: > > ? I have created an MR that documents this and moved the error > checking to a more appropriate location > https://gitlab.com/petsc/petsc/-/merge_requests/6070 > > >> On Feb 14, 2023, at 4:33 AM, Pierre Bernigaud >> wrote: >> >> Dear all, >> >> I hope this email finds you well. >> We are currently working on a solver which is employing DMDA with >> SNES. The jacobian is computed via FDColoring, ie: >> >> call DMDACreate1D(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED, NY, NC, NGc, >> PETSC_NULL_INTEGER, dmF, ierr) >> >> ! ----- Some steps ... >> >> call DMCreateColoring(dmF, IS_COLORING_GLOBAL, iscoloring, ierr) >> call MatFDColoringCreate(Jac,iscoloring, matfdcoloring, ierr) >> call MatFDColoringSetFunction(matfdcoloring, FormFunction, CTX, ierr) >> call MatFDColoringSetUp(Jac ,iscoloring,matfdcoloring, ierr) >> call SNESSetJacobian(snes, Jac, Jac, SNESComputeJacobianDefaultColor, >> matfdcoloring, ierr) >> >> Everything is running smoothly. >> Recently, we modified the boundary conditions such as to use periodic >> BC: >> >> call DMDACreate1D(PETSC_COMM_WORLD, DM_BOUNDARY_PERIODIC, NY, NC, >> NGc, PETSC_NULL_INTEGER, dmF, ierr) >> >> We then encountered frequent crashes when calling MatFDColoringSetUp, >> depending on the number of cells NY. After looking for an solution, I >> found this old thread: >> https://lists.mcs.anl.gov/pipermail/petsc-users/2013-May/017449.html >> It appears that when using periodic BC, FDColoring can only be used >> if the number of cells is divisible by 2*NGc+1. Even though this is >> only a slight annoyance, I was wondering if you were working on this >> matter / if you had a quick fix at hand? At any rate, I think it >> would be nice if a warning was displayed in the FDColoring >> documentation? >> >> Respectfully, >> Pierre Bernigaud >> > -- *Pierre Bernigaud* Doctorant D?partement multi-physique pour l??nerg?tique Mod?lisation Propulsion Fus?e T?l: +33 1 80 38 62 33 ONERA?-?The French Aerospace Lab?-?Centre de Palaiseau 6, Chemin de la Vauve aux Granges - 91123 PALAISEAU Coordonn?es GPS : 48.715169, 2.232833 Nous suivre sur : www.onera.fr ?| Twitter ?| LinkedIn ?| Facebook ?| Instagram Avertissement/disclaimer https://www.onera.fr/en/emails-terms -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: faohmnjfdoleeodh.gif Type: image/gif Size: 1041 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ecmmhhmnlejemldm.png Type: image/png Size: 16755 bytes Desc: not available URL: From margherita.guido at epfl.ch Wed Feb 15 09:34:03 2023 From: margherita.guido at epfl.ch (Guido Margherita) Date: Wed, 15 Feb 2023 15:34:03 +0000 Subject: [petsc-users] MatMatMul inefficient In-Reply-To: <05E5037D-4AC5-4309-A76C-39180DEAF5E1@joliv.et> References: <94FC9342-6963-45CB-AC76-63253C9F62F4@epfl.ch> <05E5037D-4AC5-4309-A76C-39180DEAF5E1@joliv.et> Message-ID: <972BE937-28D6-4C8A-8D80-2EE0F3758DF5@epfl.ch> Hi, You can find the reproducer at this link https://github.com/margheguido/Miniapp_MatMatMul , including the matrices I used. I have trouble undrerstanding what is different in my case from the one you referenced me to. Thank you so much, Margherita Il giorno 13 feb 2023, alle ore 3:51 PM, Pierre Jolivet ha scritto: Could you please share a reproducer? What you are seeing is not typical of the performance of such a kernel, both from a theoretical or a practical (see fig. 2 of https://joliv.et/article.pdf) point of view. Thanks, Pierre On 13 Feb 2023, at 3:38 PM, Guido Margherita via petsc-users wrote: ?A is a sparse MATSEQAIJ, Q is dense. Thanks, Margherita Il giorno 13 feb 2023, alle ore 3:27 PM, knepley at gmail.com ha scritto: On Mon, Feb 13, 2023 at 9:21 AM Guido Margherita via petsc-users wrote: Hi all, I realised that performing a matrix-matrix multiplication using the function MatMatMult it is not at all computationally efficient with respect to performing N times a matrix-vector multiplication with MatMul, being N the number of columns of the second matrix in the product. When I multiply I matrix A 46816 x 46816 to a matrix Q 46816 x 6, the MatMatMul function is indeed 6 times more expensive than 6 times a call to MatMul, when performed sequentially (0.04056 s vs 0.0062 s ). When the same code is run in parallel the gap grows even more, being10 times more expensive. Is there an explanation for it? So we can reproduce this, what kind of matrix is A? I am assuming that Q is dense. Thanks, Matt t1 = MPI_Wtime() call MatMatMult(A,Q,MAT_INITIAL_MATRIX, PETSC_DEFAULT_REAL, AQ, ierr ) t2 = MPI_Wtime() t_MatMatMul = t2-t1 t_MatMul=0.0 do j = 0, m-1 call MatGetColumnVector(Q, q_vec, j,ierr) t1 = MPI_Wtime() call MatMult(A, q_vec, aq_vec, ierr) t2 = MPI_Wtime() t_MatMul = t_MatMul + t2-t1 end do Thank you, Margherita Guido -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Wed Feb 15 11:15:47 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Wed, 15 Feb 2023 18:15:47 +0100 Subject: [petsc-users] MatMatMul inefficient In-Reply-To: <972BE937-28D6-4C8A-8D80-2EE0F3758DF5@epfl.ch> References: <94FC9342-6963-45CB-AC76-63253C9F62F4@epfl.ch> <05E5037D-4AC5-4309-A76C-39180DEAF5E1@joliv.et> <972BE937-28D6-4C8A-8D80-2EE0F3758DF5@epfl.ch> Message-ID: <96F837C4-9EE8-4E9F-808C-FF2397F33CA2@joliv.et> Thank you for the reproducer. I didn?t realize your test case was _this_ small. Still, you are not setting the MatType of Q, and PETSc tends to only like AIJ, so everything defaults to this type. So instead of doing C=AB with a sparse A and a dense B, it does a sparse-sparse product which is much costlier. 
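The one-line change Pierre spells out next can be sketched as follows; the names A, Q and AQ follow the thread, while the binary viewer and the surrounding declarations are assumed here rather than taken from the reproducer:

      Mat            A, Q, AQ
      PetscViewer    viewer
      PetscErrorCode ierr
      ! ... A assembled or loaded as MATSEQAIJ, viewer opened on the file holding Q (assumed) ...
      call MatCreate(PETSC_COMM_WORLD, Q, ierr)
      call MatSetType(Q, MATDENSE, ierr)   ! declare Q dense before MatLoad, otherwise it defaults to AIJ
      call MatLoad(Q, viewer, ierr)
      call MatMatMult(A, Q, MAT_INITIAL_MATRIX, PETSC_DEFAULT_REAL, AQ, ierr)

With Q typed as MATDENSE, the product uses the sparse-times-dense kernel instead of a sparse-sparse one, which is what the timings quoted below reflect.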
If you add call MatSetType(Q,MATDENSE,ierr) before the MatLoad(), you will then get: Running with 1 processors AQ time using MatMatMul 1.0620000000471919E-003 AQ time using 6 MatMul 1.4270000001488370E-003 Not an ideal efficiency (still greater than 1 though, so we are in the clear), but things will get better if you either increase the size of A or Q. Thanks, Pierre > On 15 Feb 2023, at 4:34 PM, Guido Margherita wrote: > > Hi, > > You can find the reproducer at this link https://github.com/margheguido/Miniapp_MatMatMul , including the matrices I used. > I have trouble undrerstanding what is different in my case from the one you referenced me to. > > Thank you so much, > Margherita > >> Il giorno 13 feb 2023, alle ore 3:51 PM, Pierre Jolivet ha scritto: >> >> Could you please share a reproducer? >> What you are seeing is not typical of the performance of such a kernel, both from a theoretical or a practical (see fig. 2 of https://joliv.et/article.pdf) point of view. >> >> Thanks, >> Pierre >> >>> On 13 Feb 2023, at 3:38 PM, Guido Margherita via petsc-users wrote: >>> >>> ?A is a sparse MATSEQAIJ, Q is dense. >>> >>> Thanks, >>> Margherita >>> >>>> Il giorno 13 feb 2023, alle ore 3:27 PM, knepley at gmail.com ha scritto: >>>> >>>> On Mon, Feb 13, 2023 at 9:21 AM Guido Margherita via petsc-users wrote: >>>> Hi all, >>>> >>>> I realised that performing a matrix-matrix multiplication using the function MatMatMult it is not at all computationally efficient with respect to performing N times a matrix-vector multiplication with MatMul, being N the number of columns of the second matrix in the product. >>>> When I multiply I matrix A 46816 x 46816 to a matrix Q 46816 x 6, the MatMatMul function is indeed 6 times more expensive than 6 times a call to MatMul, when performed sequentially (0.04056 s vs 0.0062 s ). When the same code is run in parallel the gap grows even more, being10 times more expensive. >>>> Is there an explanation for it? >>>> >>>> So we can reproduce this, what kind of matrix is A? I am assuming that Q is dense. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>> t1 = MPI_Wtime() >>>> call MatMatMult(A,Q,MAT_INITIAL_MATRIX, PETSC_DEFAULT_REAL, AQ, ierr ) >>>> t2 = MPI_Wtime() >>>> t_MatMatMul = t2-t1 >>>> >>>> t_MatMul=0.0 >>>> do j = 0, m-1 >>>> call MatGetColumnVector(Q, q_vec, j,ierr) >>>> >>>> t1 = MPI_Wtime() >>>> call MatMult(A, q_vec, aq_vec, ierr) >>>> t2 = MPI_Wtime() >>>> >>>> t_MatMul = t_MatMul + t2-t1 >>>> end do >>>> >>>> Thank you, >>>> Margherita Guido >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Feb 15 14:01:11 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 15 Feb 2023 15:01:11 -0500 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: References: <95C98D90-F093-4C21-9CC2-AA23F729B5F0@petsc.dev> Message-ID: <901A3EF7-42A7-475E-BFC3-8DF5C9B8285E@petsc.dev> https://gitlab.com/petsc/petsc/-/merge_requests/6075 should fix the possible recursive error condition Matt pointed out > On Feb 9, 2023, at 6:24 PM, Matthew Knepley wrote: > > On Thu, Feb 9, 2023 at 6:05 PM Sajid Ali Syed via petsc-users > wrote: >> I added ?-malloc_debug? in a .petscrc file and ran it again. The backtrace from lldb is in the attached file. 
The crash now seems to be at: >> >> Process 32660 stopped* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8) >> frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 >> 598 ????? `PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()` >> 599 ?????@*/ >> 600 ?????PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...) >> -> 601 ?????{ >> 602 ????? PetscMPIInt rank; >> 603 ????? >> 604 ????? PetscFunctionBegin; >> (lldb) frame info >> frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 >> (lldb) >> The trace seems to indicate some sort of infinite loop causing an overflow. >> > > Yes, I have also seen this. What happens is that we have a memory error. The error is reported inside PetscMallocValidate() > using PetscErrorPrintf, which eventually calls PetscCallMPI, which calls PetscMallocValidate again, which fails. We need to > remove all error checking from the prints inside Validate. > > Thanks, > > Matt > >> PS: I'm using a arm64 mac, so I don't have access to valgrind. >> >> Thank You, >> Sajid Ali (he/him) | Research Associate >> Scientific Computing Division >> Fermi National Accelerator Laboratory >> s-sajid-ali.github.io > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksi2443 at gmail.com Thu Feb 16 01:42:48 2023 From: ksi2443 at gmail.com (user_gong Kim) Date: Thu, 16 Feb 2023 16:42:48 +0900 Subject: [petsc-users] Question about preconditioner Message-ID: Hello, There are some questions about some preconditioners. The questions are from problem Au=b. The global matrix A has zero value diagonal terms. 1. Which preconditioner is preferred for matrix A which has zero value in diagonal terms? The most frequently used basic 2 preconditioners are jacobi and SOR (gauss seidel). As people knows both methods should have non zero diagonal terms. Although the improved method is applied in PETSc, jacobi can also solve the case with zero diagonal term, but I ask because I know that it is not recommended. 2. Second question is about running code with the two command options below in a single process. 1st command : -ksp_type gmres -pc_type bjacobi -sub_pc_type jacobi 2nd command : -ksp_type gmres -pc_type hpddm -sub_pc_type jacobi When domain decomposition methods such as bjacobi or hpddm are parallel, the global matrix is divided for each process. As far as I know, running it in a single process should eventually produce the same result if the sub pc type is the same. However, in the second option, ksp did not converge. In this case, I wonder how to analyze the situation. How can I monitor and see the difference between the two? Thanks, Hyung Kim -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Thu Feb 16 03:33:12 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Thu, 16 Feb 2023 10:33:12 +0100 Subject: [petsc-users] Question about preconditioner In-Reply-To: References: Message-ID: <7A210C3A-CAF7-4809-8CF5-BE56CCF04DDE@joliv.et> > On 16 Feb 2023, at 8:43 AM, user_gong Kim wrote: > > ? 
> > > Hello, > > > > There are some questions about some preconditioners. > > The questions are from problem Au=b. The global matrix A has zero value diagonal terms. > > 1. Which preconditioner is preferred for matrix A which has zero value in diagonal terms? > This question has not a single answer. It all depends on where your A and b are coming from. > The most frequently used basic 2 preconditioners are jacobi and SOR (gauss seidel). > They are not the most frequently used. And rightfully so, as they very often can?t handle non-trivial systems. > As people knows both methods should have non zero diagonal terms. Although the improved method is applied in PETSc, jacobi can also solve the case with zero diagonal term, but I ask because I know that it is not recommended. > > 2. Second question is about running code with the two command options below in a single process. > 1st command : -ksp_type gmres -pc_type bjacobi -sub_pc_type jacobi > 2nd command : -ksp_type gmres -pc_type hpddm -sub_pc_type jacobi > When domain decomposition methods such as bjacobi or hpddm are parallel, the global matrix is divided for each process. As far as I know, running it in a single process should eventually produce the same result if the sub pc type is the same. However, in the second option, ksp did not converge. > 1st command: it?s pointless to couple PCBJACOBI with PCJABOCI, it?s equivalent to only using PCJABOBI. 2nd command: it?s pointless to use PCHPDDM if you don?t specify in some way how to coarsen your problem (either algebraically or via an auxiliary operator). You just have a single level (equivalent to PCBJACOBI), but its options are prefixed by -pc_hpddm_coarse_ instead of -sub_ Again, both sets of options do not make sense. If you want, you could share your A and b (or tell us what you are discretizing) and we will be able to provide a better feedback. Thanks, Pierre > In this case, I wonder how to analyze the situation. > How can I monitor and see the difference between the two? > > > > > > Thanks, > > Hyung Kim -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Feb 16 06:09:37 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 16 Feb 2023 07:09:37 -0500 Subject: [petsc-users] Question about preconditioner In-Reply-To: References: Message-ID: On Thu, Feb 16, 2023 at 2:43 AM user_gong Kim wrote: > > > Hello, > > > > There are some questions about some preconditioners. > > The questions are from problem Au=b. The global matrix A has zero value > diagonal terms. > > 1. Which preconditioner is preferred for matrix A which has zero > value in diagonal terms? > The most frequently used basic 2 preconditioners are jacobi and SOR (gauss > seidel). As people knows both methods should have non zero diagonal terms. > Although the improved method is applied in PETSc, jacobi can also solve the > case with zero diagonal term, but I ask because I know that it is not > recommended. > > 2. Second question is about running code with the two command options > below in a single process. > 1st command : -ksp_type gmres -pc_type bjacobi -sub_pc_type jacobi > 2nd command : -ksp_type gmres -pc_type hpddm -sub_pc_type jacobi > When domain decomposition methods such as bjacobi or hpddm are parallel, > the global matrix is divided for each process. As far as I know, running it > in a single process should eventually produce the same result if the sub pc > type is the same. However, in the second option, ksp did not converge. 
> In this case, I wonder how to analyze the situation. > How can I monitor and see the difference between the two? > > Pierre is correct. I will just note that the best way to use PETSc is generally to find the preconditioner you need by looking in literature for what has been successful for other people, and then reproducing it in PETSc, which should be easy. Thanks, Matt > > > Thanks, > > Hyung Kim > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From xiongziming2010 at gmail.com Thu Feb 16 08:10:43 2023 From: xiongziming2010 at gmail.com (ziming xiong) Date: Thu, 16 Feb 2023 15:10:43 +0100 Subject: [petsc-users] Question for Petsc Message-ID: Hello? I want to use Petsc to implement high performance computing, and I mainly want to apply DDM methods to parallel computing. I have implemented some of the DDM methods (such as ASM, Bjacobi, etc.), but I don't understand the PCBDDC method. The official example (src/ksp/ksp/tutorials/ex59.c.html) is too complicated, so I have not been able to figure out the setting process. I would like to ask you if you have other simple and clearer examples for reference. Secondly, I would like to apply mklPardiso to Petsc. But not work, can u help me figure out the problem? i use oneAPI for the mklpardiso, and when i configure, i give the blaslapack lib. there are the errors: [0]PETSC ERROR: See https://petsc.org/release/overview/linear_solve_table/ for possible LU and Cholesky solvers [0]PETSC ERROR: Could not locate solver type mkl_pardiso for factorization type LU and matrix type seqaij. Perhaps you must ./configure with --download-mkl_pardiso [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 [0]PETSC ERROR: C:\Users\XiongZiming\Desktop\test_petsc_FEM\test_petsc_fem\x64\Release\test_petsc_fem.exe on a arch-mswin-c-debug named lmeep-329 by XiongZiming Thu Feb 16 15:05:14 2023 [0]PETSC ERROR: Configure options --with-cc="win32fe cl" --with-fc=0 --with-cxx="win32fe cl" --with-shared-libraries=0 --with-mpi-include="[/cygdrive/c/PROGRA~2/Intel/MPI/Include,/cygdrive/c/PROGRA~2/Intel/MPI/Include/x64]" --with-mpi-lib="-L/cygdrive/c/PROGRA~2/Intel/MPI/lib/x64 msmpifec.lib msmpi.lib" --with-mpiexec=/cygdrive/c/PROGRA~1/Microsoft_MPI/Bin/mpiexec --with-blaslapack-lib="-L/cygdrive/c/PROGRA~2/Intel/oneAPI/mkl/2023.0.0/lib/intel64 mkl_intel_lp64_dll.lib mkl_sequential_dll.lib mkl_core_dll.lib" Thanks, Ziming XIONG -------------- next part -------------- An HTML attachment was scrubbed... URL: From bourdin at mcmaster.ca Thu Feb 16 09:16:50 2023 From: bourdin at mcmaster.ca (Blaise Bourdin) Date: Thu, 16 Feb 2023 15:16:50 +0000 Subject: [petsc-users] dmplex overlap questions Message-ID: Hi, I am trying to implement a non-local finite elements reconstruction operator in parallel. Given a dmplex distributed with an overlap, is there a way to figure out which cells are in the overlap and which are not? Alternatively, suppose that I distribute the same DM with and without an overlap. Is there any warranty that the distributions are compatible (i.e. coincide when the overlap is ignored)? If this is the case, can I assume that the non-overlap cells are numbered first in the overlapped dm? Regards, Blaise ? 
Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1) Professor, Department of Mathematics & Statistics Hamilton Hall room 409A, McMaster University 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243 From bsmith at petsc.dev Thu Feb 16 09:27:01 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 16 Feb 2023 10:27:01 -0500 Subject: [petsc-users] Question about preconditioner In-Reply-To: References: Message-ID: <1B26836D-E55D-4DAB-8388-EF246D83F83D@petsc.dev> If your matrix has the form ( A B ) ( C 0 ) then often PCFIELDSPLIT can be a useful preconditioner with the option -pc_fieldsplit_detect_saddle_point > On Feb 16, 2023, at 2:42 AM, user_gong Kim wrote: > > > Hello, > > > There are some questions about some preconditioners. > > The questions are from problem Au=b. The global matrix A has zero value diagonal terms. > > 1. Which preconditioner is preferred for matrix A which has zero value in diagonal terms? > The most frequently used basic 2 preconditioners are jacobi and SOR (gauss seidel). As people knows both methods should have non zero diagonal terms. Although the improved method is applied in PETSc, jacobi can also solve the case with zero diagonal term, but I ask because I know that it is not recommended. > > 2. Second question is about running code with the two command options below in a single process. > 1st command : -ksp_type gmres -pc_type bjacobi -sub_pc_type jacobi > 2nd command : -ksp_type gmres -pc_type hpddm -sub_pc_type jacobi > When domain decomposition methods such as bjacobi or hpddm are parallel, the global matrix is divided for each process. As far as I know, running it in a single process should eventually produce the same result if the sub pc type is the same. However, in the second option, ksp did not converge. > In this case, I wonder how to analyze the situation. > How can I monitor and see the difference between the two? > > > > Thanks, > > Hyung Kim > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wence at gmx.li Thu Feb 16 09:54:10 2023 From: wence at gmx.li (Lawrence Mitchell) Date: Thu, 16 Feb 2023 15:54:10 +0000 Subject: [petsc-users] dmplex overlap questions In-Reply-To: References: Message-ID: Hi Blaise, On Thu, 16 Feb 2023 at 15:17, Blaise Bourdin wrote: > > Hi, > > I am trying to implement a non-local finite elements reconstruction operator in parallel. > > Given a dmplex distributed with an overlap, is there a way to figure out which cells are in the overlap and which are not? Yes. Get the pointSF of the DM, and the cell chart DMPlexGetPointSF(dm, &sf); DMPlexGetHeightStratum(dm, 0, &cstart, &cend); Now get the graph (specifically ilocal of the sf): PetscSFGetGraph(sf, NULL, &nleaves, &ilocal, NULL); Now any value of ilocal that lies in [cstart, cend) is a cell which is not owned by this process (i.e. in the overlap). Note that ilocal can be NULL which just means it is the identity map [0, ..., nleaves), so you just intersect [cstart, cend) with [0, nleaves) in that case to find the overlap cells. But that is very unlikely to be true, so: for (PetscInt i = 0; i < nleaves; i++) { if (cstart <= ilocal[i] && ilocal[i] < cend) { // i is an overlap cell } } > Alternatively, suppose that I distribute the same DM with and without an overlap. Is there any warranty that the distributions are compatible (i.e. coincide when the overlap is ignored)? 
If this is the case, can I assume that the non-overlap cells are numbered first in the overlapped dm? If you do: DMPlexDistribute(dm, 0, &migrationSF, ¶lleldm); DMPlexDistributeOverlap(paralleldm, 1, &migrationSF2, &overlapdm); Then paralleldm and overlapdm will be compatible, and I think it is still the case that the overlap cells are numbered last (and contiguously). If you just do DMPlexDistribute(dm, 1, &migrationSF, &overlapdm) then you obviously don't have the non-overlapped one to compare, but it is in this case still true that the overlap cells are numbered last. Thanks, Lawrence From bourdin at mcmaster.ca Thu Feb 16 09:56:33 2023 From: bourdin at mcmaster.ca (Blaise Bourdin) Date: Thu, 16 Feb 2023 15:56:33 +0000 Subject: [petsc-users] dmplex overlap questions In-Reply-To: <30800_1676562865_31GFsOsV008384_CA+wRr2m9dn1kd-q5f43uWqMcoNuegYVDNuhtg01u=+dpdsQ-kA@mail.gmail.com> References: <30800_1676562865_31GFsOsV008384_CA+wRr2m9dn1kd-q5f43uWqMcoNuegYVDNuhtg01u=+dpdsQ-kA@mail.gmail.com> Message-ID: An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Feb 16 10:40:56 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 16 Feb 2023 11:40:56 -0500 Subject: [petsc-users] Question for Petsc In-Reply-To: References: Message-ID: On Thu, Feb 16, 2023 at 9:14 AM ziming xiong wrote: > Hello? > I want to use Petsc to implement high performance computing, and I mainly > want to apply DDM methods to parallel computing. I have implemented some of > the DDM methods (such as ASM, Bjacobi, etc.), but I don't understand the > PCBDDC method. The official example (src/ksp/ksp/tutorials/ex59.c.html) is > too complicated, so I have not been able to figure out the setting process. > I would like to ask you if you have other simple and clearer examples for > reference. > You could look at the paper: https://epubs.siam.org/doi/abs/10.1137/15M1025785 > Secondly, I would like to apply mklPardiso to Petsc. But not work, can u > help me figure out the problem? i use oneAPI for the mklpardiso, and when i > configure, i give the blaslapack lib. > You should reconfigure with --download-mkl_pardiso Thanks, Matt > there are the errors: > > [0]PETSC ERROR: See https://petsc.org/release/overview/linear_solve_table/ > for possible LU and Cholesky solvers > [0]PETSC ERROR: Could not locate solver type mkl_pardiso for factorization > type LU and matrix type seqaij. Perhaps you must ./configure with > --download-mkl_pardiso > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 > [0]PETSC ERROR: > C:\Users\XiongZiming\Desktop\test_petsc_FEM\test_petsc_fem\x64\Release\test_petsc_fem.exe > on a arch-mswin-c-debug named lmeep-329 by XiongZiming Thu Feb 16 15:05:14 > 2023 > [0]PETSC ERROR: Configure options --with-cc="win32fe cl" --with-fc=0 > --with-cxx="win32fe cl" --with-shared-libraries=0 > --with-mpi-include="[/cygdrive/c/PROGRA~2/Intel/MPI/Include,/cygdrive/c/PROGRA~2/Intel/MPI/Include/x64]" > --with-mpi-lib="-L/cygdrive/c/PROGRA~2/Intel/MPI/lib/x64 msmpifec.lib > msmpi.lib" --with-mpiexec=/cygdrive/c/PROGRA~1/Microsoft_MPI/Bin/mpiexec > --with-blaslapack-lib="-L/cygdrive/c/PROGRA~2/Intel/oneAPI/mkl/2023.0.0/lib/intel64 > mkl_intel_lp64_dll.lib mkl_sequential_dll.lib mkl_core_dll.lib" > > Thanks, > Ziming XIONG > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Feb 16 10:43:16 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 16 Feb 2023 11:43:16 -0500 Subject: [petsc-users] dmplex overlap questions In-Reply-To: References: Message-ID: On Thu, Feb 16, 2023 at 10:54 AM Lawrence Mitchell wrote: > Hi Blaise, > > On Thu, 16 Feb 2023 at 15:17, Blaise Bourdin wrote: > > > > Hi, > > > > I am trying to implement a non-local finite elements reconstruction > operator in parallel. > > > > Given a dmplex distributed with an overlap, is there a way to figure out > which cells are in the overlap and which are not? > > Yes. Get the pointSF of the DM, and the cell chart > > DMPlexGetPointSF(dm, &sf); > DMPlexGetHeightStratum(dm, 0, &cstart, &cend); > > Now get the graph (specifically ilocal of the sf): > > PetscSFGetGraph(sf, NULL, &nleaves, &ilocal, NULL); > > Now any value of ilocal that lies in [cstart, cend) is a cell which is > not owned by this process (i.e. in the overlap). Note that ilocal can > be NULL which just means it is the identity map [0, ..., nleaves), so > you just intersect [cstart, cend) with [0, nleaves) in that case to > find the overlap cells. > > But that is very unlikely to be true, so: > Note that you can use PetscFindInt(nleaves, ilocal, cell, &loc); as well. I do this a lot in the library. Thanks, Matt > for (PetscInt i = 0; i < nleaves; i++) { > if (cstart <= ilocal[i] && ilocal[i] < cend) { > // i is an overlap cell > } > } > > Alternatively, suppose that I distribute the same DM with and without an > overlap. Is there any warranty that the distributions are compatible (i.e. > coincide when the overlap is ignored)? If this is the case, can I assume > that the non-overlap cells are numbered first in the overlapped dm? > > If you do: > > DMPlexDistribute(dm, 0, &migrationSF, ¶lleldm); > DMPlexDistributeOverlap(paralleldm, 1, &migrationSF2, &overlapdm); > > Then paralleldm and overlapdm will be compatible, and I think it is > still the case that the overlap cells are numbered last (and > contiguously). > > If you just do DMPlexDistribute(dm, 1, &migrationSF, &overlapdm) then > you obviously don't have the non-overlapped one to compare, but it is > in this case still true that the overlap cells are numbered last. > > Thanks, > > Lawrence > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Thu Feb 16 11:12:20 2023 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Thu, 16 Feb 2023 20:12:20 +0300 Subject: [petsc-users] Question for Petsc In-Reply-To: References: Message-ID: For bddc, you can also take a look at https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/ksp/tutorials/ex71.c On Thu, Feb 16, 2023, 19:41 Matthew Knepley wrote: > On Thu, Feb 16, 2023 at 9:14 AM ziming xiong > wrote: > >> Hello? >> I want to use Petsc to implement high performance computing, and I mainly >> want to apply DDM methods to parallel computing. I have implemented some of >> the DDM methods (such as ASM, Bjacobi, etc.), but I don't understand the >> PCBDDC method. The official example (src/ksp/ksp/tutorials/ex59.c.html) is >> too complicated, so I have not been able to figure out the setting process. 
>> I would like to ask you if you have other simple and clearer examples for >> reference. >> > > You could look at the paper: > https://epubs.siam.org/doi/abs/10.1137/15M1025785 > > >> Secondly, I would like to apply mklPardiso to Petsc. But not work, can u >> help me figure out the problem? i use oneAPI for the mklpardiso, and when i >> configure, i give the blaslapack lib. >> > > You should reconfigure with --download-mkl_pardiso > > Thanks, > > Matt > > >> there are the errors: >> >> [0]PETSC ERROR: See >> https://petsc.org/release/overview/linear_solve_table/ for possible LU >> and Cholesky solvers >> [0]PETSC ERROR: Could not locate solver type mkl_pardiso for >> factorization type LU and matrix type seqaij. Perhaps you must ./configure >> with --download-mkl_pardiso >> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 >> [0]PETSC ERROR: >> C:\Users\XiongZiming\Desktop\test_petsc_FEM\test_petsc_fem\x64\Release\test_petsc_fem.exe >> on a arch-mswin-c-debug named lmeep-329 by XiongZiming Thu Feb 16 15:05:14 >> 2023 >> [0]PETSC ERROR: Configure options --with-cc="win32fe cl" --with-fc=0 >> --with-cxx="win32fe cl" --with-shared-libraries=0 >> --with-mpi-include="[/cygdrive/c/PROGRA~2/Intel/MPI/Include,/cygdrive/c/PROGRA~2/Intel/MPI/Include/x64]" >> --with-mpi-lib="-L/cygdrive/c/PROGRA~2/Intel/MPI/lib/x64 msmpifec.lib >> msmpi.lib" --with-mpiexec=/cygdrive/c/PROGRA~1/Microsoft_MPI/Bin/mpiexec >> --with-blaslapack-lib="-L/cygdrive/c/PROGRA~2/Intel/oneAPI/mkl/2023.0.0/lib/intel64 >> mkl_intel_lp64_dll.lib mkl_sequential_dll.lib mkl_core_dll.lib" >> >> Thanks, >> Ziming XIONG >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wence at gmx.li Thu Feb 16 12:09:00 2023 From: wence at gmx.li (Lawrence Mitchell) Date: Thu, 16 Feb 2023 18:09:00 +0000 Subject: [petsc-users] dmplex overlap questions In-Reply-To: References: Message-ID: On Thu, 16 Feb 2023 at 16:43, Matthew Knepley wrote: > > On Thu, Feb 16, 2023 at 10:54 AM Lawrence Mitchell wrote: >> >> Hi Blaise, >> >> On Thu, 16 Feb 2023 at 15:17, Blaise Bourdin wrote: >> > >> > Hi, >> > >> > I am trying to implement a non-local finite elements reconstruction operator in parallel. >> > >> > Given a dmplex distributed with an overlap, is there a way to figure out which cells are in the overlap and which are not? >> >> Yes. Get the pointSF of the DM, and the cell chart >> >> DMPlexGetPointSF(dm, &sf); >> DMPlexGetHeightStratum(dm, 0, &cstart, &cend); >> >> Now get the graph (specifically ilocal of the sf): >> >> PetscSFGetGraph(sf, NULL, &nleaves, &ilocal, NULL); >> >> Now any value of ilocal that lies in [cstart, cend) is a cell which is >> not owned by this process (i.e. in the overlap). Note that ilocal can >> be NULL which just means it is the identity map [0, ..., nleaves), so >> you just intersect [cstart, cend) with [0, nleaves) in that case to >> find the overlap cells. >> >> But that is very unlikely to be true, so: > > > Note that you can use > > PetscFindInt(nleaves, ilocal, cell, &loc); Modulo argument order, is it not PetscFindInt(cell, nleaves, ilocal, &loc)? I guess one should probably document that PetscSFSetGraph ensures that ilocal is sorted. 
Lawrence From knepley at gmail.com Thu Feb 16 16:00:05 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 16 Feb 2023 17:00:05 -0500 Subject: [petsc-users] dmplex overlap questions In-Reply-To: References: Message-ID: On Thu, Feb 16, 2023 at 1:09 PM Lawrence Mitchell wrote: > On Thu, 16 Feb 2023 at 16:43, Matthew Knepley wrote: > > > > On Thu, Feb 16, 2023 at 10:54 AM Lawrence Mitchell wrote: > >> > >> Hi Blaise, > >> > >> On Thu, 16 Feb 2023 at 15:17, Blaise Bourdin > wrote: > >> > > >> > Hi, > >> > > >> > I am trying to implement a non-local finite elements reconstruction > operator in parallel. > >> > > >> > Given a dmplex distributed with an overlap, is there a way to figure > out which cells are in the overlap and which are not? > >> > >> Yes. Get the pointSF of the DM, and the cell chart > >> > >> DMPlexGetPointSF(dm, &sf); > >> DMPlexGetHeightStratum(dm, 0, &cstart, &cend); > >> > >> Now get the graph (specifically ilocal of the sf): > >> > >> PetscSFGetGraph(sf, NULL, &nleaves, &ilocal, NULL); > >> > >> Now any value of ilocal that lies in [cstart, cend) is a cell which is > >> not owned by this process (i.e. in the overlap). Note that ilocal can > >> be NULL which just means it is the identity map [0, ..., nleaves), so > >> you just intersect [cstart, cend) with [0, nleaves) in that case to > >> find the overlap cells. > >> > >> But that is very unlikely to be true, so: > > > > > > Note that you can use > > > > PetscFindInt(nleaves, ilocal, cell, &loc); > > Modulo argument order, is it not PetscFindInt(cell, nleaves, ilocal, &loc)? > You are right. > I guess one should probably document that PetscSFSetGraph ensures that > ilocal is sorted. > Yes, I thought it was already there, but does not seem so. Thanks, Matt > Lawrence -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mi.mike1021 at gmail.com Thu Feb 16 21:00:32 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Thu, 16 Feb 2023 21:00:32 -0600 Subject: [petsc-users] PetscViewer with 64bit In-Reply-To: References: <87bklvwwv7.fsf@jedbrown.org> <878rgzwlaw.fsf@jedbrown.org> Message-ID: A simple reproducer is attached to the reply. It loads a 2D square mesh, then create & print out viewer to a .vtu file. If PETSc is configured with 32-bit, it has no problem, but if its the 64-bit, the file won't be opened with paraview. Since the mesh size is very small, it is obvious that there could be something inconsistent in the header format. How can I fix this problem? Thanks, Mike > On Tue 14. Feb 2023 at 21:27, Jed Brown wrote: > >> Dave May writes: >> >> > On Tue 14. Feb 2023 at 17:17, Jed Brown wrote: >> > >> >> Can you share a reproducer? I think I recall the format requiring >> certain >> >> things to be Int32. >> > >> > >> > By default, the byte offset used with the appended data format is >> UInt32. I >> > believe that?s where the sizeof(int) is coming from. This default is >> > annoying as it limits the total size of your appended data to be < 3 GB. >> > That said, in the opening of the paraview file you can add this >> attribute >> > >> > header_type="UInt64" >> >> You mean in the header of the .vtu? > > > Yeah, within the open VTKFile tag. 
> Like this > < VTKFile type=?xxx?, byte_order="LittleEndian" header_type="UInt64" > > > Do you happen to have an example or pointers to docs describing this >> feature? > > > Example yes - will send it to you tomorrow. Docs? not really. Only stuff > like this > > > https://kitware.github.io/paraview-docs/latest/python/paraview.simple.XMLPStructuredGridWriter.html > > > > https://kitware.github.io/paraview-docs/v5.8.0/python/paraview.simple.XMLMultiBlockDataWriter.html > > All the writers seem to support it. > > > Can we always do this? > > > Yep! > > > It isn't mentioned in these: >> >> https://vtk.org/wp-content/uploads/2015/04/file-formats.pdf (PDF was >> created in 2003) >> >> https://kitware.github.io/vtk-examples/site/VTKFileFormats/#xml-file-formats >> > > Yes I know. I?ve tied myself in knots for years because the of the > assumption that the offset had to be an int. > > Credit for the discovery goes to Carsten Uphoff. He showed me this. > > Cheers, > Dave > > > >> > then the size of the offset is now UInt64 and now large files can be >> > finally written. >> > >> > >> > Cheers, >> > Dave >> > >> > >> > >> > >> >> >> >> Mike Michell writes: >> >> >> >> > Thanks for the note. >> >> > I understood that PETSc calculates the offsets for me through >> "boffset" >> >> > variable in plexvtu.c file. Please correct me if it is wrong. >> >> > >> >> > If plexvtu.c has a bug, it could be around "write file header" part >> in >> >> > which the boffset is also computed. Is this correct? I am not using >> >> complex >> >> > number. >> >> > There are several mixed parts among "Int32, UInt8, PetscInt_FMT, >> >> > PetscInt64_FMT" in writing the header. >> >> > >> >> > Which combination of those flags is correct for 64bit indices? I am >> gonna >> >> > modify plexvtu.c file with "#if defined(PETSC_USE_64BIT_INDICES)" >> >> > statement, but I do not know what is the correct form of the header >> flag >> >> > for 64bit indices. >> >> > >> >> > It is also confusing to me: >> >> > boffset += gpiece[r].ncells * sizeof(PetscInt) + sizeof(int); >> >> > How is sizeof(PetscInt) different from sizeof(int)? >> >> > >> >> > Thanks, >> >> > Mike >> >> > >> >> > >> >> >> On Tue, Feb 14, 2023 at 11:45 AM Mike Michell < >> mi.mike1021 at gmail.com> >> >> >> wrote: >> >> >> >> >> >>> I was trying to modify the header flags from "Int32" to "Int64", >> but >> >> the >> >> >>> problem was not resolved. Could I get any additional comments? >> >> >>> >> >> >> >> >> >> The calculated offsets are not correct I think. >> >> >> >> >> >> Matt >> >> >> >> >> >> >> >> >>> Thanks, >> >> >>> Mike >> >> >>> >> >> >>> >> >> >>>> Thanks for the comments. 
>> >> >>>> To be precise on the question, the entire part of the header of >> the >> >> .vtu >> >> >>>> file is attached: >> >> >>>> >> >> >>>> >> >> >>>> > >> byte_order="LittleEndian"> >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>>> > >> NumberOfComponents="3" >> >> >>>> format="appended" offset="0" /> >> >> >>>> >> >> >>>> >> >> >>>> > >> >>>> NumberOfComponents="1" format="appended" offset="116932" /> >> >> >>>> > >> >>>> NumberOfComponents="1" format="appended" offset="372936" /> >> >> >>>> > >> >>>> NumberOfComponents="1" format="appended" offset="404940" /> >> >> >>>> >> >> >>>> >> >> >>>> > >> >>>> format="appended" offset="408944" /> >> >> >>>> >> >> >>>> >> >> >>>> > >> >>>> NumberOfComponents="1" format="appended" offset="424948" /> >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>>> > >> NumberOfComponents="3" >> >> >>>> format="appended" offset="463928" /> >> >> >>>> >> >> >>>> >> >> >>>> > >> >>>> NumberOfComponents="1" format="appended" offset="580860" /> >> >> >>>> > >> >>>> NumberOfComponents="1" format="appended" offset="836864" /> >> >> >>>> > >> >>>> NumberOfComponents="1" format="appended" offset="868868" /> >> >> >>>> >> >> >>>> >> >> >>>> > >> >>>> format="appended" offset="872872" /> >> >> >>>> >> >> >>>> >> >> >>>> > >> >>>> NumberOfComponents="1" format="appended" offset="888876" /> >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>>> Thanks, >> >> >>>> Mike >> >> >>>> >> >> >>>> >> >> >>>>> On Sun, Feb 12, 2023 at 6:15 PM Mike Michell < >> mi.mike1021 at gmail.com> >> >> >>>>> wrote: >> >> >>>>> >> >> >>>>>> Dear PETSc team, >> >> >>>>>> >> >> >>>>>> I am a user of PETSc with Fortran. My code uses DMPlex to >> handle dm >> >> >>>>>> object. To print out output variable and mesh connectivity, I >> use >> >> VecView() >> >> >>>>>> by defining PetscSection on that dm and borrow a vector. The >> type >> >> of the >> >> >>>>>> viewer is set to PETSCVIEWERVTK. >> >> >>>>>> >> >> >>>>>> With 32bit indices, the above work flow has no issue. However, >> if >> >> >>>>>> PETSc is configured with 64bit indices, my output .vtu file has >> an >> >> error if >> >> >>>>>> I open the file with visualization tools, such as Paraview or >> >> Tecplot, >> >> >>>>>> saying that: >> >> >>>>>> "Cannot read cell connectivity from Cells in piece 0 because the >> >> >>>>>> "offsets" array is not monotonically increasing or starts with a >> >> value >> >> >>>>>> other than 0." >> >> >>>>>> >> >> >>>>>> If I open the .vtu file from terminal, I can see such a line: >> >> >>>>>> ... >> >> >>>>>> > NumberOfComponents="1" >> >> >>>>>> format="appended" offset="580860" /> >> >> >>>>>> ... >> >> >>>>>> >> >> >>>>>> I expected "DataArray type="Int64", since the PETSc has 64bit >> >> indices. >> >> >>>>>> Could I get recommendations that I need to check to resolve the >> >> issue? >> >> >>>>>> >> >> >>>>> >> >> >>>>> This is probably a bug. We will look at it. >> >> >>>>> >> >> >>>>> Jed, I saw that Int32 is hardcoded in plexvtu.c, but >> sizeof(PetscInt) >> >> >>>>> is used to calculate the offset, which looks inconsistent. Can >> you >> >> take a >> >> >>>>> look? >> >> >>>>> >> >> >>>>> Thanks, >> >> >>>>> >> >> >>>>> Matt >> >> >>>>> >> >> >>>>> >> >> >>>>>> Thanks, >> >> >>>>>> Mike >> >> >>>>>> >> >> >>>>> >> >> >>>>> >> >> >>>>> -- >> >> >>>>> What most experimenters take for granted before they begin their >> >> >>>>> experiments is infinitely more interesting than any results to >> which >> >> their >> >> >>>>> experiments lead. 
>> >> >>>>> -- Norbert Wiener >> >> >>>>> >> >> >>>>> https://www.cse.buffalo.edu/~knepley/ >> >> >>>>> >> >> >>>>> >> >> >>>> >> >> >> >> >> >> -- >> >> >> What most experimenters take for granted before they begin their >> >> >> experiments is infinitely more interesting than any results to which >> >> their >> >> >> experiments lead. >> >> >> -- Norbert Wiener >> >> >> >> >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> >> >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Question_64bit.tar Type: application/x-tar Size: 102400 bytes Desc: not available URL: From jed at jedbrown.org Thu Feb 16 21:39:50 2023 From: jed at jedbrown.org (Jed Brown) Date: Thu, 16 Feb 2023 20:39:50 -0700 Subject: [petsc-users] PetscViewer with 64bit In-Reply-To: References: <87bklvwwv7.fsf@jedbrown.org> <878rgzwlaw.fsf@jedbrown.org> Message-ID: <87bkltez95.fsf@jedbrown.org> Thanks, Dave. Mike, can you test that this branch works with your large problems? I tested that .vtu works in parallel for small problems, where works = loads correctly in Paraview and VisIt. https://gitlab.com/petsc/petsc/-/merge_requests/6081 Dave May writes: > On Tue 14. Feb 2023 at 21:27, Jed Brown wrote: > >> Dave May writes: >> >> > On Tue 14. Feb 2023 at 17:17, Jed Brown wrote: >> > >> >> Can you share a reproducer? I think I recall the format requiring >> certain >> >> things to be Int32. >> > >> > >> > By default, the byte offset used with the appended data format is >> UInt32. I >> > believe that?s where the sizeof(int) is coming from. This default is >> > annoying as it limits the total size of your appended data to be < 3 GB. >> > That said, in the opening of the paraview file you can add this attribute >> > >> > header_type="UInt64" >> >> You mean in the header of the .vtu? > > > Yeah, within the open VTKFile tag. > Like this > < VTKFile type=?xxx?, byte_order="LittleEndian" header_type="UInt64" > > > Do you happen to have an example or pointers to docs describing this >> feature? > > > Example yes - will send it to you tomorrow. Docs? not really. Only stuff > like this > > https://kitware.github.io/paraview-docs/latest/python/paraview.simple.XMLPStructuredGridWriter.html > > > https://kitware.github.io/paraview-docs/v5.8.0/python/paraview.simple.XMLMultiBlockDataWriter.html > > All the writers seem to support it. > > > Can we always do this? > > > Yep! > > > It isn't mentioned in these: >> >> https://vtk.org/wp-content/uploads/2015/04/file-formats.pdf (PDF was >> created in 2003) >> >> https://kitware.github.io/vtk-examples/site/VTKFileFormats/#xml-file-formats >> > > Yes I know. I?ve tied myself in knots for years because the of the > assumption that the offset had to be an int. > > Credit for the discovery goes to Carsten Uphoff. He showed me this. > > Cheers, > Dave > > > >> > then the size of the offset is now UInt64 and now large files can be >> > finally written. >> > >> > >> > Cheers, >> > Dave >> > >> > >> > >> > >> >> >> >> Mike Michell writes: >> >> >> >> > Thanks for the note. >> >> > I understood that PETSc calculates the offsets for me through >> "boffset" >> >> > variable in plexvtu.c file. Please correct me if it is wrong. >> >> > >> >> > If plexvtu.c has a bug, it could be around "write file header" part in >> >> > which the boffset is also computed. Is this correct? I am not using >> >> complex >> >> > number. 
>> >> > There are several mixed parts among "Int32, UInt8, PetscInt_FMT, >> >> > PetscInt64_FMT" in writing the header. >> >> > >> >> > Which combination of those flags is correct for 64bit indices? I am >> gonna >> >> > modify plexvtu.c file with "#if defined(PETSC_USE_64BIT_INDICES)" >> >> > statement, but I do not know what is the correct form of the header >> flag >> >> > for 64bit indices. >> >> > >> >> > It is also confusing to me: >> >> > boffset += gpiece[r].ncells * sizeof(PetscInt) + sizeof(int); >> >> > How is sizeof(PetscInt) different from sizeof(int)? >> >> > >> >> > Thanks, >> >> > Mike >> >> > >> >> > >> >> >> On Tue, Feb 14, 2023 at 11:45 AM Mike Michell > > >> >> >> wrote: >> >> >> >> >> >>> I was trying to modify the header flags from "Int32" to "Int64", but >> >> the >> >> >>> problem was not resolved. Could I get any additional comments? >> >> >>> >> >> >> >> >> >> The calculated offsets are not correct I think. >> >> >> >> >> >> Matt >> >> >> >> >> >> >> >> >>> Thanks, >> >> >>> Mike >> >> >>> >> >> >>> >> >> >>>> Thanks for the comments. >> >> >>>> To be precise on the question, the entire part of the header of the >> >> .vtu >> >> >>>> file is attached: >> >> >>>> >> >> >>>> >> >> >>>> > >> byte_order="LittleEndian"> >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>>> > >> NumberOfComponents="3" >> >> >>>> format="appended" offset="0" /> >> >> >>>> >> >> >>>> >> >> >>>> > >> >>>> NumberOfComponents="1" format="appended" offset="116932" /> >> >> >>>> > >> >>>> NumberOfComponents="1" format="appended" offset="372936" /> >> >> >>>> > >> >>>> NumberOfComponents="1" format="appended" offset="404940" /> >> >> >>>> >> >> >>>> >> >> >>>> > >> >>>> format="appended" offset="408944" /> >> >> >>>> >> >> >>>> >> >> >>>> > >> >>>> NumberOfComponents="1" format="appended" offset="424948" /> >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>>> > >> NumberOfComponents="3" >> >> >>>> format="appended" offset="463928" /> >> >> >>>> >> >> >>>> >> >> >>>> > >> >>>> NumberOfComponents="1" format="appended" offset="580860" /> >> >> >>>> > >> >>>> NumberOfComponents="1" format="appended" offset="836864" /> >> >> >>>> > >> >>>> NumberOfComponents="1" format="appended" offset="868868" /> >> >> >>>> >> >> >>>> >> >> >>>> > >> >>>> format="appended" offset="872872" /> >> >> >>>> >> >> >>>> >> >> >>>> > >> >>>> NumberOfComponents="1" format="appended" offset="888876" /> >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>>> >> >> >>>> Thanks, >> >> >>>> Mike >> >> >>>> >> >> >>>> >> >> >>>>> On Sun, Feb 12, 2023 at 6:15 PM Mike Michell < >> mi.mike1021 at gmail.com> >> >> >>>>> wrote: >> >> >>>>> >> >> >>>>>> Dear PETSc team, >> >> >>>>>> >> >> >>>>>> I am a user of PETSc with Fortran. My code uses DMPlex to handle >> dm >> >> >>>>>> object. To print out output variable and mesh connectivity, I use >> >> VecView() >> >> >>>>>> by defining PetscSection on that dm and borrow a vector. The type >> >> of the >> >> >>>>>> viewer is set to PETSCVIEWERVTK. >> >> >>>>>> >> >> >>>>>> With 32bit indices, the above work flow has no issue. However, if >> >> >>>>>> PETSc is configured with 64bit indices, my output .vtu file has >> an >> >> error if >> >> >>>>>> I open the file with visualization tools, such as Paraview or >> >> Tecplot, >> >> >>>>>> saying that: >> >> >>>>>> "Cannot read cell connectivity from Cells in piece 0 because the >> >> >>>>>> "offsets" array is not monotonically increasing or starts with a >> >> value >> >> >>>>>> other than 0." 
>> >> >>>>>> >> >> >>>>>> If I open the .vtu file from terminal, I can see such a line: >> >> >>>>>> ... >> >> >>>>>> > NumberOfComponents="1" >> >> >>>>>> format="appended" offset="580860" /> >> >> >>>>>> ... >> >> >>>>>> >> >> >>>>>> I expected "DataArray type="Int64", since the PETSc has 64bit >> >> indices. >> >> >>>>>> Could I get recommendations that I need to check to resolve the >> >> issue? >> >> >>>>>> >> >> >>>>> >> >> >>>>> This is probably a bug. We will look at it. >> >> >>>>> >> >> >>>>> Jed, I saw that Int32 is hardcoded in plexvtu.c, but >> sizeof(PetscInt) >> >> >>>>> is used to calculate the offset, which looks inconsistent. Can you >> >> take a >> >> >>>>> look? >> >> >>>>> >> >> >>>>> Thanks, >> >> >>>>> >> >> >>>>> Matt >> >> >>>>> >> >> >>>>> >> >> >>>>>> Thanks, >> >> >>>>>> Mike >> >> >>>>>> >> >> >>>>> >> >> >>>>> >> >> >>>>> -- >> >> >>>>> What most experimenters take for granted before they begin their >> >> >>>>> experiments is infinitely more interesting than any results to >> which >> >> their >> >> >>>>> experiments lead. >> >> >>>>> -- Norbert Wiener >> >> >>>>> >> >> >>>>> https://www.cse.buffalo.edu/~knepley/ >> >> >>>>> >> >> >>>>> >> >> >>>> >> >> >> >> >> >> -- >> >> >> What most experimenters take for granted before they begin their >> >> >> experiments is infinitely more interesting than any results to which >> >> their >> >> >> experiments lead. >> >> >> -- Norbert Wiener >> >> >> >> >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> >> >> >> >> >> From mi.mike1021 at gmail.com Thu Feb 16 22:36:20 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Thu, 16 Feb 2023 22:36:20 -0600 Subject: [petsc-users] PetscViewer with 64bit In-Reply-To: <87bkltez95.fsf@jedbrown.org> References: <87bklvwwv7.fsf@jedbrown.org> <878rgzwlaw.fsf@jedbrown.org> <87bkltez95.fsf@jedbrown.org> Message-ID: Jed, It does not work for me even with the reproducer case with the small 2D square mesh. Can you run the case that I sent and open the created "sol.vtu" file with paraview? Thanks, > Thanks, Dave. > > Mike, can you test that this branch works with your large problems? I > tested that .vtu works in parallel for small problems, where works = loads > correctly in Paraview and VisIt. > > https://gitlab.com/petsc/petsc/-/merge_requests/6081 > > Dave May writes: > > > On Tue 14. Feb 2023 at 21:27, Jed Brown wrote: > > > >> Dave May writes: > >> > >> > On Tue 14. Feb 2023 at 17:17, Jed Brown wrote: > >> > > >> >> Can you share a reproducer? I think I recall the format requiring > >> certain > >> >> things to be Int32. > >> > > >> > > >> > By default, the byte offset used with the appended data format is > >> UInt32. I > >> > believe that?s where the sizeof(int) is coming from. This default is > >> > annoying as it limits the total size of your appended data to be < 3 > GB. > >> > That said, in the opening of the paraview file you can add this > attribute > >> > > >> > header_type="UInt64" > >> > >> You mean in the header of the .vtu? > > > > > > Yeah, within the open VTKFile tag. > > Like this > > < VTKFile type=?xxx?, byte_order="LittleEndian" header_type="UInt64" > > > > > Do you happen to have an example or pointers to docs describing this > >> feature? > > > > > > Example yes - will send it to you tomorrow. Docs? not really. 
Only stuff > > like this > > > > > https://kitware.github.io/paraview-docs/latest/python/paraview.simple.XMLPStructuredGridWriter.html > > > > > > > https://kitware.github.io/paraview-docs/v5.8.0/python/paraview.simple.XMLMultiBlockDataWriter.html > > > > All the writers seem to support it. > > > > > > Can we always do this? > > > > > > Yep! > > > > > > It isn't mentioned in these: > >> > >> https://vtk.org/wp-content/uploads/2015/04/file-formats.pdf (PDF was > >> created in 2003) > >> > >> > https://kitware.github.io/vtk-examples/site/VTKFileFormats/#xml-file-formats > >> > > > > Yes I know. I?ve tied myself in knots for years because the of the > > assumption that the offset had to be an int. > > > > Credit for the discovery goes to Carsten Uphoff. He showed me this. > > > > Cheers, > > Dave > > > > > > > >> > then the size of the offset is now UInt64 and now large files can be > >> > finally written. > >> > > >> > > >> > Cheers, > >> > Dave > >> > > >> > > >> > > >> > > >> >> > >> >> Mike Michell writes: > >> >> > >> >> > Thanks for the note. > >> >> > I understood that PETSc calculates the offsets for me through > >> "boffset" > >> >> > variable in plexvtu.c file. Please correct me if it is wrong. > >> >> > > >> >> > If plexvtu.c has a bug, it could be around "write file header" > part in > >> >> > which the boffset is also computed. Is this correct? I am not using > >> >> complex > >> >> > number. > >> >> > There are several mixed parts among "Int32, UInt8, PetscInt_FMT, > >> >> > PetscInt64_FMT" in writing the header. > >> >> > > >> >> > Which combination of those flags is correct for 64bit indices? I am > >> gonna > >> >> > modify plexvtu.c file with "#if defined(PETSC_USE_64BIT_INDICES)" > >> >> > statement, but I do not know what is the correct form of the header > >> flag > >> >> > for 64bit indices. > >> >> > > >> >> > It is also confusing to me: > >> >> > boffset += gpiece[r].ncells * sizeof(PetscInt) + sizeof(int); > >> >> > How is sizeof(PetscInt) different from sizeof(int)? > >> >> > > >> >> > Thanks, > >> >> > Mike > >> >> > > >> >> > > >> >> >> On Tue, Feb 14, 2023 at 11:45 AM Mike Michell < > mi.mike1021 at gmail.com > >> > > >> >> >> wrote: > >> >> >> > >> >> >>> I was trying to modify the header flags from "Int32" to "Int64", > but > >> >> the > >> >> >>> problem was not resolved. Could I get any additional comments? > >> >> >>> > >> >> >> > >> >> >> The calculated offsets are not correct I think. > >> >> >> > >> >> >> Matt > >> >> >> > >> >> >> > >> >> >>> Thanks, > >> >> >>> Mike > >> >> >>> > >> >> >>> > >> >> >>>> Thanks for the comments. 
> >> >> >>>> To be precise on the question, the entire part of the header of > the > >> >> .vtu > >> >> >>>> file is attached: > >> >> >>>> > >> >> >>>> > >> >> >>>> >> >> byte_order="LittleEndian"> > >> >> >>>> > >> >> >>>> > >> >> >>>> > >> >> >>>> >> >> NumberOfComponents="3" > >> >> >>>> format="appended" offset="0" /> > >> >> >>>> > >> >> >>>> > >> >> >>>> >> >> >>>> NumberOfComponents="1" format="appended" offset="116932" /> > >> >> >>>> >> >> >>>> NumberOfComponents="1" format="appended" offset="372936" /> > >> >> >>>> >> >> >>>> NumberOfComponents="1" format="appended" offset="404940" /> > >> >> >>>> > >> >> >>>> > >> >> >>>> NumberOfComponents="1" > >> >> >>>> format="appended" offset="408944" /> > >> >> >>>> > >> >> >>>> > >> >> >>>> Name="Vec_0x37c89c0_4Field_0.0" > >> >> >>>> NumberOfComponents="1" format="appended" offset="424948" /> > >> >> >>>> > >> >> >>>> > >> >> >>>> > >> >> >>>> > >> >> >>>> >> >> NumberOfComponents="3" > >> >> >>>> format="appended" offset="463928" /> > >> >> >>>> > >> >> >>>> > >> >> >>>> >> >> >>>> NumberOfComponents="1" format="appended" offset="580860" /> > >> >> >>>> >> >> >>>> NumberOfComponents="1" format="appended" offset="836864" /> > >> >> >>>> >> >> >>>> NumberOfComponents="1" format="appended" offset="868868" /> > >> >> >>>> > >> >> >>>> > >> >> >>>> NumberOfComponents="1" > >> >> >>>> format="appended" offset="872872" /> > >> >> >>>> > >> >> >>>> > >> >> >>>> Name="Vec_0x37c89c0_4Field_0.0" > >> >> >>>> NumberOfComponents="1" format="appended" offset="888876" /> > >> >> >>>> > >> >> >>>> > >> >> >>>> > >> >> >>>> > >> >> >>>> > >> >> >>>> > >> >> >>>> Thanks, > >> >> >>>> Mike > >> >> >>>> > >> >> >>>> > >> >> >>>>> On Sun, Feb 12, 2023 at 6:15 PM Mike Michell < > >> mi.mike1021 at gmail.com> > >> >> >>>>> wrote: > >> >> >>>>> > >> >> >>>>>> Dear PETSc team, > >> >> >>>>>> > >> >> >>>>>> I am a user of PETSc with Fortran. My code uses DMPlex to > handle > >> dm > >> >> >>>>>> object. To print out output variable and mesh connectivity, I > use > >> >> VecView() > >> >> >>>>>> by defining PetscSection on that dm and borrow a vector. The > type > >> >> of the > >> >> >>>>>> viewer is set to PETSCVIEWERVTK. > >> >> >>>>>> > >> >> >>>>>> With 32bit indices, the above work flow has no issue. > However, if > >> >> >>>>>> PETSc is configured with 64bit indices, my output .vtu file > has > >> an > >> >> error if > >> >> >>>>>> I open the file with visualization tools, such as Paraview or > >> >> Tecplot, > >> >> >>>>>> saying that: > >> >> >>>>>> "Cannot read cell connectivity from Cells in piece 0 because > the > >> >> >>>>>> "offsets" array is not monotonically increasing or starts > with a > >> >> value > >> >> >>>>>> other than 0." > >> >> >>>>>> > >> >> >>>>>> If I open the .vtu file from terminal, I can see such a line: > >> >> >>>>>> ... > >> >> >>>>>> >> NumberOfComponents="1" > >> >> >>>>>> format="appended" offset="580860" /> > >> >> >>>>>> ... > >> >> >>>>>> > >> >> >>>>>> I expected "DataArray type="Int64", since the PETSc has 64bit > >> >> indices. > >> >> >>>>>> Could I get recommendations that I need to check to resolve > the > >> >> issue? > >> >> >>>>>> > >> >> >>>>> > >> >> >>>>> This is probably a bug. We will look at it. > >> >> >>>>> > >> >> >>>>> Jed, I saw that Int32 is hardcoded in plexvtu.c, but > >> sizeof(PetscInt) > >> >> >>>>> is used to calculate the offset, which looks inconsistent. Can > you > >> >> take a > >> >> >>>>> look? 
> >> >> >>>>> > >> >> >>>>> Thanks, > >> >> >>>>> > >> >> >>>>> Matt > >> >> >>>>> > >> >> >>>>> > >> >> >>>>>> Thanks, > >> >> >>>>>> Mike > >> >> >>>>>> > >> >> >>>>> > >> >> >>>>> > >> >> >>>>> -- > >> >> >>>>> What most experimenters take for granted before they begin > their > >> >> >>>>> experiments is infinitely more interesting than any results to > >> which > >> >> their > >> >> >>>>> experiments lead. > >> >> >>>>> -- Norbert Wiener > >> >> >>>>> > >> >> >>>>> https://www.cse.buffalo.edu/~knepley/ > >> >> >>>>> > >> >> >>>>> > >> >> >>>> > >> >> >> > >> >> >> -- > >> >> >> What most experimenters take for granted before they begin their > >> >> >> experiments is infinitely more interesting than any results to > which > >> >> their > >> >> >> experiments lead. > >> >> >> -- Norbert Wiener > >> >> >> > >> >> >> https://www.cse.buffalo.edu/~knepley/ > >> >> >> > >> >> >> > >> >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Feb 16 23:39:30 2023 From: jed at jedbrown.org (Jed Brown) Date: Thu, 16 Feb 2023 22:39:30 -0700 Subject: [petsc-users] PetscViewer with 64bit In-Reply-To: References: <87bklvwwv7.fsf@jedbrown.org> <878rgzwlaw.fsf@jedbrown.org> <87bkltez95.fsf@jedbrown.org> Message-ID: <873574g8a5.fsf@jedbrown.org> Okay, this works now. I'm pretty sure I tested this long ago with connectivity using Int64 and found that didn't work. That may have been ancient history, but I'm hesitant to revamp to match PetscInt. If doing that, it would require changing the signature of DMPlexGetVTKConnectivity to use PetscInt instead of PetscVTKInt. I'm already underwater and don't have the stamina to test it, but this MR will get you going for problems in which individual parts don't have more than 2B dofs. https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 Mike Michell writes: > Jed, > > It does not work for me even with the reproducer case with the small 2D > square mesh. Can you run the case that I sent and open the created > "sol.vtu" file with paraview? > > Thanks, > > >> Thanks, Dave. >> >> Mike, can you test that this branch works with your large problems? I >> tested that .vtu works in parallel for small problems, where works = loads >> correctly in Paraview and VisIt. >> >> https://gitlab.com/petsc/petsc/-/merge_requests/6081 >> >> Dave May writes: >> >> > On Tue 14. Feb 2023 at 21:27, Jed Brown wrote: >> > >> >> Dave May writes: >> >> >> >> > On Tue 14. Feb 2023 at 17:17, Jed Brown wrote: >> >> > >> >> >> Can you share a reproducer? I think I recall the format requiring >> >> certain >> >> >> things to be Int32. >> >> > >> >> > >> >> > By default, the byte offset used with the appended data format is >> >> UInt32. I >> >> > believe that?s where the sizeof(int) is coming from. This default is >> >> > annoying as it limits the total size of your appended data to be < 3 >> GB. >> >> > That said, in the opening of the paraview file you can add this >> attribute >> >> > >> >> > header_type="UInt64" >> >> >> >> You mean in the header of the .vtu? >> > >> > >> > Yeah, within the open VTKFile tag. >> > Like this >> > < VTKFile type=?xxx?, byte_order="LittleEndian" header_type="UInt64" > >> > >> > Do you happen to have an example or pointers to docs describing this >> >> feature? >> > >> > >> > Example yes - will send it to you tomorrow. Docs? not really. 
Only stuff >> > like this >> > >> > >> https://kitware.github.io/paraview-docs/latest/python/paraview.simple.XMLPStructuredGridWriter.html >> > >> > >> > >> https://kitware.github.io/paraview-docs/v5.8.0/python/paraview.simple.XMLMultiBlockDataWriter.html >> > >> > All the writers seem to support it. >> > >> > >> > Can we always do this? >> > >> > >> > Yep! >> > >> > >> > It isn't mentioned in these: >> >> >> >> https://vtk.org/wp-content/uploads/2015/04/file-formats.pdf (PDF was >> >> created in 2003) >> >> >> >> >> https://kitware.github.io/vtk-examples/site/VTKFileFormats/#xml-file-formats >> >> >> > >> > Yes I know. I?ve tied myself in knots for years because the of the >> > assumption that the offset had to be an int. >> > >> > Credit for the discovery goes to Carsten Uphoff. He showed me this. >> > >> > Cheers, >> > Dave >> > >> > >> > >> >> > then the size of the offset is now UInt64 and now large files can be >> >> > finally written. >> >> > >> >> > >> >> > Cheers, >> >> > Dave >> >> > >> >> > >> >> > >> >> > >> >> >> >> >> >> Mike Michell writes: >> >> >> >> >> >> > Thanks for the note. >> >> >> > I understood that PETSc calculates the offsets for me through >> >> "boffset" >> >> >> > variable in plexvtu.c file. Please correct me if it is wrong. >> >> >> > >> >> >> > If plexvtu.c has a bug, it could be around "write file header" >> part in >> >> >> > which the boffset is also computed. Is this correct? I am not using >> >> >> complex >> >> >> > number. >> >> >> > There are several mixed parts among "Int32, UInt8, PetscInt_FMT, >> >> >> > PetscInt64_FMT" in writing the header. >> >> >> > >> >> >> > Which combination of those flags is correct for 64bit indices? I am >> >> gonna >> >> >> > modify plexvtu.c file with "#if defined(PETSC_USE_64BIT_INDICES)" >> >> >> > statement, but I do not know what is the correct form of the header >> >> flag >> >> >> > for 64bit indices. >> >> >> > >> >> >> > It is also confusing to me: >> >> >> > boffset += gpiece[r].ncells * sizeof(PetscInt) + sizeof(int); >> >> >> > How is sizeof(PetscInt) different from sizeof(int)? >> >> >> > >> >> >> > Thanks, >> >> >> > Mike >> >> >> > >> >> >> > >> >> >> >> On Tue, Feb 14, 2023 at 11:45 AM Mike Michell < >> mi.mike1021 at gmail.com >> >> > >> >> >> >> wrote: >> >> >> >> >> >> >> >>> I was trying to modify the header flags from "Int32" to "Int64", >> but >> >> >> the >> >> >> >>> problem was not resolved. Could I get any additional comments? >> >> >> >>> >> >> >> >> >> >> >> >> The calculated offsets are not correct I think. >> >> >> >> >> >> >> >> Matt >> >> >> >> >> >> >> >> >> >> >> >>> Thanks, >> >> >> >>> Mike >> >> >> >>> >> >> >> >>> >> >> >> >>>> Thanks for the comments. 
>> >> >> >>>> To be precise on the question, the entire part of the header of >> the >> >> >> .vtu >> >> >> >>>> file is attached: >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> > >> >> byte_order="LittleEndian"> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> > >> >> NumberOfComponents="3" >> >> >> >>>> format="appended" offset="0" /> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> > >> >> >>>> NumberOfComponents="1" format="appended" offset="116932" /> >> >> >> >>>> > >> >> >>>> NumberOfComponents="1" format="appended" offset="372936" /> >> >> >> >>>> > >> >> >>>> NumberOfComponents="1" format="appended" offset="404940" /> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> > NumberOfComponents="1" >> >> >> >>>> format="appended" offset="408944" /> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> > Name="Vec_0x37c89c0_4Field_0.0" >> >> >> >>>> NumberOfComponents="1" format="appended" offset="424948" /> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> > >> >> NumberOfComponents="3" >> >> >> >>>> format="appended" offset="463928" /> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> > >> >> >>>> NumberOfComponents="1" format="appended" offset="580860" /> >> >> >> >>>> > >> >> >>>> NumberOfComponents="1" format="appended" offset="836864" /> >> >> >> >>>> > >> >> >>>> NumberOfComponents="1" format="appended" offset="868868" /> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> > NumberOfComponents="1" >> >> >> >>>> format="appended" offset="872872" /> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> > Name="Vec_0x37c89c0_4Field_0.0" >> >> >> >>>> NumberOfComponents="1" format="appended" offset="888876" /> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> Thanks, >> >> >> >>>> Mike >> >> >> >>>> >> >> >> >>>> >> >> >> >>>>> On Sun, Feb 12, 2023 at 6:15 PM Mike Michell < >> >> mi.mike1021 at gmail.com> >> >> >> >>>>> wrote: >> >> >> >>>>> >> >> >> >>>>>> Dear PETSc team, >> >> >> >>>>>> >> >> >> >>>>>> I am a user of PETSc with Fortran. My code uses DMPlex to >> handle >> >> dm >> >> >> >>>>>> object. To print out output variable and mesh connectivity, I >> use >> >> >> VecView() >> >> >> >>>>>> by defining PetscSection on that dm and borrow a vector. The >> type >> >> >> of the >> >> >> >>>>>> viewer is set to PETSCVIEWERVTK. >> >> >> >>>>>> >> >> >> >>>>>> With 32bit indices, the above work flow has no issue. >> However, if >> >> >> >>>>>> PETSc is configured with 64bit indices, my output .vtu file >> has >> >> an >> >> >> error if >> >> >> >>>>>> I open the file with visualization tools, such as Paraview or >> >> >> Tecplot, >> >> >> >>>>>> saying that: >> >> >> >>>>>> "Cannot read cell connectivity from Cells in piece 0 because >> the >> >> >> >>>>>> "offsets" array is not monotonically increasing or starts >> with a >> >> >> value >> >> >> >>>>>> other than 0." >> >> >> >>>>>> >> >> >> >>>>>> If I open the .vtu file from terminal, I can see such a line: >> >> >> >>>>>> ... >> >> >> >>>>>> > >> NumberOfComponents="1" >> >> >> >>>>>> format="appended" offset="580860" /> >> >> >> >>>>>> ... >> >> >> >>>>>> >> >> >> >>>>>> I expected "DataArray type="Int64", since the PETSc has 64bit >> >> >> indices. >> >> >> >>>>>> Could I get recommendations that I need to check to resolve >> the >> >> >> issue? >> >> >> >>>>>> >> >> >> >>>>> >> >> >> >>>>> This is probably a bug. We will look at it. >> >> >> >>>>> >> >> >> >>>>> Jed, I saw that Int32 is hardcoded in plexvtu.c, but >> >> sizeof(PetscInt) >> >> >> >>>>> is used to calculate the offset, which looks inconsistent. 
Can >> you >> >> >> take a >> >> >> >>>>> look? >> >> >> >>>>> >> >> >> >>>>> Thanks, >> >> >> >>>>> >> >> >> >>>>> Matt >> >> >> >>>>> >> >> >> >>>>> >> >> >> >>>>>> Thanks, >> >> >> >>>>>> Mike >> >> >> >>>>>> >> >> >> >>>>> >> >> >> >>>>> >> >> >> >>>>> -- >> >> >> >>>>> What most experimenters take for granted before they begin >> their >> >> >> >>>>> experiments is infinitely more interesting than any results to >> >> which >> >> >> their >> >> >> >>>>> experiments lead. >> >> >> >>>>> -- Norbert Wiener >> >> >> >>>>> >> >> >> >>>>> https://www.cse.buffalo.edu/~knepley/ >> >> >> >>>>> >> >> >> >>>>> >> >> >> >>>> >> >> >> >> >> >> >> >> -- >> >> >> >> What most experimenters take for granted before they begin their >> >> >> >> experiments is infinitely more interesting than any results to >> which >> >> >> their >> >> >> >> experiments lead. >> >> >> >> -- Norbert Wiener >> >> >> >> >> >> >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> >> >> >> >> >> >> >> >> >> >> From ksi2443 at gmail.com Fri Feb 17 01:43:02 2023 From: ksi2443 at gmail.com (user_gong Kim) Date: Fri, 17 Feb 2023 16:43:02 +0900 Subject: [petsc-users] Question about rank of matrix Message-ID: Hello, I have a question about rank of matrix. At the problem Au = b, In my case, sometimes global matrix A is not full rank. In this case, the global matrix A is more likely to be singular, and if it becomes singular, the problem cannot be solved even in the case of the direct solver. I haven't solved the problem with an iterative solver yet, but I would like to ask someone who has experienced this kind of problem. 1. If it is not full rank, is there a numerical technique to solve it by catching rows and columns with empty ranks in advance? 2.If anyone has solved it in a different way than the above numerical analysis method, please tell me your experience. Thanks, Hyung Kim -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Fri Feb 17 01:56:46 2023 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Fri, 17 Feb 2023 10:56:46 +0300 Subject: [petsc-users] Question about rank of matrix In-Reply-To: References: Message-ID: On Fri, Feb 17, 2023, 10:43 user_gong Kim wrote: > Hello, > > I have a question about rank of matrix. > At the problem > Au = b, > > In my case, sometimes global matrix A is not full rank. > In this case, the global matrix A is more likely to be singular, and if it > becomes singular, the problem cannot be solved even in the case of the > direct solver. > I haven't solved the problem with an iterative solver yet, but I would > like to ask someone who has experienced this kind of problem. > > 1. If it is not full rank, is there a numerical technique to solve it by > catching rows and columns with empty ranks in advance? > > 2.If anyone has solved it in a different way than the above numerical > analysis method, please tell me your experience. > > Thanks, > Hyung Kim > My experience with this is usually associated to reading a book and find the solution I'm looking for. > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pierre at joliv.et Fri Feb 17 02:03:50 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Fri, 17 Feb 2023 09:03:50 +0100 Subject: [petsc-users] Question about rank of matrix In-Reply-To: References: Message-ID: <2B2EB55E-A2DC-4C46-A20C-D6CBCC64EDCD@joliv.et> > On 17 Feb 2023, at 8:56 AM, Stefano Zampini wrote: > > > On Fri, Feb 17, 2023, 10:43 user_gong Kim > wrote: >> Hello, >> >> I have a question about rank of matrix. >> At the problem >> Au = b, >> >> In my case, sometimes global matrix A is not full rank. >> In this case, the global matrix A is more likely to be singular, and if it becomes singular, the problem cannot be solved even in the case of the direct solver. >> I haven't solved the problem with an iterative solver yet, but I would like to ask someone who has experienced this kind of problem. >> >> 1. If it is not full rank, is there a numerical technique to solve it by catching rows and columns with empty ranks in advance? >> >> 2.If anyone has solved it in a different way than the above numerical analysis method, please tell me your experience. >> >> Thanks, >> Hyung Kim > > > My experience with this is usually associated to reading a book and find the solution I'm looking for. On top of that, some exact factorization packages can solve singular systems, unlike what you are stating. E.g., MUMPS, together with the option -mat_mumps_icntl_24, see https://mumps-solver.org/doc/userguide_5.5.1.pdf Thanks, Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 17 04:40:19 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 17 Feb 2023 05:40:19 -0500 Subject: [petsc-users] Question about rank of matrix In-Reply-To: References: Message-ID: On Fri, Feb 17, 2023 at 2:43 AM user_gong Kim wrote: > Hello, > > I have a question about rank of matrix. > At the problem > Au = b, > > In my case, sometimes global matrix A is not full rank. > In this case, the global matrix A is more likely to be singular, and if it > becomes singular, the problem cannot be solved even in the case of the > direct solver. > I haven't solved the problem with an iterative solver yet, but I would > like to ask someone who has experienced this kind of problem. > > 1. If it is not full rank, is there a numerical technique to solve it by > catching rows and columns with empty ranks in advance? > > 2.If anyone has solved it in a different way than the above numerical > analysis method, please tell me your experience. > As Pierre points out, MUMPS can solve singular systems. If you have an explicit characterization of the null space, then many iterative methods can also solve it by projecting out the null space. You call MatSetNullSpace() on the system matrix. Thanks, Matt > Thanks, > Hyung Kim > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
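[For readers following the singular-matrix thread, a minimal sketch (not taken from the messages above) of the null-space route Matt describes: attach a MatNullSpace to the system matrix so a Krylov method can project it out. The function name SolveSingular is a placeholder, and the null space is assumed here to be the constant vector; substitute the vectors that actually span your kernel. The direct-solver route Pierre mentions needs no code, only options such as -pc_type lu -pc_factor_mat_solver_type mumps -mat_mumps_icntl_24 1. PetscCall() assumes a recent PETSc; with older releases use ierr = ...; CHKERRQ(ierr) instead.]

  #include <petscksp.h>

  /* Sketch: solve A u = b when A is rank deficient but its null space is known
     (assumed here to be the constant vector). SolveSingular is a placeholder name. */
  PetscErrorCode SolveSingular(Mat A, Vec b, Vec u)
  {
    MatNullSpace nsp;
    KSP          ksp;

    PetscFunctionBeginUser;
    /* has_cnst = PETSC_TRUE: kernel spanned by the constant vector;
       pass your own Vec array instead of NULL if the kernel is something else */
    PetscCall(MatNullSpaceCreate(PetscObjectComm((PetscObject)A), PETSC_TRUE, 0, NULL, &nsp));
    PetscCall(MatSetNullSpace(A, nsp));
    PetscCall(MatNullSpaceDestroy(&nsp));

    PetscCall(KSPCreate(PetscObjectComm((PetscObject)A), &ksp));
    PetscCall(KSPSetOperators(ksp, A, A));
    PetscCall(KSPSetFromOptions(ksp)); /* e.g. -ksp_type gmres -pc_type asm for the iterative route */
    PetscCall(KSPSolve(ksp, b, u));
    PetscCall(KSPDestroy(&ksp));
    PetscFunctionReturn(0);
  }
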
URL: From knepley at gmail.com Fri Feb 17 05:35:34 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 17 Feb 2023 06:35:34 -0500 Subject: [petsc-users] Help with fieldsplit performance In-Reply-To: References: Message-ID: On Tue, Feb 14, 2023 at 1:43 PM Edoardo alinovi wrote: > Hi Matt, > > So I have done some research these days and I have found out that I might > try to assemble the SIMPLE for Schur approximation (myS = A11 - A10 > inv(DIAGFORM(A00)) A01). > I show this on some slides. You can do it with options since the inverse diagonal is just Preonly/Jacobi. It is not a great preconditioner, at least for Stokes. > Reading papers around, I come up with a doubt, which I believe to be a > very silly one but worth asking... > > Is the way the unknowns are packed in the matrix relevant for schur > preconditioning? > No, we pull out the matrices before running the algorithm, so you can use any numbering you want as long as you use it when you call SetIS(). Thanks, Matt > I was studying a bit ex70.c, there the block matrix is defined like: > > A = [A00 A10 > A10 A11] > Where A00 is the momentum equation matrix, A11 is the pressure equation > matrix, while A01 and A10 are the matrices for the coupling terms (i.e. > pressure gradient and continuity). The unknowns are x = [u1..uN v1...vN > w1...wN p1...pN]^T > > In my case, I assemble the matrix cell by cell (FV method), and the result > will be this one: > > [image: image.png] > > Then I split the fields giving index 0-1 for u and 2 for p. I guess Petsc > is already doing the correct handling picking up the *a^33s* to assemble > A11, but worth being 100% sure :) > > Thank you! > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 28154 bytes Desc: not available URL: From edoardo.alinovi at gmail.com Fri Feb 17 06:09:04 2023 From: edoardo.alinovi at gmail.com (Edoardo alinovi) Date: Fri, 17 Feb 2023 13:09:04 +0100 Subject: [petsc-users] Help with fieldsplit performance In-Reply-To: References: Message-ID: Thanks Matt, That was my thinking. So If I am going to build my own Schur approximation matrix, I guess I can forget about the numbering issue, correct? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 17 06:46:14 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 17 Feb 2023 07:46:14 -0500 Subject: [petsc-users] Help with fieldsplit performance In-Reply-To: References: Message-ID: On Fri, Feb 17, 2023 at 7:09 AM Edoardo alinovi wrote: > Thanks Matt, That was my thinking. > > So If I am going to build my own Schur approximation matrix, I guess I can > forget about the numbering issue, correct? > Yes, you just match the numbering you are using. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
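[As a concrete illustration of the fieldsplit discussion above (not code from the thread), a sketch of wiring a user-assembled Schur approximation into PCFIELDSPLIT. The names isU, isP, and myS are placeholders: the two index sets list the velocity and pressure rows in whatever interleaved numbering the matrix was assembled with, and myS is the user's approximation of A11 - A10 inv(diag(A00)) A01. The remaining solver choices can then stay on the command line, for example -pc_fieldsplit_schur_fact_type full -fieldsplit_u_pc_type ilu.]

  #include <petscksp.h>

  /* Sketch: Schur-complement fieldsplit with a user-supplied preconditioning
     matrix for the Schur block. isU/isP/myS are the caller's objects. */
  PetscErrorCode SolveWithUserSchur(Mat A, IS isU, IS isP, Mat myS, Vec b, Vec x)
  {
    KSP ksp;
    PC  pc;

    PetscFunctionBeginUser;
    PetscCall(KSPCreate(PetscObjectComm((PetscObject)A), &ksp));
    PetscCall(KSPSetOperators(ksp, A, A));
    PetscCall(KSPGetPC(ksp, &pc));
    PetscCall(PCSetType(pc, PCFIELDSPLIT));
    PetscCall(PCFieldSplitSetIS(pc, "u", isU)); /* velocity rows, in the caller's own numbering */
    PetscCall(PCFieldSplitSetIS(pc, "p", isP)); /* pressure rows                                */
    PetscCall(PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR));
    PetscCall(PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_USER, myS));
    PetscCall(KSPSetFromOptions(ksp));
    PetscCall(KSPSolve(ksp, b, x));
    PetscCall(KSPDestroy(&ksp));
    PetscFunctionReturn(0);
  }
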
URL: From Jiannan_Tu at uml.edu Fri Feb 17 11:09:18 2023 From: Jiannan_Tu at uml.edu (Tu, Jiannan) Date: Fri, 17 Feb 2023 17:09:18 +0000 Subject: [petsc-users] TS failed due to diverged_step_rejected Message-ID: My code uses TS to solve a set of multi-fluid MHD equations. The jacobian is provided with function F(t, u, u'). Both linear and nonlinear solvers converge but snes repeats itself until gets "TSStep has failed due to diverged_step_rejected." Is it because I used TSStep rather than TSSolve? I have checked the condition number. The condition number with pc_type asm is about 1 (without precondition it is about 4x10^4). The maximum ratio of off-diagonal jacobian element over diagonal element is about 21. Could you help me to identify what is going wrong? Thank you very much! Jiannan --------------------------------------------------------------------------------------------------- Run command with options mpiexec -n $1 ./iditm3d -ts_type arkimex -snes_tyep ngmres -ksp_type gmres -pc_type asm \ -ts_rtol 1.0e-4 -ts_atol 1.0e-4 -snes_monitor -snes_rtol 1.0e-4 -snes_atol 1.0e-4 \ -snes_converged_reason The output message is Start time advancing ... 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 
2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 11:59:43 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 -------------- next part -------------- An HTML attachment was scrubbed... 
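[For context on the TSStep-versus-TSSolve part of the question above, a minimal sketch (placeholder names FormIFunction, FormIJacobian, RunTS, ctx) of how an implicit/IMEX TS with a residual F(t,u,u') is typically driven. TSSolve() essentially drives TSStep() in a loop, plus monitoring and final-time handling, and the adaptor that accepts or rejects steps runs inside TSStep() in both cases.]

  #include <petscts.h>

  /* Placeholders for the application's F(t,u,u') residual and its Jacobian */
  extern PetscErrorCode FormIFunction(TS, PetscReal, Vec, Vec, Vec, void *);
  extern PetscErrorCode FormIJacobian(TS, PetscReal, Vec, Vec, PetscReal, Mat, Mat, void *);

  PetscErrorCode RunTS(Vec u, Mat J, void *ctx)
  {
    TS ts;

    PetscFunctionBeginUser;
    PetscCall(TSCreate(PETSC_COMM_WORLD, &ts));
    PetscCall(TSSetType(ts, TSARKIMEX));
    PetscCall(TSSetIFunction(ts, NULL, FormIFunction, ctx));
    PetscCall(TSSetIJacobian(ts, J, J, FormIJacobian, ctx));
    PetscCall(TSSetFromOptions(ts)); /* picks up -ts_type, -snes_type, -ksp_type, -pc_type, tolerances, ... */

    /* Option A: let PETSc drive the whole integration */
    PetscCall(TSSolve(ts, u));

    /* Option B: step manually; the adaptor and rejection logic are the same, e.g.
       for (PetscInt i = 0; i < nsteps; i++) PetscCall(TSStep(ts));               */

    PetscCall(TSDestroy(&ts));
    PetscFunctionReturn(0);
  }
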
URL: From bsmith at petsc.dev Fri Feb 17 11:52:39 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Feb 2023 12:52:39 -0500 Subject: [petsc-users] Question about rank of matrix In-Reply-To: References: Message-ID: <45777F64-36D1-4C20-9247-BA03B007C03E@petsc.dev> > On Feb 17, 2023, at 2:43 AM, user_gong Kim wrote: > > Hello, > > I have a question about rank of matrix. > At the problem > Au = b, > > In my case, sometimes global matrix A is not full rank. > In this case, the global matrix A is more likely to be singular, and if it becomes singular, the problem cannot be solved even in the case of the direct solver. > I haven't solved the problem with an iterative solver yet, but I would like to ask someone who has experienced this kind of problem. > > 1. If it is not full rank, is there a numerical technique to solve it by catching rows and columns with empty ranks in advance? Matrices with completely empty (zero) rows and their corresponding columns are a simple type of singularity (so long as the corresponding right hand side values are zero) that are not difficult to solve. Essentially iterative solvers never "see" the empty rows and columns and successfully solve the linear systems. You can also use PCREDISTRIBUTE (even on one MPI rank) in the main git branch of PETSc to automatically extract the nontrivial rows and columns from the matrix and solve that smaller system with any solver including direct solvers. > > 2.If anyone has solved it in a different way than the above numerical analysis method, please tell me your experience. > > Thanks, > Hyung Kim > > From bsmith at petsc.dev Fri Feb 17 11:58:02 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Feb 2023 12:58:02 -0500 Subject: [petsc-users] TS failed due to diverged_step_rejected In-Reply-To: References: Message-ID: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> Can you please run with also the options -ts_monitor -ts_adapt_monitor ? The output is confusing because it prints that the Nonlinear solve has converged but then TSStep has failed due to DIVERGED_STEP_REJECTED which seems contradictory > On Feb 17, 2023, at 12:09 PM, Tu, Jiannan wrote: > > My code uses TS to solve a set of multi-fluid MHD equations. The jacobian is provided with function F(t, u, u'). Both linear and nonlinear solvers converge but snes repeats itself until gets "TSStep has failed due to diverged_step_rejected." > > Is it because I used TSStep rather than TSSolve? I have checked the condition number. The condition number with pc_type asm is about 1 (without precondition it is about 4x10^4). The maximum ratio of off-diagonal jacobian element over diagonal element is about 21. > > Could you help me to identify what is going wrong? > > Thank you very much! > > Jiannan > > --------------------------------------------------------------------------------------------------- > Run command with options > > mpiexec -n $1 ./iditm3d -ts_type arkimex -snes_tyep ngmres -ksp_type gmres -pc_type asm \ > -ts_rtol 1.0e-4 -ts_atol 1.0e-4 -snes_monitor -snes_rtol 1.0e-4 -snes_atol 1.0e-4 \ > -snes_converged_reason > > The output message is > > Start time advancing ... 
> 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 1.673091274668e-03 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 8.715428433630e-02 > 1 SNES Function norm 4.995727626692e-04 > 2 SNES Function norm 5.498018152230e-08 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > 0 SNES Function norm 3.237461568254e-01 > 1 SNES Function norm 7.988531005091e-04 > 2 SNES Function norm 1.280948196292e-07 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 4.881903203545e-04 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 7.562592690785e-02 > 1 SNES Function norm 1.143078818923e-04 > 2 SNES Function norm 9.834547907735e-09 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > 0 SNES Function norm 2.683968949758e-01 > 1 SNES Function norm 1.838028436639e-04 > 2 SNES Function norm 9.470813523140e-09 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 1.821562431175e-04 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 1.005443458812e-01 > 1 SNES Function norm 3.633336946661e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 1.515368382715e-01 > 1 SNES Function norm 3.389298316830e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 4.541003359206e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 1.713800906043e-01 > 1 SNES Function norm 1.179958172167e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.020265094117e-01 > 1 SNES Function norm 1.513971290464e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 6.090269704320e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.136603895703e-01 > 1 SNES Function norm 1.877474016012e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 3.127812462507e-01 > 1 SNES Function norm 2.713146825704e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 2.793512213059e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.205196267430e-01 > 1 SNES Function norm 2.572653773308e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 3.260057361977e-01 > 1 SNES Function norm 2.705816087598e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 2.764855860446e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.212505522844e-01 > 1 SNES Function norm 2.958996472386e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 3.273222034162e-01 > 1 SNES Function norm 2.994512887620e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 3.317240589134e-04 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 
2.213246532918e-01 > 1 SNES Function norm 2.799468604767e-04 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 3.274570888397e-01 > 1 SNES Function norm 3.066048050994e-04 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 2.274473072189e+03 > 1 SNES Function norm 2.653507278572e-03 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 2.213869585841e-01 > 1 SNES Function norm 2.177156902895e-03 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 3.275136370365e-01 > 1 SNES Function norm 1.962849131557e-03 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 2.274473072218e+03 > 1 SNES Function norm 5.664907315679e-03 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 2.223208399368e-01 > 1 SNES Function norm 5.688863091415e-03 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 3.287121218919e-01 > 1 SNES Function norm 4.085338521320e-03 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 2.274473071968e+03 > 1 SNES Function norm 4.694691905235e-04 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 2.211786508657e-01 > 1 SNES Function norm 1.503497433939e-04 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 3.272667798977e-01 > 1 SNES Function norm 2.176132327279e-04 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: > [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 > [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 11:59:43 2023 > [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 > [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mi.mike1021 at gmail.com Fri Feb 17 13:13:30 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Fri, 17 Feb 2023 13:13:30 -0600 Subject: [petsc-users] PetscViewer with 64bit In-Reply-To: <873574g8a5.fsf@jedbrown.org> References: <87bklvwwv7.fsf@jedbrown.org> <878rgzwlaw.fsf@jedbrown.org> <87bkltez95.fsf@jedbrown.org> <873574g8a5.fsf@jedbrown.org> Message-ID: It works as I wanted. Thank you for the change. Mike > Okay, this works now. I'm pretty sure I tested this long ago with > connectivity using Int64 and found that didn't work. That may have been > ancient history, but I'm hesitant to revamp to match PetscInt. If doing > that, it would require changing the signature of DMPlexGetVTKConnectivity > to use PetscInt instead of PetscVTKInt. I'm already underwater and don't > have the stamina to test it, but this MR will get you going for problems in > which individual parts don't have more than 2B dofs. 
> > > https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 > > Mike Michell writes: > > > Jed, > > > > It does not work for me even with the reproducer case with the small 2D > > square mesh. Can you run the case that I sent and open the created > > "sol.vtu" file with paraview? > > > > Thanks, > > > > > >> Thanks, Dave. > >> > >> Mike, can you test that this branch works with your large problems? I > >> tested that .vtu works in parallel for small problems, where works = > loads > >> correctly in Paraview and VisIt. > >> > >> https://gitlab.com/petsc/petsc/-/merge_requests/6081 > >> > >> Dave May writes: > >> > >> > On Tue 14. Feb 2023 at 21:27, Jed Brown wrote: > >> > > >> >> Dave May writes: > >> >> > >> >> > On Tue 14. Feb 2023 at 17:17, Jed Brown wrote: > >> >> > > >> >> >> Can you share a reproducer? I think I recall the format requiring > >> >> certain > >> >> >> things to be Int32. > >> >> > > >> >> > > >> >> > By default, the byte offset used with the appended data format is > >> >> UInt32. I > >> >> > believe that?s where the sizeof(int) is coming from. This default > is > >> >> > annoying as it limits the total size of your appended data to be < > 3 > >> GB. > >> >> > That said, in the opening of the paraview file you can add this > >> attribute > >> >> > > >> >> > header_type="UInt64" > >> >> > >> >> You mean in the header of the .vtu? > >> > > >> > > >> > Yeah, within the open VTKFile tag. > >> > Like this > >> > < VTKFile type=?xxx?, byte_order="LittleEndian" header_type="UInt64" > > > >> > > >> > Do you happen to have an example or pointers to docs describing this > >> >> feature? > >> > > >> > > >> > Example yes - will send it to you tomorrow. Docs? not really. Only > stuff > >> > like this > >> > > >> > > >> > https://kitware.github.io/paraview-docs/latest/python/paraview.simple.XMLPStructuredGridWriter.html > >> > > >> > > >> > > >> > https://kitware.github.io/paraview-docs/v5.8.0/python/paraview.simple.XMLMultiBlockDataWriter.html > >> > > >> > All the writers seem to support it. > >> > > >> > > >> > Can we always do this? > >> > > >> > > >> > Yep! > >> > > >> > > >> > It isn't mentioned in these: > >> >> > >> >> https://vtk.org/wp-content/uploads/2015/04/file-formats.pdf (PDF > was > >> >> created in 2003) > >> >> > >> >> > >> > https://kitware.github.io/vtk-examples/site/VTKFileFormats/#xml-file-formats > >> >> > >> > > >> > Yes I know. I?ve tied myself in knots for years because the of the > >> > assumption that the offset had to be an int. > >> > > >> > Credit for the discovery goes to Carsten Uphoff. He showed me this. > >> > > >> > Cheers, > >> > Dave > >> > > >> > > >> > > >> >> > then the size of the offset is now UInt64 and now large files can > be > >> >> > finally written. > >> >> > > >> >> > > >> >> > Cheers, > >> >> > Dave > >> >> > > >> >> > > >> >> > > >> >> > > >> >> >> > >> >> >> Mike Michell writes: > >> >> >> > >> >> >> > Thanks for the note. > >> >> >> > I understood that PETSc calculates the offsets for me through > >> >> "boffset" > >> >> >> > variable in plexvtu.c file. Please correct me if it is wrong. > >> >> >> > > >> >> >> > If plexvtu.c has a bug, it could be around "write file header" > >> part in > >> >> >> > which the boffset is also computed. Is this correct? I am not > using > >> >> >> complex > >> >> >> > number. > >> >> >> > There are several mixed parts among "Int32, UInt8, PetscInt_FMT, > >> >> >> > PetscInt64_FMT" in writing the header. 
> >> >> >> > > >> >> >> > Which combination of those flags is correct for 64bit indices? > I am > >> >> gonna > >> >> >> > modify plexvtu.c file with "#if > defined(PETSC_USE_64BIT_INDICES)" > >> >> >> > statement, but I do not know what is the correct form of the > header > >> >> flag > >> >> >> > for 64bit indices. > >> >> >> > > >> >> >> > It is also confusing to me: > >> >> >> > boffset += gpiece[r].ncells * sizeof(PetscInt) + sizeof(int); > >> >> >> > How is sizeof(PetscInt) different from sizeof(int)? > >> >> >> > > >> >> >> > Thanks, > >> >> >> > Mike > >> >> >> > > >> >> >> > > >> >> >> >> On Tue, Feb 14, 2023 at 11:45 AM Mike Michell < > >> mi.mike1021 at gmail.com > >> >> > > >> >> >> >> wrote: > >> >> >> >> > >> >> >> >>> I was trying to modify the header flags from "Int32" to > "Int64", > >> but > >> >> >> the > >> >> >> >>> problem was not resolved. Could I get any additional comments? > >> >> >> >>> > >> >> >> >> > >> >> >> >> The calculated offsets are not correct I think. > >> >> >> >> > >> >> >> >> Matt > >> >> >> >> > >> >> >> >> > >> >> >> >>> Thanks, > >> >> >> >>> Mike > >> >> >> >>> > >> >> >> >>> > >> >> >> >>>> Thanks for the comments. > >> >> >> >>>> To be precise on the question, the entire part of the header > of > >> the > >> >> >> .vtu > >> >> >> >>>> file is attached: > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> >> >> >> byte_order="LittleEndian"> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> >> >> >> NumberOfComponents="3" > >> >> >> >>>> format="appended" offset="0" /> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> >> >> >> >>>> NumberOfComponents="1" format="appended" offset="116932" /> > >> >> >> >>>> >> >> >> >>>> NumberOfComponents="1" format="appended" offset="372936" /> > >> >> >> >>>> >> >> >> >>>> NumberOfComponents="1" format="appended" offset="404940" /> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> >> NumberOfComponents="1" > >> >> >> >>>> format="appended" offset="408944" /> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> >> Name="Vec_0x37c89c0_4Field_0.0" > >> >> >> >>>> NumberOfComponents="1" format="appended" offset="424948" /> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> >> >> >> NumberOfComponents="3" > >> >> >> >>>> format="appended" offset="463928" /> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> >> >> >> >>>> NumberOfComponents="1" format="appended" offset="580860" /> > >> >> >> >>>> >> >> >> >>>> NumberOfComponents="1" format="appended" offset="836864" /> > >> >> >> >>>> >> >> >> >>>> NumberOfComponents="1" format="appended" offset="868868" /> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> >> NumberOfComponents="1" > >> >> >> >>>> format="appended" offset="872872" /> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> >> Name="Vec_0x37c89c0_4Field_0.0" > >> >> >> >>>> NumberOfComponents="1" format="appended" offset="888876" /> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> Thanks, > >> >> >> >>>> Mike > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>>> On Sun, Feb 12, 2023 at 6:15 PM Mike Michell < > >> >> mi.mike1021 at gmail.com> > >> >> >> >>>>> wrote: > >> >> >> >>>>> > >> >> >> >>>>>> Dear PETSc team, > >> >> >> >>>>>> > >> >> >> >>>>>> I am a user of PETSc with Fortran. My code uses DMPlex to > >> handle > >> >> dm > >> >> >> >>>>>> object. To print out output variable and mesh > connectivity, I > >> use > >> >> >> VecView() > >> >> >> >>>>>> by defining PetscSection on that dm and borrow a vector. 
> The > >> type > >> >> >> of the > >> >> >> >>>>>> viewer is set to PETSCVIEWERVTK. > >> >> >> >>>>>> > >> >> >> >>>>>> With 32bit indices, the above work flow has no issue. > >> However, if > >> >> >> >>>>>> PETSc is configured with 64bit indices, my output .vtu file > >> has > >> >> an > >> >> >> error if > >> >> >> >>>>>> I open the file with visualization tools, such as Paraview > or > >> >> >> Tecplot, > >> >> >> >>>>>> saying that: > >> >> >> >>>>>> "Cannot read cell connectivity from Cells in piece 0 > because > >> the > >> >> >> >>>>>> "offsets" array is not monotonically increasing or starts > >> with a > >> >> >> value > >> >> >> >>>>>> other than 0." > >> >> >> >>>>>> > >> >> >> >>>>>> If I open the .vtu file from terminal, I can see such a > line: > >> >> >> >>>>>> ... > >> >> >> >>>>>> >> >> NumberOfComponents="1" > >> >> >> >>>>>> format="appended" offset="580860" /> > >> >> >> >>>>>> ... > >> >> >> >>>>>> > >> >> >> >>>>>> I expected "DataArray type="Int64", since the PETSc has > 64bit > >> >> >> indices. > >> >> >> >>>>>> Could I get recommendations that I need to check to resolve > >> the > >> >> >> issue? > >> >> >> >>>>>> > >> >> >> >>>>> > >> >> >> >>>>> This is probably a bug. We will look at it. > >> >> >> >>>>> > >> >> >> >>>>> Jed, I saw that Int32 is hardcoded in plexvtu.c, but > >> >> sizeof(PetscInt) > >> >> >> >>>>> is used to calculate the offset, which looks inconsistent. > Can > >> you > >> >> >> take a > >> >> >> >>>>> look? > >> >> >> >>>>> > >> >> >> >>>>> Thanks, > >> >> >> >>>>> > >> >> >> >>>>> Matt > >> >> >> >>>>> > >> >> >> >>>>> > >> >> >> >>>>>> Thanks, > >> >> >> >>>>>> Mike > >> >> >> >>>>>> > >> >> >> >>>>> > >> >> >> >>>>> > >> >> >> >>>>> -- > >> >> >> >>>>> What most experimenters take for granted before they begin > >> their > >> >> >> >>>>> experiments is infinitely more interesting than any results > to > >> >> which > >> >> >> their > >> >> >> >>>>> experiments lead. > >> >> >> >>>>> -- Norbert Wiener > >> >> >> >>>>> > >> >> >> >>>>> https://www.cse.buffalo.edu/~knepley/ > >> >> >> >>>>> > >> >> >> >>>>> > >> >> >> >>>> > >> >> >> >> > >> >> >> >> -- > >> >> >> >> What most experimenters take for granted before they begin > their > >> >> >> >> experiments is infinitely more interesting than any results to > >> which > >> >> >> their > >> >> >> >> experiments lead. > >> >> >> >> -- Norbert Wiener > >> >> >> >> > >> >> >> >> https://www.cse.buffalo.edu/~knepley/ > >> >> >> >> > >> >> >> >> > >> >> >> > >> >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jiannan_Tu at uml.edu Fri Feb 17 14:00:31 2023 From: Jiannan_Tu at uml.edu (Tu, Jiannan) Date: Fri, 17 Feb 2023 20:00:31 +0000 Subject: [petsc-users] TS failed due to diverged_step_rejected In-Reply-To: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> Message-ID: These are what I got with the options you suggested. 
Thank you, Jiannan ------------------------------------------------------------------------------- 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 2.189e-02 dt=4.374e-03 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-03 dt=4.374e-04 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-04 dt=4.374e-05 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-05 dt=4.374e-06 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-06 dt=4.374e-07 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-07 dt=4.374e-08 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function 
norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-08 dt=4.374e-09 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-09 dt=4.374e-10 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-10 dt=4.374e-11 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-11 dt=4.374e-12 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-12 dt=4.374e-13 wlte= 91.4 wltea= -1 wlter= -1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 14:54:22 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 From: Barry Smith Sent: Friday, February 17, 2023 12:58 PM To: Tu, Jiannan Cc: petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. Can you please run with also the options -ts_monitor -ts_adapt_monitor ? The output is confusing because it prints that the Nonlinear solve has converged but then TSStep has failed due to DIVERGED_STEP_REJECTED which seems contradictory On Feb 17, 2023, at 12:09 PM, Tu, Jiannan wrote: My code uses TS to solve a set of multi-fluid MHD equations. The jacobian is provided with function F(t, u, u'). Both linear and nonlinear solvers converge but snes repeats itself until gets "TSStep has failed due to diverged_step_rejected." Is it because I used TSStep rather than TSSolve? I have checked the condition number. The condition number with pc_type asm is about 1 (without precondition it is about 4x10^4). The maximum ratio of off-diagonal jacobian element over diagonal element is about 21. Could you help me to identify what is going wrong? Thank you very much! Jiannan --------------------------------------------------------------------------------------------------- Run command with options mpiexec -n $1 ./iditm3d -ts_type arkimex -snes_tyep ngmres -ksp_type gmres -pc_type asm \ -ts_rtol 1.0e-4 -ts_atol 1.0e-4 -snes_monitor -snes_rtol 1.0e-4 -snes_atol 1.0e-4 \ -snes_converged_reason The output message is Start time advancing ... 
0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES 
Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 11:59:43 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 17 14:15:16 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 17 Feb 2023 15:15:16 -0500 Subject: [petsc-users] TS failed due to diverged_step_rejected In-Reply-To: References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> Message-ID: I am not sure what TS you are using, but the estimate of the local truncation error is 91.4, and does not seem to change when you make the step smaller, so something is off. You can shut off the adaptivity using -ts_adapt_type none Thanks, Matt On Fri, Feb 17, 2023 at 3:01 PM Tu, Jiannan wrote: > These are what I got with the options you suggested. 
> > > > Thank you, > > Jiannan > > > > > ------------------------------------------------------------------------------- > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 1.673091274668e-03 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 8.715428433630e-02 > > 1 SNES Function norm 4.995727626692e-04 > > 2 SNES Function norm 5.498018152230e-08 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > > 0 SNES Function norm 3.237461568254e-01 > > 1 SNES Function norm 7.988531005091e-04 > > 2 SNES Function norm 1.280948196292e-07 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 2.189e-02 > dt=4.374e-03 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 4.881903203545e-04 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 7.562592690785e-02 > > 1 SNES Function norm 1.143078818923e-04 > > 2 SNES Function norm 9.834547907735e-09 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > > 0 SNES Function norm 2.683968949758e-01 > > 1 SNES Function norm 1.838028436639e-04 > > 2 SNES Function norm 9.470813523140e-09 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-03 > dt=4.374e-04 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 1.821562431175e-04 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 1.005443458812e-01 > > 1 SNES Function norm 3.633336946661e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 1.515368382715e-01 > > 1 SNES Function norm 3.389298316830e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-04 > dt=4.374e-05 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 4.541003359206e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 1.713800906043e-01 > > 1 SNES Function norm 1.179958172167e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.020265094117e-01 > > 1 SNES Function norm 1.513971290464e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-05 > dt=4.374e-06 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 6.090269704320e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.136603895703e-01 > > 1 SNES Function norm 1.877474016012e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 3.127812462507e-01 > > 1 SNES Function norm 2.713146825704e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-06 > dt=4.374e-07 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 2.793512213059e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.205196267430e-01 > > 1 SNES Function norm 2.572653773308e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS 
iterations 1 > > 0 SNES Function norm 3.260057361977e-01 > > 1 SNES Function norm 2.705816087598e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-07 > dt=4.374e-08 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 2.764855860446e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.212505522844e-01 > > 1 SNES Function norm 2.958996472386e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 3.273222034162e-01 > > 1 SNES Function norm 2.994512887620e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-08 > dt=4.374e-09 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 3.317240589134e-04 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.213246532918e-01 > > 1 SNES Function norm 2.799468604767e-04 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 3.274570888397e-01 > > 1 SNES Function norm 3.066048050994e-04 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-09 > dt=4.374e-10 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072189e+03 > > 1 SNES Function norm 2.653507278572e-03 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.213869585841e-01 > > 1 SNES Function norm 2.177156902895e-03 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 3.275136370365e-01 > > 1 SNES Function norm 1.962849131557e-03 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-10 > dt=4.374e-11 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072218e+03 > > 1 SNES Function norm 5.664907315679e-03 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.223208399368e-01 > > 1 SNES Function norm 5.688863091415e-03 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 3.287121218919e-01 > > 1 SNES Function norm 4.085338521320e-03 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-11 > dt=4.374e-12 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473071968e+03 > > 1 SNES Function norm 4.694691905235e-04 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.211786508657e-01 > > 1 SNES Function norm 1.503497433939e-04 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 3.272667798977e-01 > > 1 SNES Function norm 2.176132327279e-04 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-12 > dt=4.374e-13 wlte= 91.4 wltea= -1 wlter= -1 > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: > > [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED > > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble 
shooting. > > [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 > > [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 14:54:22 > 2023 > > [0]PETSC ERROR: Configure options --prefix=/usr/local > --with-mpi-dir=/usr/local --with-fc=0 --with-openmp > --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 > > [0]PETSC ERROR: #1 TSStep() at > /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 > > > > > > > > *From: *Barry Smith > *Sent: *Friday, February 17, 2023 12:58 PM > *To: *Tu, Jiannan > *Cc: *petsc-users > *Subject: *Re: [petsc-users] TS failed due to diverged_step_rejected > > > > *CAUTION:* This email was sent from outside the UMass Lowell network. > > > > > > Can you please run with also the options -ts_monitor -ts_adapt_monitor ? > > > > The output is confusing because it prints that the Nonlinear solve has > converged but then TSStep has failed due to DIVERGED_STEP_REJECTED which > seems contradictory > > > > > > > > On Feb 17, 2023, at 12:09 PM, Tu, Jiannan wrote: > > > > My code uses TS to solve a set of multi-fluid MHD equations. The jacobian > is provided with function F(t, u, u'). Both linear and nonlinear solvers > converge but snes repeats itself until gets "TSStep has failed due to > diverged_step_rejected." > > > > Is it because I used TSStep rather than TSSolve? I have checked the > condition number. The condition number with pc_type asm is about 1 (without > precondition it is about 4x10^4). The maximum ratio of off-diagonal > jacobian element over diagonal element is about 21. > > > > Could you help me to identify what is going wrong? > > > > Thank you very much! > > > > Jiannan > > > > > --------------------------------------------------------------------------------------------------- > > Run command with options > > > > mpiexec -n $1 ./iditm3d -ts_type arkimex -snes_tyep ngmres -ksp_type > gmres -pc_type asm \ > > -ts_rtol 1.0e-4 -ts_atol 1.0e-4 -snes_monitor -snes_rtol 1.0e-4 -snes_atol > 1.0e-4 \ > > -snes_converged_reason > > > > The output message is > > > > Start time advancing ... 
> > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 1.673091274668e-03 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 8.715428433630e-02 > > 1 SNES Function norm 4.995727626692e-04 > > 2 SNES Function norm 5.498018152230e-08 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > > 0 SNES Function norm 3.237461568254e-01 > > 1 SNES Function norm 7.988531005091e-04 > > 2 SNES Function norm 1.280948196292e-07 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 4.881903203545e-04 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 7.562592690785e-02 > > 1 SNES Function norm 1.143078818923e-04 > > 2 SNES Function norm 9.834547907735e-09 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > > 0 SNES Function norm 2.683968949758e-01 > > 1 SNES Function norm 1.838028436639e-04 > > 2 SNES Function norm 9.470813523140e-09 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 1.821562431175e-04 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 1.005443458812e-01 > > 1 SNES Function norm 3.633336946661e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 1.515368382715e-01 > > 1 SNES Function norm 3.389298316830e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 4.541003359206e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 1.713800906043e-01 > > 1 SNES Function norm 1.179958172167e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.020265094117e-01 > > 1 SNES Function norm 1.513971290464e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 6.090269704320e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.136603895703e-01 > > 1 SNES Function norm 1.877474016012e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 3.127812462507e-01 > > 1 SNES Function norm 2.713146825704e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 2.793512213059e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.205196267430e-01 > > 1 SNES Function norm 2.572653773308e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 3.260057361977e-01 > > 1 SNES Function norm 2.705816087598e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 2.764855860446e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.212505522844e-01 > > 1 SNES Function norm 2.958996472386e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 3.273222034162e-01 > > 1 SNES Function norm 2.994512887620e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.274473072186e+03 > 
> 1 SNES Function norm 3.317240589134e-04 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.213246532918e-01 > > 1 SNES Function norm 2.799468604767e-04 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 3.274570888397e-01 > > 1 SNES Function norm 3.066048050994e-04 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.274473072189e+03 > > 1 SNES Function norm 2.653507278572e-03 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.213869585841e-01 > > 1 SNES Function norm 2.177156902895e-03 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 3.275136370365e-01 > > 1 SNES Function norm 1.962849131557e-03 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.274473072218e+03 > > 1 SNES Function norm 5.664907315679e-03 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.223208399368e-01 > > 1 SNES Function norm 5.688863091415e-03 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 3.287121218919e-01 > > 1 SNES Function norm 4.085338521320e-03 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.274473071968e+03 > > 1 SNES Function norm 4.694691905235e-04 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.211786508657e-01 > > 1 SNES Function norm 1.503497433939e-04 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 3.272667798977e-01 > > 1 SNES Function norm 2.176132327279e-04 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: > > [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED > > [0]PETSC ERROR: See https://petsc.org/release/faq/ > > for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 > > [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 11:59:43 > 2023 > > [0]PETSC ERROR: Configure options --prefix=/usr/local > --with-mpi-dir=/usr/local --with-fc=0 --with-openmp > --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 > > [0]PETSC ERROR: #1 TSStep() at > /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jiannan_Tu at uml.edu Fri Feb 17 14:32:54 2023 From: Jiannan_Tu at uml.edu (Tu, Jiannan) Date: Fri, 17 Feb 2023 20:32:54 +0000 Subject: [petsc-users] TS failed due to diverged_step_rejected In-Reply-To: References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> Message-ID: The ts_type arkimex is used. There is right hand-side function RHSFunction set by TSSetRHSFunction() and also stiff function set by TSSetIFunction(). With adaptivity shut off, TS can finish its first time step after the 3rd ?Nonlinear solve converged due to ??. The solution gives negative electron and neutral temperatures at the bottom boundary. 
I need to fix the negative temperatures and see how the code works. BTW, what is this ts_adapt? Is it by default on? Thank you, Jiannan From: Matthew Knepley Sent: Friday, February 17, 2023 3:15 PM To: Tu, Jiannan Cc: Barry Smith; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. I am not sure what TS you are using, but the estimate of the local truncation error is 91.4, and does not seem to change when you make the step smaller, so something is off. You can shut off the adaptivity using -ts_adapt_type none Thanks, Matt On Fri, Feb 17, 2023 at 3:01 PM Tu, Jiannan > wrote: These are what I got with the options you suggested. Thank you, Jiannan ------------------------------------------------------------------------------- 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 2.189e-02 dt=4.374e-03 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-03 dt=4.374e-04 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-04 dt=4.374e-05 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-05 dt=4.374e-06 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 
Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-06 dt=4.374e-07 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-07 dt=4.374e-08 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-08 dt=4.374e-09 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-09 dt=4.374e-10 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-10 dt=4.374e-11 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-11 dt=4.374e-12 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-12 dt=4.374e-13 wlte= 91.4 wltea= -1 wlter= -1 [0]PETSC ERROR: --------------------- Error Message 
-------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 14:54:22 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 From: Barry Smith Sent: Friday, February 17, 2023 12:58 PM To: Tu, Jiannan Cc: petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. Can you please run with also the options -ts_monitor -ts_adapt_monitor ? The output is confusing because it prints that the Nonlinear solve has converged but then TSStep has failed due to DIVERGED_STEP_REJECTED which seems contradictory On Feb 17, 2023, at 12:09 PM, Tu, Jiannan > wrote: My code uses TS to solve a set of multi-fluid MHD equations. The jacobian is provided with function F(t, u, u'). Both linear and nonlinear solvers converge but snes repeats itself until gets "TSStep has failed due to diverged_step_rejected." Is it because I used TSStep rather than TSSolve? I have checked the condition number. The condition number with pc_type asm is about 1 (without precondition it is about 4x10^4). The maximum ratio of off-diagonal jacobian element over diagonal element is about 21. Could you help me to identify what is going wrong? Thank you very much! Jiannan --------------------------------------------------------------------------------------------------- Run command with options mpiexec -n $1 ./iditm3d -ts_type arkimex -snes_tyep ngmres -ksp_type gmres -pc_type asm \ -ts_rtol 1.0e-4 -ts_atol 1.0e-4 -snes_monitor -snes_rtol 1.0e-4 -snes_atol 1.0e-4 \ -snes_converged_reason The output message is Start time advancing ... 
0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES 
Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 11:59:43 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 17 14:38:54 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 17 Feb 2023 15:38:54 -0500 Subject: [petsc-users] TS failed due to diverged_step_rejected In-Reply-To: References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> Message-ID: On Fri, Feb 17, 2023 at 3:32 PM Tu, Jiannan wrote: > The ts_type arkimex is used. There is right hand-side function RHSFunction > set by TSSetRHSFunction() and also stiff function set by TSSetIFunction(). > > > > With adaptivity shut off, TS can finish its first time step after the 3rd > ?Nonlinear solve converged due to ??. The solution gives negative electron > and neutral temperatures at the bottom boundary. I need to fix the negative > temperatures and see how the code works. > > > > BTW, what is this ts_adapt? Is it by default on? > It is a controller for adaptive timestepping. It is on by default, but I suspect that you step into an unphysical regime, which makes its estimates unreliable. 
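In case it helps, the same thing can be done from code instead of the command line. Below is a minimal sketch only: the helper name is made up, it assumes your TS object is the ts you already create, and it only uses the documented TSGetAdapt/TSAdaptSetType interface.

  #include <petscts.h>

  /* Switch off the time-step adaptivity controller attached to a TS.
     This is the programmatic equivalent of running with -ts_adapt_type none. */
  static PetscErrorCode DisableTSAdaptivity(TS ts)
  {
    TSAdapt        adapt;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    /* get the adaptivity controller attached to this TS (it exists by default) */
    ierr = TSGetAdapt(ts, &adapt);CHKERRQ(ierr);
    /* "none": every step is taken with the dt you set, no rejection or resizing */
    ierr = TSAdaptSetType(adapt, TSADAPTNONE);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

If you would rather keep adaptivity but stop it from cutting dt indefinitely, you can instead keep the default controller and bound the step size with TSAdaptSetStepLimits(adapt, hmin, hmax).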
Thanks, Matt > Thank you, > > Jiannan > > > > *From: *Matthew Knepley > *Sent: *Friday, February 17, 2023 3:15 PM > *To: *Tu, Jiannan > *Cc: *Barry Smith ; petsc-users > > *Subject: *Re: [petsc-users] TS failed due to diverged_step_rejected > > > > *CAUTION:* This email was sent from outside the UMass Lowell network. > > > > I am not sure what TS you are using, but the estimate of the local > truncation error is 91.4, and does not seem > > to change when you make the step smaller, so something is off. You can > shut off the adaptivity using > > > > -ts_adapt_type none > > > > Thanks, > > > > Matt > > > > On Fri, Feb 17, 2023 at 3:01 PM Tu, Jiannan wrote: > > These are what I got with the options you suggested. > > > > Thank you, > > Jiannan > > > > > ------------------------------------------------------------------------------- > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 1.673091274668e-03 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 8.715428433630e-02 > > 1 SNES Function norm 4.995727626692e-04 > > 2 SNES Function norm 5.498018152230e-08 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > > 0 SNES Function norm 3.237461568254e-01 > > 1 SNES Function norm 7.988531005091e-04 > > 2 SNES Function norm 1.280948196292e-07 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 2.189e-02 > dt=4.374e-03 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 4.881903203545e-04 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 7.562592690785e-02 > > 1 SNES Function norm 1.143078818923e-04 > > 2 SNES Function norm 9.834547907735e-09 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > > 0 SNES Function norm 2.683968949758e-01 > > 1 SNES Function norm 1.838028436639e-04 > > 2 SNES Function norm 9.470813523140e-09 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-03 > dt=4.374e-04 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 1.821562431175e-04 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 1.005443458812e-01 > > 1 SNES Function norm 3.633336946661e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 1.515368382715e-01 > > 1 SNES Function norm 3.389298316830e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-04 > dt=4.374e-05 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 4.541003359206e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 1.713800906043e-01 > > 1 SNES Function norm 1.179958172167e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.020265094117e-01 > > 1 SNES Function norm 1.513971290464e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-05 > dt=4.374e-06 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 6.090269704320e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function 
norm 2.136603895703e-01 > > 1 SNES Function norm 1.877474016012e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 3.127812462507e-01 > > 1 SNES Function norm 2.713146825704e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-06 > dt=4.374e-07 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 2.793512213059e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.205196267430e-01 > > 1 SNES Function norm 2.572653773308e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 3.260057361977e-01 > > 1 SNES Function norm 2.705816087598e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-07 > dt=4.374e-08 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 2.764855860446e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.212505522844e-01 > > 1 SNES Function norm 2.958996472386e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 3.273222034162e-01 > > 1 SNES Function norm 2.994512887620e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-08 > dt=4.374e-09 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 3.317240589134e-04 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.213246532918e-01 > > 1 SNES Function norm 2.799468604767e-04 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 3.274570888397e-01 > > 1 SNES Function norm 3.066048050994e-04 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-09 > dt=4.374e-10 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072189e+03 > > 1 SNES Function norm 2.653507278572e-03 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.213869585841e-01 > > 1 SNES Function norm 2.177156902895e-03 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 3.275136370365e-01 > > 1 SNES Function norm 1.962849131557e-03 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-10 > dt=4.374e-11 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473072218e+03 > > 1 SNES Function norm 5.664907315679e-03 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.223208399368e-01 > > 1 SNES Function norm 5.688863091415e-03 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 3.287121218919e-01 > > 1 SNES Function norm 4.085338521320e-03 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-11 > dt=4.374e-12 wlte= 91.4 wltea= -1 wlter= -1 > > 0 SNES Function norm 2.274473071968e+03 > > 1 SNES Function norm 4.694691905235e-04 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 
SNES Function norm 2.211786508657e-01 > > 1 SNES Function norm 1.503497433939e-04 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 3.272667798977e-01 > > 1 SNES Function norm 2.176132327279e-04 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-12 > dt=4.374e-13 wlte= 91.4 wltea= -1 wlter= -1 > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: > > [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED > > [0]PETSC ERROR: See https://petsc.org/release/faq/ > > for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 > > [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 14:54:22 > 2023 > > [0]PETSC ERROR: Configure options --prefix=/usr/local > --with-mpi-dir=/usr/local --with-fc=0 --with-openmp > --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 > > [0]PETSC ERROR: #1 TSStep() at > /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 > > > > > > > > *From: *Barry Smith > *Sent: *Friday, February 17, 2023 12:58 PM > *To: *Tu, Jiannan > *Cc: *petsc-users > *Subject: *Re: [petsc-users] TS failed due to diverged_step_rejected > > > > *CAUTION:* This email was sent from outside the UMass Lowell network. > > > > > > Can you please run with also the options -ts_monitor -ts_adapt_monitor ? > > > > The output is confusing because it prints that the Nonlinear solve has > converged but then TSStep has failed due to DIVERGED_STEP_REJECTED which > seems contradictory > > > > > > > > On Feb 17, 2023, at 12:09 PM, Tu, Jiannan wrote: > > > > My code uses TS to solve a set of multi-fluid MHD equations. The jacobian > is provided with function F(t, u, u'). Both linear and nonlinear solvers > converge but snes repeats itself until gets "TSStep has failed due to > diverged_step_rejected." > > > > Is it because I used TSStep rather than TSSolve? I have checked the > condition number. The condition number with pc_type asm is about 1 (without > precondition it is about 4x10^4). The maximum ratio of off-diagonal > jacobian element over diagonal element is about 21. > > > > Could you help me to identify what is going wrong? > > > > Thank you very much! > > > > Jiannan > > > > > --------------------------------------------------------------------------------------------------- > > Run command with options > > > > mpiexec -n $1 ./iditm3d -ts_type arkimex -snes_tyep ngmres -ksp_type > gmres -pc_type asm \ > > -ts_rtol 1.0e-4 -ts_atol 1.0e-4 -snes_monitor -snes_rtol 1.0e-4 -snes_atol > 1.0e-4 \ > > -snes_converged_reason > > > > The output message is > > > > Start time advancing ... 
> > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 1.673091274668e-03 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 8.715428433630e-02 > > 1 SNES Function norm 4.995727626692e-04 > > 2 SNES Function norm 5.498018152230e-08 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > > 0 SNES Function norm 3.237461568254e-01 > > 1 SNES Function norm 7.988531005091e-04 > > 2 SNES Function norm 1.280948196292e-07 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 4.881903203545e-04 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 7.562592690785e-02 > > 1 SNES Function norm 1.143078818923e-04 > > 2 SNES Function norm 9.834547907735e-09 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > > 0 SNES Function norm 2.683968949758e-01 > > 1 SNES Function norm 1.838028436639e-04 > > 2 SNES Function norm 9.470813523140e-09 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 1.821562431175e-04 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 1.005443458812e-01 > > 1 SNES Function norm 3.633336946661e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 1.515368382715e-01 > > 1 SNES Function norm 3.389298316830e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 4.541003359206e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 1.713800906043e-01 > > 1 SNES Function norm 1.179958172167e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.020265094117e-01 > > 1 SNES Function norm 1.513971290464e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 6.090269704320e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.136603895703e-01 > > 1 SNES Function norm 1.877474016012e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 3.127812462507e-01 > > 1 SNES Function norm 2.713146825704e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 2.793512213059e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.205196267430e-01 > > 1 SNES Function norm 2.572653773308e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 3.260057361977e-01 > > 1 SNES Function norm 2.705816087598e-06 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.274473072186e+03 > > 1 SNES Function norm 2.764855860446e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.212505522844e-01 > > 1 SNES Function norm 2.958996472386e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 3.273222034162e-01 > > 1 SNES Function norm 2.994512887620e-05 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > > 0 SNES Function norm 2.274473072186e+03 > 
> 1 SNES Function norm 3.317240589134e-04 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.213246532918e-01 > > 1 SNES Function norm 2.799468604767e-04 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 3.274570888397e-01 > > 1 SNES Function norm 3.066048050994e-04 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.274473072189e+03 > > 1 SNES Function norm 2.653507278572e-03 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.213869585841e-01 > > 1 SNES Function norm 2.177156902895e-03 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 3.275136370365e-01 > > 1 SNES Function norm 1.962849131557e-03 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.274473072218e+03 > > 1 SNES Function norm 5.664907315679e-03 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.223208399368e-01 > > 1 SNES Function norm 5.688863091415e-03 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 3.287121218919e-01 > > 1 SNES Function norm 4.085338521320e-03 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.274473071968e+03 > > 1 SNES Function norm 4.694691905235e-04 > > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > > 0 SNES Function norm 2.211786508657e-01 > > 1 SNES Function norm 1.503497433939e-04 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > 0 SNES Function norm 3.272667798977e-01 > > 1 SNES Function norm 2.176132327279e-04 > > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: > > [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED > > [0]PETSC ERROR: See https://petsc.org/release/faq/ > > for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 > > [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 11:59:43 > 2023 > > [0]PETSC ERROR: Configure options --prefix=/usr/local > --with-mpi-dir=/usr/local --with-fc=0 --with-openmp > --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 > > [0]PETSC ERROR: #1 TSStep() at > /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Feb 17 14:45:41 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Feb 2023 15:45:41 -0500 Subject: [petsc-users] TS failed due to diverged_step_rejected In-Reply-To: References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> Message-ID: <3F1BD989-8516-4649-A385-5F94FD1A9470@petsc.dev> > On Feb 17, 2023, at 3:32 PM, Tu, Jiannan wrote: > > The ts_type arkimex is used. There is a right-hand-side function RHSFunction set by TSSetRHSFunction() and also a stiff function set by TSSetIFunction(). > > With adaptivity shut off, TS can finish its first time step after the 3rd 'Nonlinear solve converged due to ...'. The solution gives negative electron and neutral temperatures at the bottom boundary. I need to fix the negative temperatures and see how the code works. > > BTW, what is this ts_adapt? Is it by default on? It is default for some of the TSTypes (in particular, the better ones). It adapts the timestep to ensure some local error estimate is below a certain tolerance. As Matt notes, normally as it tries smaller and smaller time steps the local error estimate would get smaller and smaller; this is not happening here, hence the error. Have you tried with the argument -ts_arkimex_fully_implicit? I am not an expert but my guess is something is "odd" about your functions, either the RHSFunction or the IFunction or both. Do you have a hierarchy of models for your problem? Could you try runs with fewer terms in your functions, since that may be what is producing the difficulties? If you can determine what triggers the problem with the local error estimators, that might help the experts in ODE solution (not me) determine what could be going wrong. Barry > > Thank you, > Jiannan > > From: Matthew Knepley > Sent: Friday, February 17, 2023 3:15 PM > To: Tu, Jiannan > Cc: Barry Smith; petsc-users > Subject: Re: [petsc-users] TS failed due to diverged_step_rejected > > CAUTION: This email was sent from outside the UMass Lowell network. > > I am not sure what TS you are using, but the estimate of the local truncation error is 91.4, and does not seem to change when you make the step smaller, so something is off. You can shut off the adaptivity using > > -ts_adapt_type none > > Thanks, > > Matt > > On Fri, Feb 17, 2023 at 3:01 PM Tu, Jiannan wrote: > These are what I got with the options you suggested.
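For reference, the two suggestions above, -ts_adapt_type none and -ts_arkimex_fully_implicit, can also be set from the code rather than the command line. The snippet below is only a sketch with an illustrative TS variable and error-checking style; it is not taken from iditm3d:

    TSAdapt        adapt;
    PetscErrorCode ierr;

    /* same effect as -ts_adapt_type none: disable the step-size controller */
    ierr = TSGetAdapt(ts, &adapt);CHKERRQ(ierr);
    ierr = TSAdaptSetType(adapt, TSADAPTNONE);CHKERRQ(ierr);

    /* same effect as -ts_arkimex_fully_implicit: treat the RHS part implicitly too */
    ierr = TSARKIMEXSetFullyImplicit(ts, PETSC_TRUE);CHKERRQ(ierr);

    /* keep TSSetFromOptions() after these calls so command-line options can still override */
    ierr = TSSetFromOptions(ts);CHKERRQ(ierr);

The monitoring Barry asked for needs no code changes; appending -ts_monitor -ts_adapt_monitor to the existing mpiexec command line is enough.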
> > Thank you, > Jiannan > > ------------------------------------------------------------------------------- > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 1.673091274668e-03 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 8.715428433630e-02 > 1 SNES Function norm 4.995727626692e-04 > 2 SNES Function norm 5.498018152230e-08 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > 0 SNES Function norm 3.237461568254e-01 > 1 SNES Function norm 7.988531005091e-04 > 2 SNES Function norm 1.280948196292e-07 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 2.189e-02 dt=4.374e-03 wlte= 91.4 wltea= -1 wlter= -1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 4.881903203545e-04 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 7.562592690785e-02 > 1 SNES Function norm 1.143078818923e-04 > 2 SNES Function norm 9.834547907735e-09 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > 0 SNES Function norm 2.683968949758e-01 > 1 SNES Function norm 1.838028436639e-04 > 2 SNES Function norm 9.470813523140e-09 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-03 dt=4.374e-04 wlte= 91.4 wltea= -1 wlter= -1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 1.821562431175e-04 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 1.005443458812e-01 > 1 SNES Function norm 3.633336946661e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 1.515368382715e-01 > 1 SNES Function norm 3.389298316830e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-04 dt=4.374e-05 wlte= 91.4 wltea= -1 wlter= -1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 4.541003359206e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 1.713800906043e-01 > 1 SNES Function norm 1.179958172167e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.020265094117e-01 > 1 SNES Function norm 1.513971290464e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-05 dt=4.374e-06 wlte= 91.4 wltea= -1 wlter= -1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 6.090269704320e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.136603895703e-01 > 1 SNES Function norm 1.877474016012e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 3.127812462507e-01 > 1 SNES Function norm 2.713146825704e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-06 dt=4.374e-07 wlte= 91.4 wltea= -1 wlter= -1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 2.793512213059e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.205196267430e-01 > 1 SNES Function norm 2.572653773308e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 3.260057361977e-01 > 1 SNES Function norm 2.705816087598e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS 
iterations 1 > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-07 dt=4.374e-08 wlte= 91.4 wltea= -1 wlter= -1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 2.764855860446e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.212505522844e-01 > 1 SNES Function norm 2.958996472386e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 3.273222034162e-01 > 1 SNES Function norm 2.994512887620e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-08 dt=4.374e-09 wlte= 91.4 wltea= -1 wlter= -1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 3.317240589134e-04 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 2.213246532918e-01 > 1 SNES Function norm 2.799468604767e-04 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 3.274570888397e-01 > 1 SNES Function norm 3.066048050994e-04 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-09 dt=4.374e-10 wlte= 91.4 wltea= -1 wlter= -1 > 0 SNES Function norm 2.274473072189e+03 > 1 SNES Function norm 2.653507278572e-03 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 2.213869585841e-01 > 1 SNES Function norm 2.177156902895e-03 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 3.275136370365e-01 > 1 SNES Function norm 1.962849131557e-03 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-10 dt=4.374e-11 wlte= 91.4 wltea= -1 wlter= -1 > 0 SNES Function norm 2.274473072218e+03 > 1 SNES Function norm 5.664907315679e-03 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 2.223208399368e-01 > 1 SNES Function norm 5.688863091415e-03 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 3.287121218919e-01 > 1 SNES Function norm 4.085338521320e-03 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-11 dt=4.374e-12 wlte= 91.4 wltea= -1 wlter= -1 > 0 SNES Function norm 2.274473071968e+03 > 1 SNES Function norm 4.694691905235e-04 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 2.211786508657e-01 > 1 SNES Function norm 1.503497433939e-04 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 3.272667798977e-01 > 1 SNES Function norm 2.176132327279e-04 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-12 dt=4.374e-13 wlte= 91.4 wltea= -1 wlter= -1 > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: > [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 > [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 14:54:22 2023 > [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 > [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 > > > > From: Barry Smith > Sent: Friday, February 17, 2023 12:58 PM > To: Tu, Jiannan > Cc: petsc-users > Subject: Re: [petsc-users] TS failed due to diverged_step_rejected > > CAUTION: This email was sent from outside the UMass Lowell network. > > > Can you please run with also the options -ts_monitor -ts_adapt_monitor ? > > The output is confusing because it prints that the Nonlinear solve has converged but then TSStep has failed due to DIVERGED_STEP_REJECTED which seems contradictory > > > > > On Feb 17, 2023, at 12:09 PM, Tu, Jiannan > wrote: > > My code uses TS to solve a set of multi-fluid MHD equations. The jacobian is provided with function F(t, u, u'). Both linear and nonlinear solvers converge but snes repeats itself until gets "TSStep has failed due to diverged_step_rejected." > > Is it because I used TSStep rather than TSSolve? I have checked the condition number. The condition number with pc_type asm is about 1 (without precondition it is about 4x10^4). The maximum ratio of off-diagonal jacobian element over diagonal element is about 21. > > Could you help me to identify what is going wrong? > > Thank you very much! > > Jiannan > > --------------------------------------------------------------------------------------------------- > Run command with options > > mpiexec -n $1 ./iditm3d -ts_type arkimex -snes_tyep ngmres -ksp_type gmres -pc_type asm \ > -ts_rtol 1.0e-4 -ts_atol 1.0e-4 -snes_monitor -snes_rtol 1.0e-4 -snes_atol 1.0e-4 \ > -snes_converged_reason > > The output message is > > Start time advancing ... 
> 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 1.673091274668e-03 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 8.715428433630e-02 > 1 SNES Function norm 4.995727626692e-04 > 2 SNES Function norm 5.498018152230e-08 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > 0 SNES Function norm 3.237461568254e-01 > 1 SNES Function norm 7.988531005091e-04 > 2 SNES Function norm 1.280948196292e-07 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 4.881903203545e-04 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 7.562592690785e-02 > 1 SNES Function norm 1.143078818923e-04 > 2 SNES Function norm 9.834547907735e-09 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > 0 SNES Function norm 2.683968949758e-01 > 1 SNES Function norm 1.838028436639e-04 > 2 SNES Function norm 9.470813523140e-09 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 1.821562431175e-04 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 1.005443458812e-01 > 1 SNES Function norm 3.633336946661e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 1.515368382715e-01 > 1 SNES Function norm 3.389298316830e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 4.541003359206e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 1.713800906043e-01 > 1 SNES Function norm 1.179958172167e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.020265094117e-01 > 1 SNES Function norm 1.513971290464e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 6.090269704320e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.136603895703e-01 > 1 SNES Function norm 1.877474016012e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 3.127812462507e-01 > 1 SNES Function norm 2.713146825704e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 2.793512213059e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.205196267430e-01 > 1 SNES Function norm 2.572653773308e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 3.260057361977e-01 > 1 SNES Function norm 2.705816087598e-06 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 2.764855860446e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.212505522844e-01 > 1 SNES Function norm 2.958996472386e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 3.273222034162e-01 > 1 SNES Function norm 2.994512887620e-05 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 > 0 SNES Function norm 2.274473072186e+03 > 1 SNES Function norm 3.317240589134e-04 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 
2.213246532918e-01 > 1 SNES Function norm 2.799468604767e-04 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 3.274570888397e-01 > 1 SNES Function norm 3.066048050994e-04 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 2.274473072189e+03 > 1 SNES Function norm 2.653507278572e-03 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 2.213869585841e-01 > 1 SNES Function norm 2.177156902895e-03 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 3.275136370365e-01 > 1 SNES Function norm 1.962849131557e-03 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 2.274473072218e+03 > 1 SNES Function norm 5.664907315679e-03 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 2.223208399368e-01 > 1 SNES Function norm 5.688863091415e-03 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 3.287121218919e-01 > 1 SNES Function norm 4.085338521320e-03 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 2.274473071968e+03 > 1 SNES Function norm 4.694691905235e-04 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > 0 SNES Function norm 2.211786508657e-01 > 1 SNES Function norm 1.503497433939e-04 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > 0 SNES Function norm 3.272667798977e-01 > 1 SNES Function norm 2.176132327279e-04 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: > [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 > [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 11:59:43 2023 > [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 > [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jiannan_Tu at uml.edu Fri Feb 17 18:06:12 2023 From: Jiannan_Tu at uml.edu (Tu, Jiannan) Date: Sat, 18 Feb 2023 00:06:12 +0000 Subject: [petsc-users] TS failed due to diverged_step_rejected In-Reply-To: References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> Message-ID: It?s possible. I?m checking the rhs and stiff equations. Thank you, Jiannan From: Matthew Knepley Sent: Friday, February 17, 2023 3:39 PM To: Tu, Jiannan Cc: Barry Smith; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. On Fri, Feb 17, 2023 at 3:32 PM Tu, Jiannan > wrote: The ts_type arkimex is used. There is right hand-side function RHSFunction set by TSSetRHSFunction() and also stiff function set by TSSetIFunction(). 
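A minimal sketch of that kind of ARKIMEX setup, using the standard PETSc calling sequences, is shown below; the callback bodies, the user context, and the matrix J are placeholders, not the actual iditm3d code:

    /* G(t,u): the non-stiff part, handled explicitly by ARKIMEX */
    static PetscErrorCode RHSFunction(TS ts, PetscReal t, Vec U, Vec G, void *ctx)
    {
      PetscFunctionBeginUser;
      /* fill G with the explicit right-hand side here */
      PetscFunctionReturn(0);
    }

    /* F(t,u,u') = 0: the stiff part in implicit form; with an RHS part present,
       the usual convention is F = Udot - f_stiff(U) */
    static PetscErrorCode IFunction(TS ts, PetscReal t, Vec U, Vec Udot, Vec F, void *ctx)
    {
      PetscFunctionBeginUser;
      /* fill F from U and Udot here */
      PetscFunctionReturn(0);
    }

    /* Jacobian of the IFunction: dF/dU + a*dF/dUdot, where a is the shift TS passes in */
    static PetscErrorCode IJacobian(TS ts, PetscReal t, Vec U, Vec Udot, PetscReal a,
                                    Mat J, Mat P, void *ctx)
    {
      PetscFunctionBeginUser;
      /* assemble J (and P if it is a separate preconditioning matrix) here */
      PetscFunctionReturn(0);
    }

    /* registration; ts, J, and user are assumed to be created elsewhere */
    ierr = TSSetType(ts, TSARKIMEX);CHKERRQ(ierr);
    ierr = TSSetRHSFunction(ts, NULL, RHSFunction, &user);CHKERRQ(ierr);
    ierr = TSSetIFunction(ts, NULL, IFunction, &user);CHKERRQ(ierr);
    ierr = TSSetIJacobian(ts, J, J, IJacobian, &user);CHKERRQ(ierr);

One detail worth double-checking in any IJacobian: the shift a multiplies dF/dUdot, so for a row of the form F = Udot - g(U) the diagonal contribution is a - dg/dU, not 1.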
With adaptivity shut off, TS can finish its first time step after the 3rd ?Nonlinear solve converged due to ??. The solution gives negative electron and neutral temperatures at the bottom boundary. I need to fix the negative temperatures and see how the code works. BTW, what is this ts_adapt? Is it by default on? It is a controller for adaptive timestepping. It is on by default, but I suspect that you step into an unphysical regime, which makes its estimates unreliable. Thanks, Matt Thank you, Jiannan From: Matthew Knepley Sent: Friday, February 17, 2023 3:15 PM To: Tu, Jiannan Cc: Barry Smith; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. I am not sure what TS you are using, but the estimate of the local truncation error is 91.4, and does not seem to change when you make the step smaller, so something is off. You can shut off the adaptivity using -ts_adapt_type none Thanks, Matt On Fri, Feb 17, 2023 at 3:01 PM Tu, Jiannan > wrote: These are what I got with the options you suggested. Thank you, Jiannan ------------------------------------------------------------------------------- 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 2.189e-02 dt=4.374e-03 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-03 dt=4.374e-04 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-04 dt=4.374e-05 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-05 dt=4.374e-06 wlte= 91.4 wltea= -1 wlter= -1 0 
SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-06 dt=4.374e-07 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-07 dt=4.374e-08 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-08 dt=4.374e-09 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-09 dt=4.374e-10 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-10 dt=4.374e-11 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-11 dt=4.374e-12 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve 
converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-12 dt=4.374e-13 wlte= 91.4 wltea= -1 wlter= -1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 14:54:22 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 From: Barry Smith Sent: Friday, February 17, 2023 12:58 PM To: Tu, Jiannan Cc: petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. Can you please run with also the options -ts_monitor -ts_adapt_monitor ? The output is confusing because it prints that the Nonlinear solve has converged but then TSStep has failed due to DIVERGED_STEP_REJECTED which seems contradictory On Feb 17, 2023, at 12:09 PM, Tu, Jiannan > wrote: My code uses TS to solve a set of multi-fluid MHD equations. The jacobian is provided with function F(t, u, u'). Both linear and nonlinear solvers converge but snes repeats itself until gets "TSStep has failed due to diverged_step_rejected." Is it because I used TSStep rather than TSSolve? I have checked the condition number. The condition number with pc_type asm is about 1 (without precondition it is about 4x10^4). The maximum ratio of off-diagonal jacobian element over diagonal element is about 21. Could you help me to identify what is going wrong? Thank you very much! Jiannan --------------------------------------------------------------------------------------------------- Run command with options mpiexec -n $1 ./iditm3d -ts_type arkimex -snes_tyep ngmres -ksp_type gmres -pc_type asm \ -ts_rtol 1.0e-4 -ts_atol 1.0e-4 -snes_monitor -snes_rtol 1.0e-4 -snes_atol 1.0e-4 \ -snes_converged_reason The output message is Start time advancing ... 
0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES 
Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 11:59:43 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jiannan_Tu at uml.edu Fri Feb 17 18:19:04 2023 From: Jiannan_Tu at uml.edu (Tu, Jiannan) Date: Sat, 18 Feb 2023 00:19:04 +0000 Subject: [petsc-users] TS failed due to diverged_step_rejected In-Reply-To: <3F1BD989-8516-4649-A385-5F94FD1A9470@petsc.dev> References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> <3F1BD989-8516-4649-A385-5F94FD1A9470@petsc.dev> Message-ID: I need to find out what causes negative temperature first. Following is the message with adaptivity turned off. The G(u) gives right-hand equation for electron temperature at bottom boundary. The F(u, u?) function is F(u, u?) = X = G(u) and the jacobian element is d F(u, u?) / dX =1. The solution from TSStep is checked for positivity of densities and temperatures. >From the message below, it is seen that G(u) > 0 (I added output of right-hand equation for electron temperature). 
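As an aside on that positivity check: if the PETSc version in use provides TSSetFunctionDomainError (recent releases do), TS can be told directly that states with non-positive densities or temperatures lie outside the solution domain, so a stage that produces them is rejected instead of accepted. A rough sketch, with a placeholder callback that only illustrates the idea:

    /* Reject states that leave the physical domain.  A real code would inspect only
       the density and temperature components, not the whole solution vector. */
    static PetscErrorCode DomainCheck(TS ts, PetscReal t, Vec U, PetscBool *accept)
    {
      PetscReal      vmin;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = VecMin(U, NULL, &vmin);CHKERRQ(ierr);
      *accept = (vmin > 0.0) ? PETSC_TRUE : PETSC_FALSE;
      PetscFunctionReturn(0);
    }

    ierr = TSSetFunctionDomainError(ts, DomainCheck);CHKERRQ(ierr);

This mainly helps while an adaptive controller is active, since the rejection is handled by retrying the step with a reduced dt.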
The solution for electron temperature X should be X * jacobian element = G(u) > 0 since jacobian element = 1. I don?t understand why it becomes negative. Is my understanding of TS formula incorrect? Thank you, Jiannan ---------------------------------- G(u) = 1.86534e-07 0 SNES Function norm 2.274473072183e+03 1 SNES Function norm 8.641749325070e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 G(u) = 1.86534e-07 0 SNES Function norm 8.716501970511e-02 1 SNES Function norm 2.213263548813e-04 2 SNES Function norm 2.779985176426e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 G(u) = 1.86534e-07 0 SNES Function norm 3.177195995186e-01 1 SNES Function norm 3.607702491344e-04 2 SNES Function norm 4.345809629121e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 G(u) = 1.86534e-07 TSAdapt none arkimex 0:3 step 0 accepted t=42960 + 2.189e-02 dt=2.189e-02 electron temperature = -3.6757e-15 at (i, j, k) = (0, 1, 0) From: Barry Smith Sent: Friday, February 17, 2023 3:45 PM To: Tu, Jiannan; Hong Zhang; Emil Constantinescu Cc: petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. On Feb 17, 2023, at 3:32 PM, Tu, Jiannan wrote: The ts_type arkimex is used. There is right hand-side function RHSFunction set by TSSetRHSFunction() and also stiff function set by TSSetIFunction(). With adaptivity shut off, TS can finish its first time step after the 3rd ?Nonlinear solve converged due to ??. The solution gives negative electron and neutral temperatures at the bottom boundary. I need to fix the negative temperatures and see how the code works. BTW, what is this ts_adapt? Is it by default on? It is default for some of the TSTypes (in particular, the better ones). It adapts the timestep to ensure some local error estimate is below a certain tolerance. As Matt notes normally as it tries smaller and smaller time steps the local error estimate would get smaller and smaller; this is not happening here, hence the error. Have you tried with the argument -ts_arkimex_fully_implicit ? I am not an expert but my guess is something is "odd" about your functions, either the RHSFunction or the Function or both. Do you have a hierarchy of models for your problem? Could you try runs with fewer terms in your functions, that may be producing the difficulties? If you can determine what triggers the problem with the local error estimators, that might help the experts in ODE solution (not me) determine what could be going wrong. Barry Thank you, Jiannan From: Matthew Knepley Sent: Friday, February 17, 2023 3:15 PM To: Tu, Jiannan Cc: Barry Smith; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. I am not sure what TS you are using, but the estimate of the local truncation error is 91.4, and does not seem to change when you make the step smaller, so something is off. You can shut off the adaptivity using -ts_adapt_type none Thanks, Matt On Fri, Feb 17, 2023 at 3:01 PM Tu, Jiannan > wrote: These are what I got with the options you suggested. 
Thank you, Jiannan ------------------------------------------------------------------------------- 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 2.189e-02 dt=4.374e-03 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-03 dt=4.374e-04 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-04 dt=4.374e-05 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-05 dt=4.374e-06 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-06 dt=4.374e-07 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-07 dt=4.374e-08 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function 
norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-08 dt=4.374e-09 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-09 dt=4.374e-10 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-10 dt=4.374e-11 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-11 dt=4.374e-12 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-12 dt=4.374e-13 wlte= 91.4 wltea= -1 wlter= -1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 14:54:22 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 From: Barry Smith Sent: Friday, February 17, 2023 12:58 PM To: Tu, Jiannan Cc: petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. Can you please run with also the options -ts_monitor -ts_adapt_monitor ? The output is confusing because it prints that the Nonlinear solve has converged but then TSStep has failed due to DIVERGED_STEP_REJECTED which seems contradictory On Feb 17, 2023, at 12:09 PM, Tu, Jiannan > wrote: My code uses TS to solve a set of multi-fluid MHD equations. The jacobian is provided with function F(t, u, u'). Both linear and nonlinear solvers converge but snes repeats itself until gets "TSStep has failed due to diverged_step_rejected." Is it because I used TSStep rather than TSSolve? I have checked the condition number. The condition number with pc_type asm is about 1 (without precondition it is about 4x10^4). The maximum ratio of off-diagonal jacobian element over diagonal element is about 21. Could you help me to identify what is going wrong? Thank you very much! Jiannan --------------------------------------------------------------------------------------------------- Run command with options mpiexec -n $1 ./iditm3d -ts_type arkimex -snes_tyep ngmres -ksp_type gmres -pc_type asm \ -ts_rtol 1.0e-4 -ts_atol 1.0e-4 -snes_monitor -snes_rtol 1.0e-4 -snes_atol 1.0e-4 \ -snes_converged_reason The output message is Start time advancing ... 
0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES 
Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 11:59:43 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mi.mike1021 at gmail.com Fri Feb 17 20:49:56 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Fri, 17 Feb 2023 20:49:56 -0600 Subject: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue Message-ID: Dear PETSc team, I am using PETSc for Fortran with DMPlex. I have been using this version of PETSc: >>git rev-parse origin >>995ec06f924a86c4d28df68d1fdd6572768b0de1 >>git rev-parse FETCH_HEAD >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b There has been no issue, before the one with VTK viewer, which Jed fixed today ( https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 ). Since that MR has been merged into the main repo, I pulled the latest version of PETSc (basically I cloned it from scratch). But if I use the same configure option with before, and run my code, then there is an issue with halo exchange. The code runs without error message, but it gives wrong solution field. I guess the issue I have is related to graph partitioner or halo exchange part. 
This is because if I run the code with 1-proc, the solution is correct. I only updated the version of PETSc and there was no change in my own code. Could I get any comments on the issue? I was wondering if there have been many changes in halo exchange or graph partitioning & distributing part related to DMPlex. Thanks, Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Fri Feb 17 22:54:41 2023 From: hongzhang at anl.gov (Zhang, Hong) Date: Sat, 18 Feb 2023 04:54:41 +0000 Subject: [petsc-users] TS failed due to diverged_step_rejected In-Reply-To: References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> <3F1BD989-8516-4649-A385-5F94FD1A9470@petsc.dev> Message-ID: <0B7BA32F-03CE-44F2-A9A3-4584B2D7AB94@anl.gov> On Feb 17, 2023, at 6:19 PM, Tu, Jiannan wrote: I need to find out what causes negative temperature first. Following is the message with adaptivity turned off. The G(u) gives right-hand equation for electron temperature at bottom boundary. The F(u, u?) function is F(u, u?) = X = G(u) and the jacobian element is d F(u, u?) / dX =1. This looks strange. Can you elaborate a bit on your partitioned ODE? For example, how are your F(u,udot) (IFunction) and G(u) (RHSFunction) defined? A good IMEX example can be found at ts/tutorial/advection-diffusion-reaction/ex5.c (and reaction_diffusion.c). Hong (Mr.) The solution from TSStep is checked for positivity of densities and temperatures. From the message below, it is seen that G(u) > 0 (I added output of right-hand equation for electron temperature). The solution for electron temperature X should be X * jacobian element = G(u) > 0 since jacobian element = 1. I don?t understand why it becomes negative. Is my understanding of TS formula incorrect? Thank you, Jiannan ---------------------------------- G(u) = 1.86534e-07 0 SNES Function norm 2.274473072183e+03 1 SNES Function norm 8.641749325070e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 G(u) = 1.86534e-07 0 SNES Function norm 8.716501970511e-02 1 SNES Function norm 2.213263548813e-04 2 SNES Function norm 2.779985176426e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 G(u) = 1.86534e-07 0 SNES Function norm 3.177195995186e-01 1 SNES Function norm 3.607702491344e-04 2 SNES Function norm 4.345809629121e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 G(u) = 1.86534e-07 TSAdapt none arkimex 0:3 step 0 accepted t=42960 + 2.189e-02 dt=2.189e-02 electron temperature = -3.6757e-15 at (i, j, k) = (0, 1, 0) From: Barry Smith Sent: Friday, February 17, 2023 3:45 PM To: Tu, Jiannan; Hong Zhang; Emil Constantinescu Cc: petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. On Feb 17, 2023, at 3:32 PM, Tu, Jiannan wrote: The ts_type arkimex is used. There is right hand-side function RHSFunction set by TSSetRHSFunction() and also stiff function set by TSSetIFunction(). With adaptivity shut off, TS can finish its first time step after the 3rd ?Nonlinear solve converged due to ??. The solution gives negative electron and neutral temperatures at the bottom boundary. I need to fix the negative temperatures and see how the code works. BTW, what is this ts_adapt? Is it by default on? It is default for some of the TSTypes (in particular, the better ones). It adapts the timestep to ensure some local error estimate is below a certain tolerance. 
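A minimal sketch of that adaptor control in code, assuming an already-created TS named ts (the variable name and the example step limits are illustrative, not taken from the code discussed here):

```c
#include <petscts.h>

/* Sketch: query the TSAdapt attached to a TS and either disable it
   (equivalent to -ts_adapt_type none) or bound how far it may shrink the step. */
PetscErrorCode ConfigureAdapt(TS ts)
{
  TSAdapt        adapt;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = TSGetAdapt(ts, &adapt);CHKERRQ(ierr);              /* every TS carries a TSAdapt object */
  ierr = TSAdaptSetType(adapt, TSADAPTNONE);CHKERRQ(ierr);  /* turn adaptivity off entirely */
  /* or keep the default controller but limit the step range (values are illustrative):
     ierr = TSAdaptSetType(adapt, TSADAPTBASIC);CHKERRQ(ierr);
     ierr = TSAdaptSetStepLimits(adapt, 1.0e-6, 1.0e-1);CHKERRQ(ierr); */
  PetscFunctionReturn(0);
}
```

The same switch is available on the command line as -ts_adapt_type none.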
As Matt notes normally as it tries smaller and smaller time steps the local error estimate would get smaller and smaller; this is not happening here, hence the error. Have you tried with the argument -ts_arkimex_fully_implicit ? I am not an expert but my guess is something is "odd" about your functions, either the RHSFunction or the Function or both. Do you have a hierarchy of models for your problem? Could you try runs with fewer terms in your functions, that may be producing the difficulties? If you can determine what triggers the problem with the local error estimators, that might help the experts in ODE solution (not me) determine what could be going wrong. Barry Thank you, Jiannan From: Matthew Knepley Sent: Friday, February 17, 2023 3:15 PM To: Tu, Jiannan Cc: Barry Smith; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. I am not sure what TS you are using, but the estimate of the local truncation error is 91.4, and does not seem to change when you make the step smaller, so something is off. You can shut off the adaptivity using -ts_adapt_type none Thanks, Matt On Fri, Feb 17, 2023 at 3:01 PM Tu, Jiannan > wrote: These are what I got with the options you suggested. Thank you, Jiannan ------------------------------------------------------------------------------- 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 2.189e-02 dt=4.374e-03 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-03 dt=4.374e-04 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-04 dt=4.374e-05 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 
1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-05 dt=4.374e-06 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-06 dt=4.374e-07 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-07 dt=4.374e-08 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-08 dt=4.374e-09 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-09 dt=4.374e-10 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-10 dt=4.374e-11 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-11 dt=4.374e-12 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473071968e+03 1 SNES 
Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-12 dt=4.374e-13 wlte= 91.4 wltea= -1 wlter= -1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 14:54:22 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 From: Barry Smith Sent: Friday, February 17, 2023 12:58 PM To: Tu, Jiannan Cc: petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. Can you please run with also the options -ts_monitor -ts_adapt_monitor ? The output is confusing because it prints that the Nonlinear solve has converged but then TSStep has failed due to DIVERGED_STEP_REJECTED which seems contradictory On Feb 17, 2023, at 12:09 PM, Tu, Jiannan > wrote: My code uses TS to solve a set of multi-fluid MHD equations. The jacobian is provided with function F(t, u, u'). Both linear and nonlinear solvers converge but snes repeats itself until gets "TSStep has failed due to diverged_step_rejected." Is it because I used TSStep rather than TSSolve? I have checked the condition number. The condition number with pc_type asm is about 1 (without precondition it is about 4x10^4). The maximum ratio of off-diagonal jacobian element over diagonal element is about 21. Could you help me to identify what is going wrong? Thank you very much! Jiannan --------------------------------------------------------------------------------------------------- Run command with options mpiexec -n $1 ./iditm3d -ts_type arkimex -snes_tyep ngmres -ksp_type gmres -pc_type asm \ -ts_rtol 1.0e-4 -ts_atol 1.0e-4 -snes_monitor -snes_rtol 1.0e-4 -snes_atol 1.0e-4 \ -snes_converged_reason The output message is Start time advancing ... 
0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES 
Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 11:59:43 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Sat Feb 18 01:24:51 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Sat, 18 Feb 2023 15:24:51 +0800 Subject: [petsc-users] Inquiry regarding DMAdaptLabel function Message-ID: Dear PETSc Group, I am writing to inquire about the function DMAdaptLabel in PETSc. I am trying to use it coarse a mesh, but the resulting mesh is refined. In the following code, all of the `adpat` label values were set to 2 (DM_ADAPT_COARSEN). There must be something wrong. Could you give some suggestions? 
```python from firedrake import * from firedrake.petsc import PETSc def mark_all_cells(mesh): plex = mesh.topology_dm with PETSc.Log.Event("ADD_ADAPT_LABEL"): plex.createLabel('adapt') cs, ce = plex.getHeightStratum(0) for i in range(cs, ce): plex.setLabelValue('adapt', i, 2) return plex mesh = RectangleMesh(10, 10, 1, 1) x = SpatialCoordinate(mesh) V = FunctionSpace(mesh, 'CG', 1) f = Function(V).interpolate(10 + 10*sin(x[0])) triplot(mesh) plex = mark_all_cells(mesh) new_plex = plex.adaptLabel('adapt') mesh = Mesh(new_plex) triplot(mesh) ``` Thank you very much for your time. Best wishes, Zongze -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: old_plex.png Type: image/png Size: 71236 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: new_plex.png Type: image/png Size: 68806 bytes Desc: not available URL: From yangzongze at gmail.com Sat Feb 18 05:41:11 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Sat, 18 Feb 2023 19:41:11 +0800 Subject: [petsc-users] Inquiry regarding DMAdaptLabel function In-Reply-To: References: Message-ID: Another question on mesh coarsening is about `DMCoarsen` which will fail when running in parallel. I generate a mesh in Firedrake, and then create function space and functions, after that, I get the dmplex and coarsen it. When running in serials, I get the mesh coarsened correctly. But it failed with errors in ParMMG when running parallel. However, If I did not create function space and functions on the original mesh, everything works fine too. The code and the error logs are attached. Thank you for your time and attention? Best wishes, Zongze On Sat, 18 Feb 2023 at 15:24, Zongze Yang wrote: > Dear PETSc Group, > > I am writing to inquire about the function DMAdaptLabel in PETSc. > I am trying to use it coarse a mesh, but the resulting mesh is refined. > > In the following code, all of the `adpat` label values were set to 2 > (DM_ADAPT_COARSEN). > There must be something wrong. Could you give some suggestions? > > ```python > from firedrake import * > from firedrake.petsc import PETSc > > def mark_all_cells(mesh): > plex = mesh.topology_dm > with PETSc.Log.Event("ADD_ADAPT_LABEL"): > plex.createLabel('adapt') > cs, ce = plex.getHeightStratum(0) > for i in range(cs, ce): > plex.setLabelValue('adapt', i, 2) > > return plex > > mesh = RectangleMesh(10, 10, 1, 1) > > x = SpatialCoordinate(mesh) > V = FunctionSpace(mesh, 'CG', 1) > f = Function(V).interpolate(10 + 10*sin(x[0])) > triplot(mesh) > > plex = mark_all_cells(mesh) > new_plex = plex.adaptLabel('adapt') > mesh = Mesh(new_plex) > triplot(mesh) > ``` > > Thank you very much for your time. > > Best wishes, > Zongze > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: parallel-errors.logs Type: application/octet-stream Size: 103411 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test_coarsen.py Type: text/x-python Size: 797 bytes Desc: not available URL: From Jiannan_Tu at uml.edu Sat Feb 18 08:44:35 2023 From: Jiannan_Tu at uml.edu (Tu, Jiannan) Date: Sat, 18 Feb 2023 14:44:35 +0000 Subject: [petsc-users] TS failed due to diverged_step_rejected In-Reply-To: <0B7BA32F-03CE-44F2-A9A3-4584B2D7AB94@anl.gov> References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> <3F1BD989-8516-4649-A385-5F94FD1A9470@petsc.dev> <0B7BA32F-03CE-44F2-A9A3-4584B2D7AB94@anl.gov> Message-ID: The RHS function at the bottom boundary is determined by the boundary condition, which is the second order derivative = 0, i.e. G(u) = 2*X[i=1] ? X[i=2]. Then in IFunction, F(u, udot) = X[i=0]. Thank you, Jiannan From: Zhang, Hong Sent: Friday, February 17, 2023 11:54 PM To: Tu, Jiannan Cc: Barry Smith; Hong Zhang; Constantinescu, Emil M.; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected You don't often get email from hongzhang at anl.gov. Learn why this is important CAUTION: This email was sent from outside the UMass Lowell network. On Feb 17, 2023, at 6:19 PM, Tu, Jiannan wrote: I need to find out what causes negative temperature first. Following is the message with adaptivity turned off. The G(u) gives right-hand equation for electron temperature at bottom boundary. The F(u, u?) function is F(u, u?) = X = G(u) and the jacobian element is d F(u, u?) / dX =1. This looks strange. Can you elaborate a bit on your partitioned ODE? For example, how are your F(u,udot) (IFunction) and G(u) (RHSFunction) defined? A good IMEX example can be found at ts/tutorial/advection-diffusion-reaction/ex5.c (and reaction_diffusion.c). Hong (Mr.) The solution from TSStep is checked for positivity of densities and temperatures. >From the message below, it is seen that G(u) > 0 (I added output of right-hand equation for electron temperature). The solution for electron temperature X should be X * jacobian element = G(u) > 0 since jacobian element = 1. I don?t understand why it becomes negative. Is my understanding of TS formula incorrect? Thank you, Jiannan ---------------------------------- G(u) = 1.86534e-07 0 SNES Function norm 2.274473072183e+03 1 SNES Function norm 8.641749325070e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 G(u) = 1.86534e-07 0 SNES Function norm 8.716501970511e-02 1 SNES Function norm 2.213263548813e-04 2 SNES Function norm 2.779985176426e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 G(u) = 1.86534e-07 0 SNES Function norm 3.177195995186e-01 1 SNES Function norm 3.607702491344e-04 2 SNES Function norm 4.345809629121e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 G(u) = 1.86534e-07 TSAdapt none arkimex 0:3 step 0 accepted t=42960 + 2.189e-02 dt=2.189e-02 electron temperature = -3.6757e-15 at (i, j, k) = (0, 1, 0) From: Barry Smith Sent: Friday, February 17, 2023 3:45 PM To: Tu, Jiannan; Hong Zhang; Emil Constantinescu Cc: petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. On Feb 17, 2023, at 3:32 PM, Tu, Jiannan wrote: The ts_type arkimex is used. There is right hand-side function RHSFunction set by TSSetRHSFunction() and also stiff function set by TSSetIFunction(). With adaptivity shut off, TS can finish its first time step after the 3rd ?Nonlinear solve converged due to ??. The solution gives negative electron and neutral temperatures at the bottom boundary. 
From mi.mike1021 at gmail.com Sat Feb 18 10:59:52 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Sat, 18 Feb 2023 10:59:52 -0600 Subject: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue In-Reply-To: References: Message-ID: As a follow-up, I tested: (1) Download tar for v3.18.4 from petsc gitlab ( https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue on DMPlex halo exchange. This version works as I expect. (2) Clone main branch (git clone https://gitlab.com/petsc/petsc.git) has issues with DMPlex halo exchange. Something is suspicious about this main branch, related to DMPlex halo. The solution field I got is not correct. But it works okay with 1-proc. Does anyone have any comments on this issue? I am curious if other DMPlex users have no problem regarding halo exchange. FYI, I do not declare ghost layers for halo exchange. Thanks, Mike > Dear PETSc team, > > I am using PETSc for Fortran with DMPlex.
I have been using this version > of PETSc: > >>git rev-parse origin > >>995ec06f924a86c4d28df68d1fdd6572768b0de1 > >>git rev-parse FETCH_HEAD > >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b > > There has been no issue, before the one with VTK viewer, which Jed fixed > today ( > https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 > ). > > Since that MR has been merged into the main repo, I pulled the latest > version of PETSc (basically I cloned it from scratch). But if I use the > same configure option with before, and run my code, then there is an issue > with halo exchange. The code runs without error message, but it gives wrong > solution field. I guess the issue I have is related to graph partitioner or > halo exchange part. This is because if I run the code with 1-proc, the > solution is correct. I only updated the version of PETSc and there was no > change in my own code. Could I get any comments on the issue? I was > wondering if there have been many changes in halo exchange or graph > partitioning & distributing part related to DMPlex. > > Thanks, > Mike > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Sat Feb 18 11:40:06 2023 From: hongzhang at anl.gov (Zhang, Hong) Date: Sat, 18 Feb 2023 17:40:06 +0000 Subject: [petsc-users] TS failed due to diverged_step_rejected In-Reply-To: References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> <3F1BD989-8516-4649-A385-5F94FD1A9470@petsc.dev> <0B7BA32F-03CE-44F2-A9A3-4584B2D7AB94@anl.gov> Message-ID: <9523EDF9-7C02-4872-9E0E-1DFCBCB28066@anl.gov> On Feb 18, 2023, at 8:44 AM, Tu, Jiannan wrote: The RHS function at the bottom boundary is determined by the boundary condition, which is the second order derivative = 0, i.e. G(u) = 2*X[i=1] ? X[i=2]. Then in IFunction, F(u, udot) = X[i=0]. This might be the problem. Your F(u, udot) is missing udot according to your description. Take a simple ODE udot = f(u) + g(u) for example. One way to partition this ODE is to define F = udot - f(u) as the IFunction and G = g(u) as the RHSFunction. Hong (Mr.) Thank you, Jiannan From: Zhang, Hong Sent: Friday, February 17, 2023 11:54 PM To: Tu, Jiannan Cc: Barry Smith; Hong Zhang; Constantinescu, Emil M.; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected You don't often get email from hongzhang at anl.gov. Learn why this is important CAUTION: This email was sent from outside the UMass Lowell network. On Feb 17, 2023, at 6:19 PM, Tu, Jiannan wrote: I need to find out what causes negative temperature first. Following is the message with adaptivity turned off. The G(u) gives right-hand equation for electron temperature at bottom boundary. The F(u, u?) function is F(u, u?) = X = G(u) and the jacobian element is d F(u, u?) / dX =1. This looks strange. Can you elaborate a bit on your partitioned ODE? For example, how are your F(u,udot) (IFunction) and G(u) (RHSFunction) defined? A good IMEX example can be found at ts/tutorial/advection-diffusion-reaction/ex5.c (and reaction_diffusion.c). Hong (Mr.) The solution from TSStep is checked for positivity of densities and temperatures. From the message below, it is seen that G(u) > 0 (I added output of right-hand equation for electron temperature). The solution for electron temperature X should be X * jacobian element = G(u) > 0 since jacobian element = 1. I don?t understand why it becomes negative. Is my understanding of TS formula incorrect? 
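For illustration, a minimal sketch (not taken from iditm3d) of the partition Hong describes, for a single scalar ODE udot = f(u) + g(u), with made-up terms f(u) = -1000*u (stiff) and g(u) = sin(t) (non-stiff):

```c
#include <petscts.h>

/* F(t,u,udot) = udot - f(u): the IFunction must contain udot */
static PetscErrorCode IFunction(TS ts, PetscReal t, Vec U, Vec Udot, Vec F, void *ctx)
{
  const PetscScalar *u, *udot;
  PetscScalar       *f;
  PetscErrorCode     ierr;

  PetscFunctionBeginUser;
  ierr = VecGetArrayRead(U, &u);CHKERRQ(ierr);
  ierr = VecGetArrayRead(Udot, &udot);CHKERRQ(ierr);
  ierr = VecGetArray(F, &f);CHKERRQ(ierr);
  f[0] = udot[0] - (-1000.0*u[0]);          /* f(u) = -1000 u, a made-up stiff term */
  ierr = VecRestoreArrayRead(U, &u);CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(Udot, &udot);CHKERRQ(ierr);
  ierr = VecRestoreArray(F, &f);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* G(t,u) = g(u): the explicit (non-stiff) part goes in the RHSFunction */
static PetscErrorCode RHSFunction(TS ts, PetscReal t, Vec U, Vec G, void *ctx)
{
  PetscScalar    *g;
  PetscErrorCode  ierr;

  PetscFunctionBeginUser;
  ierr = VecGetArray(G, &g);CHKERRQ(ierr);
  g[0] = PetscSinReal(t);                   /* g(u) = sin(t), a made-up non-stiff term */
  ierr = VecRestoreArray(G, &g);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Registration with an ARKIMEX integrator:
     ierr = TSSetType(ts, TSARKIMEX);CHKERRQ(ierr);
     ierr = TSSetIFunction(ts, NULL, IFunction, NULL);CHKERRQ(ierr);
     ierr = TSSetRHSFunction(ts, NULL, RHSFunction, NULL);CHKERRQ(ierr);             */
```

With this split, an ARKIMEX method treats the IFunction part implicitly and the RHSFunction part explicitly.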
From Jiannan_Tu at uml.edu Sat Feb 18 16:28:43 2023
From: Jiannan_Tu at uml.edu (Tu, Jiannan)
Date: Sat, 18 Feb 2023 22:28:43 +0000
Subject: [petsc-users] TS failed due to diverged_step_rejected
In-Reply-To: <9523EDF9-7C02-4872-9E0E-1DFCBCB28066@anl.gov>
References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> <3F1BD989-8516-4649-A385-5F94FD1A9470@petsc.dev> <0B7BA32F-03CE-44F2-A9A3-4584B2D7AB94@anl.gov> <9523EDF9-7C02-4872-9E0E-1DFCBCB28066@anl.gov>
Message-ID:

Thanks for the instruction. This is the boundary condition, and there is no udot in that equation. I think this is the way to define the IFunction at the boundary. Maybe I'm wrong? Or is there some way to introduce udot into the specification of the equation at the boundary, from the standpoint of the TS implementation?

Thank you,
Jiannan
From hongzhang at anl.gov Sun Feb 19 15:48:18 2023
From: hongzhang at anl.gov (Zhang, Hong)
Date: Sun, 19 Feb 2023 21:48:18 +0000
Subject: [petsc-users] TS failed due to diverged_step_rejected
In-Reply-To:
References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> <3F1BD989-8516-4649-A385-5F94FD1A9470@petsc.dev> <0B7BA32F-03CE-44F2-A9A3-4584B2D7AB94@anl.gov> <9523EDF9-7C02-4872-9E0E-1DFCBCB28066@anl.gov>
Message-ID:

It is fine to drop udot for the boundary points, but you need to keep udot for all the other points. In addition, which boundary condition do you use in the IFunction? The way you are treating the boundary points actually leads to a system of differential-algebraic equations, which could be difficult to solve with the ARKIMEX solver. Can you try to exclude the boundary points from the computational domain, so that you have just a system of ODEs?

Hong (Mr.)
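As an illustration of that structural point (a sketch only, not code from the iditm3d model; it assumes a 1-D sequential Vec, and the unit-coefficient diffusion stencil and the top-boundary row are pure placeholders), the IFunction below keeps udot in every row except the bottom-boundary row, which carries only the algebraic relation X[0] = 2*X[1] - X[2] described above. Mixing those two kinds of rows is what turns the system into a DAE:

#include <petscts.h>

static PetscErrorCode IFunctionSketch(TS ts, PetscReal t, Vec U, Vec Udot, Vec F, void *ctx)
{
  const PetscScalar *u, *udot;
  PetscScalar       *f;
  PetscInt           i, n;
  PetscErrorCode     ierr;

  ierr = VecGetSize(U, &n);CHKERRQ(ierr);
  ierr = VecGetArrayRead(U, &u);CHKERRQ(ierr);
  ierr = VecGetArrayRead(Udot, &udot);CHKERRQ(ierr);
  ierr = VecGetArray(F, &f);CHKERRQ(ierr);

  /* algebraic row, no udot: second derivative = 0 at the bottom boundary */
  f[0] = u[0] - (2.0 * u[1] - u[2]);

  /* differential rows: udot must appear here (placeholder stiff term) */
  for (i = 1; i < n - 1; i++) f[i] = udot[i] - (u[i - 1] - 2.0 * u[i] + u[i + 1]);

  /* placeholder differential row at the top boundary */
  f[n - 1] = udot[n - 1] + u[n - 1];

  ierr = VecRestoreArrayRead(U, &u);CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(Udot, &udot);CHKERRQ(ierr);
  ierr = VecRestoreArray(F, &f);CHKERRQ(ierr);
  return 0;
}

One way to avoid the algebraic row altogether, in the spirit of excluding the boundary points, might be to solve only for the interior unknowns and evaluate X[0] = 2*X[1] - X[2] inside the residual and RHS routines wherever the boundary value is needed, so that every remaining equation keeps its udot term.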
From Jiannan_Tu at uml.edu Sun Feb 19 21:23:03 2023
From: Jiannan_Tu at uml.edu (Tu, Jiannan)
Date: Mon, 20 Feb 2023 03:23:03 +0000
Subject: [petsc-users] TS failed due to diverged_step_rejected
In-Reply-To:
References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> <3F1BD989-8516-4649-A385-5F94FD1A9470@petsc.dev> <0B7BA32F-03CE-44F2-A9A3-4584B2D7AB94@anl.gov> <9523EDF9-7C02-4872-9E0E-1DFCBCB28066@anl.gov>
Message-ID:

It is the second-order derivative of, say, the electron temperature that is set to 0 at the boundary. I am not sure how I can exclude the boundary points, because the values of the unknowns must be specified at the boundary. Are there any other solvers, e.g., CN, that would be good for solving this equation system?

Thank you,
Jiannan
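For experimenting with fully implicit integrators at run time, the options below are standard PETSc TS options (the ARKIMEX-specific ones were already suggested earlier in this thread); whether any of them behaves better for this particular system has not been tested here:

-ts_type cn                                   # Crank-Nicolson
-ts_type beuler                               # backward Euler
-ts_type arkimex -ts_arkimex_fully_implicit   # keep ARKIMEX, treat all terms implicitly
-ts_adapt_type none                           # disable step adaptivity while debugging
-ts_monitor -ts_adapt_monitor                 # report why steps are accepted or rejected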
In addition, which boundary condition do you use in IFunction? The way you are treating the boundary points actually leads to a system of differential-algebraic equations, which could be difficult to solve with the ARKIMEX solver. Can you try to exclude the boundary points from the computational domain so that you will have just a system of ODEs?

Hong (Mr.)

On Feb 18, 2023, at 4:28 PM, Tu, Jiannan wrote:

Thanks for the instruction. This is the boundary condition and there is no udot in the equation. I think this is the way to define IFunction at the boundary. Maybe I'm wrong? Or is there some way to introduce udot into the specification of the equation at the boundary from the aspect of the implementation for TS?

Thank you,
Jiannan

From: Zhang, Hong
Sent: Saturday, February 18, 2023 12:40 PM
To: Tu, Jiannan
Cc: Barry Smith; Hong Zhang; Constantinescu, Emil M.; petsc-users
Subject: Re: [petsc-users] TS failed due to diverged_step_rejected

On Feb 18, 2023, at 8:44 AM, Tu, Jiannan wrote:

The RHS function at the bottom boundary is determined by the boundary condition, which is the second order derivative = 0, i.e. G(u) = 2*X[i=1] - X[i=2]. Then in IFunction, F(u, udot) = X[i=0].

This might be the problem. Your F(u, udot) is missing udot according to your description. Take a simple ODE udot = f(u) + g(u) for example. One way to partition this ODE is to define F = udot - f(u) as the IFunction and G = g(u) as the RHSFunction.

Hong (Mr.)

Thank you,
Jiannan

From: Zhang, Hong
Sent: Friday, February 17, 2023 11:54 PM
To: Tu, Jiannan
Cc: Barry Smith; Hong Zhang; Constantinescu, Emil M.; petsc-users
Subject: Re: [petsc-users] TS failed due to diverged_step_rejected

On Feb 17, 2023, at 6:19 PM, Tu, Jiannan wrote:

I need to find out what causes negative temperature first. Following is the message with adaptivity turned off. The G(u) gives the right-hand equation for electron temperature at the bottom boundary. The F(u, u') function is F(u, u') = X = G(u) and the jacobian element is d F(u, u') / dX = 1.

This looks strange. Can you elaborate a bit on your partitioned ODE? For example, how are your F(u,udot) (IFunction) and G(u) (RHSFunction) defined? A good IMEX example can be found at ts/tutorial/advection-diffusion-reaction/ex5.c (and reaction_diffusion.c).

Hong (Mr.)

The solution from TSStep is checked for positivity of densities and temperatures. From the message below, it is seen that G(u) > 0 (I added output of right-hand equation for electron temperature). The solution for electron temperature X should be X * jacobian element = G(u) > 0 since jacobian element = 1. I don't understand why it becomes negative. Is my understanding of TS formula incorrect?
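A minimal, self-contained sketch of the partitioning Hong describes above, written against the PETSc 3.16-era C interface used elsewhere in this thread. None of the names or numbers below come from the iditm3d code being discussed; the toy problem udot = -a*u + b is only there to show where the IFunction, the RHSFunction, and the shift term of the IJacobian enter. Note in particular that udot appears in every IFunction row: a row without udot is an algebraic constraint, which is Hong's point about the boundary rows turning the system into a DAE.

#include <petscts.h>

/* Toy split ODE: udot = -a*u + b.
   Implicit (stiff) part, via the IFunction:    F(t,u,udot) = udot + a*u
   Explicit (nonstiff) part, via RHSFunction:   G(t,u)      = b          */

typedef struct {
  PetscReal a, b;
} AppCtx;

static PetscErrorCode FormIFunction(TS ts, PetscReal t, Vec U, Vec Udot, Vec F, void *ctx)
{
  AppCtx            *user = (AppCtx *)ctx;
  const PetscScalar *u, *udot;
  PetscScalar       *f;
  PetscErrorCode     ierr;

  PetscFunctionBeginUser;
  ierr = VecGetArrayRead(U, &u); CHKERRQ(ierr);
  ierr = VecGetArrayRead(Udot, &udot); CHKERRQ(ierr);
  ierr = VecGetArray(F, &f); CHKERRQ(ierr);
  /* udot enters every row; dropping it from a row (e.g. a boundary row)
     turns that row into an algebraic constraint and the system into a DAE. */
  f[0] = udot[0] + user->a * u[0];
  ierr = VecRestoreArray(F, &f); CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(Udot, &udot); CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(U, &u); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

static PetscErrorCode FormRHSFunction(TS ts, PetscReal t, Vec U, Vec G, void *ctx)
{
  AppCtx        *user = (AppCtx *)ctx;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = VecSet(G, user->b); CHKERRQ(ierr); /* g(u) = b, handled explicitly */
  PetscFunctionReturn(0);
}

static PetscErrorCode FormIJacobian(TS ts, PetscReal t, Vec U, Vec Udot, PetscReal shift, Mat A, Mat B, void *ctx)
{
  AppCtx        *user = (AppCtx *)ctx;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* J = shift * dF/dudot + dF/du = shift * 1 + a.  The "shift" is how udot
     shows up in the Jacobian; a purely algebraic row would contribute only a. */
  ierr = MatSetValue(B, 0, 0, shift + user->a, INSERT_VALUES); CHKERRQ(ierr);
  ierr = MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  if (A != B) {
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}

int main(int argc, char **argv)
{
  TS             ts;
  Vec            u;
  Mat            J;
  AppCtx         user = {10.0, 1.0};
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = VecCreateSeq(PETSC_COMM_SELF, 1, &u); CHKERRQ(ierr);
  ierr = VecSet(u, 1.0); CHKERRQ(ierr);
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 1, 1, NULL, &J); CHKERRQ(ierr);
  ierr = TSCreate(PETSC_COMM_SELF, &ts); CHKERRQ(ierr);
  ierr = TSSetType(ts, TSARKIMEX); CHKERRQ(ierr);
  ierr = TSSetIFunction(ts, NULL, FormIFunction, &user); CHKERRQ(ierr);
  ierr = TSSetRHSFunction(ts, NULL, FormRHSFunction, &user); CHKERRQ(ierr);
  ierr = TSSetIJacobian(ts, J, J, FormIJacobian, &user); CHKERRQ(ierr);
  ierr = TSSetSolution(ts, u); CHKERRQ(ierr);
  ierr = TSSetMaxTime(ts, 1.0); CHKERRQ(ierr);
  ierr = TSSetTimeStep(ts, 0.01); CHKERRQ(ierr);
  ierr = TSSetExactFinalTime(ts, TS_EXACTFINALTIME_MATCHSTEP); CHKERRQ(ierr);
  ierr = TSSetFromOptions(ts); CHKERRQ(ierr);
  ierr = TSSolve(ts, u); CHKERRQ(ierr);
  ierr = TSDestroy(&ts); CHKERRQ(ierr);
  ierr = MatDestroy(&J); CHKERRQ(ierr);
  ierr = VecDestroy(&u); CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Compiled against PETSc and run with, say, -ts_type arkimex -ts_monitor -snes_converged_reason, this integrates the toy problem; the tutorial Hong points to above sets up the analogous split on a DMDA.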
Thank you,
Jiannan

----------------------------------
G(u) = 1.86534e-07
 0 SNES Function norm 2.274473072183e+03
 1 SNES Function norm 8.641749325070e-04
Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1
G(u) = 1.86534e-07
 0 SNES Function norm 8.716501970511e-02
 1 SNES Function norm 2.213263548813e-04
 2 SNES Function norm 2.779985176426e-08
Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2
G(u) = 1.86534e-07
 0 SNES Function norm 3.177195995186e-01
 1 SNES Function norm 3.607702491344e-04
 2 SNES Function norm 4.345809629121e-08
Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2
G(u) = 1.86534e-07
TSAdapt none arkimex 0:3 step 0 accepted t=42960 + 2.189e-02 dt=2.189e-02
electron temperature = -3.6757e-15 at (i, j, k) = (0, 1, 0)

From: Barry Smith
Sent: Friday, February 17, 2023 3:45 PM
To: Tu, Jiannan; Hong Zhang; Emil Constantinescu
Cc: petsc-users
Subject: Re: [petsc-users] TS failed due to diverged_step_rejected

On Feb 17, 2023, at 3:32 PM, Tu, Jiannan wrote:

The ts_type arkimex is used. There is right hand-side function RHSFunction set by TSSetRHSFunction() and also stiff function set by TSSetIFunction(). With adaptivity shut off, TS can finish its first time step after the 3rd "Nonlinear solve converged due to ...". The solution gives negative electron and neutral temperatures at the bottom boundary. I need to fix the negative temperatures and see how the code works.

BTW, what is this ts_adapt? Is it by default on?

It is default for some of the TSTypes (in particular, the better ones). It adapts the timestep to ensure some local error estimate is below a certain tolerance. As Matt notes, normally as it tries smaller and smaller time steps the local error estimate would get smaller and smaller; this is not happening here, hence the error.

Have you tried with the argument -ts_arkimex_fully_implicit ?

I am not an expert but my guess is something is "odd" about your functions, either the RHSFunction or the Function or both. Do you have a hierarchy of models for your problem? Could you try runs with fewer terms in your functions, that may be producing the difficulties? If you can determine what triggers the problem with the local error estimators, that might help the experts in ODE solution (not me) determine what could be going wrong.

Barry

Thank you,
Jiannan

From: Matthew Knepley
Sent: Friday, February 17, 2023 3:15 PM
To: Tu, Jiannan
Cc: Barry Smith; petsc-users
Subject: Re: [petsc-users] TS failed due to diverged_step_rejected

I am not sure what TS you are using, but the estimate of the local truncation error is 91.4, and does not seem to change when you make the step smaller, so something is off. You can shut off the adaptivity using -ts_adapt_type none

Thanks,
Matt

On Fri, Feb 17, 2023 at 3:01 PM Tu, Jiannan wrote:

These are what I got with the options you suggested.
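For reference, everything Barry and Matt suggest above is a run-time option. A sketch of how they might be tacked onto the run command used in this thread (the executable and the style of the command line are taken from the original post; in practice one would usually try these flags one at a time rather than all at once):

mpiexec -n $1 ./iditm3d -ts_type arkimex -ts_monitor -ts_adapt_monitor \
    -ts_adapt_type none -ts_arkimex_fully_implicit \
    -snes_monitor -snes_converged_reason

Here -ts_adapt_type none disables the step-size controller entirely, -ts_arkimex_fully_implicit makes ARKIMEX treat the RHSFunction part implicitly as well, and the two monitor options print what the time stepper and its adapter are doing.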
Thank you, Jiannan ------------------------------------------------------------------------------- 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 2.189e-02 dt=4.374e-03 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-03 dt=4.374e-04 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-04 dt=4.374e-05 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-05 dt=4.374e-06 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-06 dt=4.374e-07 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-07 dt=4.374e-08 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function 
norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-08 dt=4.374e-09 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-09 dt=4.374e-10 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-10 dt=4.374e-11 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-11 dt=4.374e-12 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-12 dt=4.374e-13 wlte= 91.4 wltea= -1 wlter= -1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022
[0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 14:54:22 2023
[0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1
[0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583

From: Barry Smith
Sent: Friday, February 17, 2023 12:58 PM
To: Tu, Jiannan
Cc: petsc-users
Subject: Re: [petsc-users] TS failed due to diverged_step_rejected

Can you please run with also the options -ts_monitor -ts_adapt_monitor ? The output is confusing because it prints that the Nonlinear solve has converged but then TSStep has failed due to DIVERGED_STEP_REJECTED which seems contradictory.

On Feb 17, 2023, at 12:09 PM, Tu, Jiannan wrote:

My code uses TS to solve a set of multi-fluid MHD equations. The jacobian is provided with function F(t, u, u'). Both linear and nonlinear solvers converge but snes repeats itself until it gets "TSStep has failed due to diverged_step_rejected."

Is it because I used TSStep rather than TSSolve? I have checked the condition number. The condition number with pc_type asm is about 1 (without precondition it is about 4x10^4). The maximum ratio of off-diagonal jacobian element over diagonal element is about 21. Could you help me to identify what is going wrong?

Thank you very much!

Jiannan

---------------------------------------------------------------------------------------------------
Run command with options

mpiexec -n $1 ./iditm3d -ts_type arkimex -snes_tyep ngmres -ksp_type gmres -pc_type asm \
    -ts_rtol 1.0e-4 -ts_atol 1.0e-4 -snes_monitor -snes_rtol 1.0e-4 -snes_atol 1.0e-4 \
    -snes_converged_reason

The output message is

Start time advancing ...
0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES 
Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 11:59:43 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Mon Feb 20 10:07:26 2023 From: hongzhang at anl.gov (Zhang, Hong) Date: Mon, 20 Feb 2023 16:07:26 +0000 Subject: [petsc-users] TS failed due to diverged_step_rejected In-Reply-To: References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> <3F1BD989-8516-4649-A385-5F94FD1A9470@petsc.dev> <0B7BA32F-03CE-44F2-A9A3-4584B2D7AB94@anl.gov> <9523EDF9-7C02-4872-9E0E-1DFCBCB28066@anl.gov> Message-ID: <88895B56-2BEF-49BF-B6E9-F75186E712D9@anl.gov> If you have to include the boundary points, I would suggest starting from a fully implicit solver such as CN or BEuler with a finite-difference approximated Jacobian. When this works for a small scale setting, you can build up more functionalities such as IMEX and analytical Jacobians and extend the problem to a larger scale. But the udot issue needs to be fixed in the first place. Hong (Mr.) On Feb 19, 2023, at 9:23 PM, Tu, Jiannan wrote: It is the second order derivative of, say electron temperature = 0 at the boundary. 
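A sketch of the kind of debugging run Hong suggests here, using standard PETSc options with the executable name from this thread; -snes_fd builds the Jacobian by dense finite differences, so it is only practical at the small scale intended for this first step:

mpiexec -n 1 ./iditm3d -ts_type beuler -snes_fd -ts_monitor -snes_monitor -snes_converged_reason

Replacing -ts_type beuler with -ts_type cn selects Crank-Nicolson instead of backward Euler, and -snes_test_jacobian can later be used to compare a hand-coded Jacobian against the finite-difference one before moving back to the IMEX setup.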
I am not sure how I can exclude the boundary points because the values of unknowns must be specified at the boundary. Are there any other solvers, e.g., CN, good to solve the equation system?

Thank you,
Jiannan

From knepley at gmail.com  Mon Feb 20 11:05:34 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 20 Feb 2023 12:05:34 -0500
Subject: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue
In-Reply-To: 
References: 
Message-ID: 

On Sat, Feb 18, 2023 at 12:00 PM Mike Michell wrote:
> As a follow-up, I tested:
>
> (1) Download tar for v3.18.4 from petsc gitlab (https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue on DMPlex halo exchange. This version works as I expect.
> (2) Clone main branch (git clone https://gitlab.com/petsc/petsc.git) has issues with DMPlex halo exchange. Something is suspicious about this main branch, related to DMPlex halo. The solution field I got is not correct. But it works okay with 1-proc.
>
> Does anyone have any comments on this issue? I am curious if other DMPlex users have no problem regarding halo exchange. FYI, I do not declare ghost layers for halo exchange.
>
There should not have been any changes there and there are definitely tests for this. It would be great if you could send something that failed.
I could fix it and add it as a test.

  Thanks,

     Matt

> Thanks,
> Mike
>
>> Dear PETSc team,
>>
>> I am using PETSc for Fortran with DMPlex. I have been using this version of PETSc:
>> >>git rev-parse origin
>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1
>> >>git rev-parse FETCH_HEAD
>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b
>>
>> There has been no issue, before the one with VTK viewer, which Jed fixed today (https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735).
>>
>> Since that MR has been merged into the main repo, I pulled the latest version of PETSc (basically I cloned it from scratch). But if I use the same configure option with before, and run my code, then there is an issue with halo exchange. The code runs without error message, but it gives wrong solution field. I guess the issue I have is related to graph partitioner or halo exchange part. This is because if I run the code with 1-proc, the solution is correct. I only updated the version of PETSc and there was no change in my own code. Could I get any comments on the issue? I was wondering if there have been many changes in halo exchange or graph partitioning & distributing part related to DMPlex.
>>
>> Thanks,
>> Mike

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

From Jiannan_Tu at uml.edu  Mon Feb 20 17:46:22 2023
From: Jiannan_Tu at uml.edu (Tu, Jiannan)
Date: Mon, 20 Feb 2023 23:46:22 +0000
Subject: [petsc-users] TS failed due to diverged_step_rejected
In-Reply-To: <88895B56-2BEF-49BF-B6E9-F75186E712D9@anl.gov>
References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> <3F1BD989-8516-4649-A385-5F94FD1A9470@petsc.dev> <0B7BA32F-03CE-44F2-A9A3-4584B2D7AB94@anl.gov> <9523EDF9-7C02-4872-9E0E-1DFCBCB28066@anl.gov> <88895B56-2BEF-49BF-B6E9-F75186E712D9@anl.gov>
Message-ID: 

Thanks for the suggestion. I'll try CN and other implicit solvers and see how they work.

Jiannan

From: Zhang, Hong
Sent: Monday, February 20, 2023 11:07 AM
To: Tu, Jiannan
Cc: Barry Smith; Hong Zhang; Constantinescu, Emil M.; petsc-users
Subject: Re: [petsc-users] TS failed due to diverged_step_rejected

If you have to include the boundary points, I would suggest starting from a fully implicit solver such as CN or BEuler with a finite-difference approximated Jacobian. When this works for a small scale setting, you can build up more functionalities such as IMEX and analytical Jacobians and extend the problem to a larger scale. But the udot issue needs to be fixed in the first place.

Hong (Mr.)

On Feb 19, 2023, at 9:23 PM, Tu, Jiannan wrote:

It is the second order derivative of, say electron temperature = 0 at the boundary. I am not sure how I can exclude the boundary points because the values of unknowns must be specified at the boundary. Are there any other solvers, e.g., CN, good to solve the equation system?

Thank you,
Jiannan

From: Zhang, Hong
Sent: Sunday, February 19, 2023 4:48 PM
To: Tu, Jiannan
Cc: Barry Smith; Hong Zhang; Constantinescu, Emil M.; petsc-users
Subject: Re: [petsc-users] TS failed due to diverged_step_rejected
It is fine to drop udot for the boundary points, but you need to keep udot for all the other points. In addition, which boundary condition do you use in IFunction? The way you are treating the boundary points actually leads to a system of differential-algebraic equations, which could be difficult to solve with the ARKIMEX solver. Can you try to exclude the boundary points from the computational domain so that you will have just a system of ODEs? Hong (Mr.) On Feb 18, 2023, at 4:28 PM, Tu, Jiannan wrote: Thanks for the instruction. This is the boundary condition and there is no udot in the equation. I think this is the way to define IFunction at the boundary. Maybe I?m wrong? Or is there some way to introduce udot into the specification of the equation at the boundary from the aspect of the implementation for TS? Thank you, Jiannan From: Zhang, Hong Sent: Saturday, February 18, 2023 12:40 PM To: Tu, Jiannan Cc: Barry Smith; Hong Zhang; Constantinescu, Emil M.; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected You don't often get email from hongzhang at anl.gov. Learn why this is important CAUTION: This email was sent from outside the UMass Lowell network. On Feb 18, 2023, at 8:44 AM, Tu, Jiannan wrote: The RHS function at the bottom boundary is determined by the boundary condition, which is the second order derivative = 0, i.e. G(u) = 2*X[i=1] ? X[i=2]. Then in IFunction, F(u, udot) = X[i=0]. This might be the problem. Your F(u, udot) is missing udot according to your description. Take a simple ODE udot = f(u) + g(u) for example. One way to partition this ODE is to define F = udot - f(u) as the IFunction and G = g(u) as the RHSFunction. Hong (Mr.) Thank you, Jiannan From: Zhang, Hong Sent: Friday, February 17, 2023 11:54 PM To: Tu, Jiannan Cc: Barry Smith; Hong Zhang; Constantinescu, Emil M.; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected You don't often get email from hongzhang at anl.gov. Learn why this is important CAUTION: This email was sent from outside the UMass Lowell network. On Feb 17, 2023, at 6:19 PM, Tu, Jiannan wrote: I need to find out what causes negative temperature first. Following is the message with adaptivity turned off. The G(u) gives right-hand equation for electron temperature at bottom boundary. The F(u, u?) function is F(u, u?) = X = G(u) and the jacobian element is d F(u, u?) / dX =1. This looks strange. Can you elaborate a bit on your partitioned ODE? For example, how are your F(u,udot) (IFunction) and G(u) (RHSFunction) defined? A good IMEX example can be found at ts/tutorial/advection-diffusion-reaction/ex5.c (and reaction_diffusion.c). Hong (Mr.) The solution from TSStep is checked for positivity of densities and temperatures. >From the message below, it is seen that G(u) > 0 (I added output of right-hand equation for electron temperature). The solution for electron temperature X should be X * jacobian element = G(u) > 0 since jacobian element = 1. I don?t understand why it becomes negative. Is my understanding of TS formula incorrect? 
Thank you, Jiannan ---------------------------------- G(u) = 1.86534e-07 0 SNES Function norm 2.274473072183e+03 1 SNES Function norm 8.641749325070e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 G(u) = 1.86534e-07 0 SNES Function norm 8.716501970511e-02 1 SNES Function norm 2.213263548813e-04 2 SNES Function norm 2.779985176426e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 G(u) = 1.86534e-07 0 SNES Function norm 3.177195995186e-01 1 SNES Function norm 3.607702491344e-04 2 SNES Function norm 4.345809629121e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 G(u) = 1.86534e-07 TSAdapt none arkimex 0:3 step 0 accepted t=42960 + 2.189e-02 dt=2.189e-02 electron temperature = -3.6757e-15 at (i, j, k) = (0, 1, 0) From: Barry Smith Sent: Friday, February 17, 2023 3:45 PM To: Tu, Jiannan; Hong Zhang; Emil Constantinescu Cc: petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. On Feb 17, 2023, at 3:32 PM, Tu, Jiannan wrote: The ts_type arkimex is used. There is right hand-side function RHSFunction set by TSSetRHSFunction() and also stiff function set by TSSetIFunction(). With adaptivity shut off, TS can finish its first time step after the 3rd ?Nonlinear solve converged due to ??. The solution gives negative electron and neutral temperatures at the bottom boundary. I need to fix the negative temperatures and see how the code works. BTW, what is this ts_adapt? Is it by default on? It is default for some of the TSTypes (in particular, the better ones). It adapts the timestep to ensure some local error estimate is below a certain tolerance. As Matt notes normally as it tries smaller and smaller time steps the local error estimate would get smaller and smaller; this is not happening here, hence the error. Have you tried with the argument -ts_arkimex_fully_implicit ? I am not an expert but my guess is something is "odd" about your functions, either the RHSFunction or the Function or both. Do you have a hierarchy of models for your problem? Could you try runs with fewer terms in your functions, that may be producing the difficulties? If you can determine what triggers the problem with the local error estimators, that might help the experts in ODE solution (not me) determine what could be going wrong. Barry Thank you, Jiannan From: Matthew Knepley Sent: Friday, February 17, 2023 3:15 PM To: Tu, Jiannan Cc: Barry Smith; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. I am not sure what TS you are using, but the estimate of the local truncation error is 91.4, and does not seem to change when you make the step smaller, so something is off. You can shut off the adaptivity using -ts_adapt_type none Thanks, Matt On Fri, Feb 17, 2023 at 3:01 PM Tu, Jiannan > wrote: These are what I got with the options you suggested. 
Thank you, Jiannan ------------------------------------------------------------------------------- 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 2.189e-02 dt=4.374e-03 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-03 dt=4.374e-04 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-04 dt=4.374e-05 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-05 dt=4.374e-06 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-06 dt=4.374e-07 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-07 dt=4.374e-08 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function 
norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-08 dt=4.374e-09 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-09 dt=4.374e-10 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-10 dt=4.374e-11 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-11 dt=4.374e-12 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-12 dt=4.374e-13 wlte= 91.4 wltea= -1 wlter= -1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 14:54:22 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 From: Barry Smith Sent: Friday, February 17, 2023 12:58 PM To: Tu, Jiannan Cc: petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. Can you please run with also the options -ts_monitor -ts_adapt_monitor ? The output is confusing because it prints that the Nonlinear solve has converged but then TSStep has failed due to DIVERGED_STEP_REJECTED which seems contradictory On Feb 17, 2023, at 12:09 PM, Tu, Jiannan > wrote: My code uses TS to solve a set of multi-fluid MHD equations. The jacobian is provided with function F(t, u, u'). Both linear and nonlinear solvers converge but snes repeats itself until gets "TSStep has failed due to diverged_step_rejected." Is it because I used TSStep rather than TSSolve? I have checked the condition number. The condition number with pc_type asm is about 1 (without precondition it is about 4x10^4). The maximum ratio of off-diagonal jacobian element over diagonal element is about 21. Could you help me to identify what is going wrong? Thank you very much! Jiannan --------------------------------------------------------------------------------------------------- Run command with options mpiexec -n $1 ./iditm3d -ts_type arkimex -snes_tyep ngmres -ksp_type gmres -pc_type asm \ -ts_rtol 1.0e-4 -ts_atol 1.0e-4 -snes_monitor -snes_rtol 1.0e-4 -snes_atol 1.0e-4 \ -snes_converged_reason The output message is Start time advancing ... 
0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES 
Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 11:59:43 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From heepark at sandia.gov Mon Feb 20 21:06:37 2023 From: heepark at sandia.gov (Park, Heeho) Date: Tue, 21 Feb 2023 03:06:37 +0000 Subject: [petsc-users] HYPRE requires C++ compiler Message-ID: Hi PETSc developers, I?m using the same configure script on my system to compile petsc-main branch as petsc-v3.17.2, but now I am receiving this message. I?ve tried it several different ways but HYPRE installer does not recognize the mpicxx I?m using. I can send you the configure log and file if you would like to see them. ============================================================================================= Trying to download https://github.com/hypre-space/hypre for HYPRE ============================================================================================= ********************************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): --------------------------------------------------------------------------------------------- Error: Hypre requires C++ compiler. 
None specified ********************************************************************************************* Currently Loaded Modules: 1) intel/21.3.0 2) mkl/21.3.0 3) openmpi-intel/4.0 4) anaconda3/5.2.0 5) gcc/4.9.3 Thanks, Heeho D. Park, Ph.D. Computational Hydrologist & Nuclear Engineer Center for Energy & Earth Systems Applied Systems Analysis and Research Dept. Sandia National Laboratories Email: heepark at sandia.gov Web: Homepage -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Mon Feb 20 22:28:17 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Tue, 21 Feb 2023 05:28:17 +0100 Subject: [petsc-users] HYPRE requires C++ compiler In-Reply-To: References: Message-ID: > On 21 Feb 2023, at 4:07 AM, Park, Heeho via petsc-users wrote: > > ? > Hi PETSc developers, > > I?m using the same configure script on my system to compile petsc-main branch as petsc-v3.17.2, but now I am receiving this message. > I?ve tried it several different ways but HYPRE installer does not recognize the mpicxx I?m using. I can send you the configure log and file if you would like to see them. Send those files to petsc-maint at mcs.anl.gov Thanks, Pierre > > ============================================================================================= > Trying to download https://github.com/hypre-space/hypre for HYPRE > ============================================================================================= > > ********************************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > --------------------------------------------------------------------------------------------- > Error: Hypre requires C++ compiler. None specified > ********************************************************************************************* > > > Currently Loaded Modules: > 1) intel/21.3.0 2) mkl/21.3.0 3) openmpi-intel/4.0 4) anaconda3/5.2.0 5) gcc/4.9.3 > > Thanks, > > Heeho D. Park, Ph.D. > Computational Hydrologist & Nuclear Engineer > Center for Energy & Earth Systems > Applied Systems Analysis and Research Dept. > Sandia National Laboratories > Email: heepark at sandia.gov > Web: Homepage > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 21 07:30:51 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 21 Feb 2023 08:30:51 -0500 Subject: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue In-Reply-To: References: Message-ID: On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley wrote: > On Sat, Feb 18, 2023 at 12:00 PM Mike Michell > wrote: > >> As a follow-up, I tested: >> >> (1) Download tar for v3.18.4 from petsc gitlab ( >> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue on DMPlex >> halo exchange. This version works as I expect. >> (2) Clone main branch (git clone https://gitlab.com/petsc/petsc.git) has >> issues with DMPlex halo exchange. Something is suspicious about this main >> branch, related to DMPlex halo. The solution field I got is not correct. >> But it works okay with 1-proc. >> >> Does anyone have any comments on this issue? I am curious if other DMPlex >> users have no problem regarding halo exchange. FYI, I do not declare ghost >> layers for halo exchange. >> > > There should not have been any changes there and there are definitely > tests for this. > > It would be great if you could send something that failed. I could fix it > and add it as a test. 
> Just to follow up, we have tests of the low-level communication (Plex tests ex1, ex12, ex18, ex29, ex31), and then we have tests that use halo exchange for PDE calculations, for example SNES tutorial ex12, ex13, ex62. THe convergence rates should be off if the halo exchange were wrong. Is there any example similar to your code that is failing on your installation? Or is there a way to run your code? Thanks, Matt > Thanks, > > Matt > > >> Thanks, >> Mike >> >> >>> Dear PETSc team, >>> >>> I am using PETSc for Fortran with DMPlex. I have been using this version >>> of PETSc: >>> >>git rev-parse origin >>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1 >>> >>git rev-parse FETCH_HEAD >>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b >>> >>> There has been no issue, before the one with VTK viewer, which Jed fixed >>> today ( >>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 >>> ). >>> >>> Since that MR has been merged into the main repo, I pulled the latest >>> version of PETSc (basically I cloned it from scratch). But if I use the >>> same configure option with before, and run my code, then there is an issue >>> with halo exchange. The code runs without error message, but it gives wrong >>> solution field. I guess the issue I have is related to graph partitioner or >>> halo exchange part. This is because if I run the code with 1-proc, the >>> solution is correct. I only updated the version of PETSc and there was no >>> change in my own code. Could I get any comments on the issue? I was >>> wondering if there have been many changes in halo exchange or graph >>> partitioning & distributing part related to DMPlex. >>> >>> Thanks, >>> Mike >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From salmanuom206 at gmail.com Tue Feb 21 20:20:32 2023 From: salmanuom206 at gmail.com (Salman Ahmad) Date: Wed, 22 Feb 2023 10:20:32 +0800 Subject: [petsc-users] Installation of SuperLU_MT? Message-ID: Hello, I am trying to install SuperLU_MT on linux and only find the command line: brew install superlu_mt ( https://openseespydoc.readthedocs.io/en/stable/src/compileqt.html) but this is not working and show me the error: Warning: No available formula with the name "superlu_mt". Did you mean superlu? ==> Searching for similarly named formulae... ==> Formulae superlu ? To install superlu ?, run: brew install superlu ? Any solution to install SuperLU_MT? Best REgards, Salman Ahmad -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Jiannan_Tu at uml.edu Tue Feb 21 20:54:46 2023 From: Jiannan_Tu at uml.edu (Tu, Jiannan) Date: Wed, 22 Feb 2023 02:54:46 +0000 Subject: [petsc-users] TS failed due to diverged_step_rejected In-Reply-To: <88895B56-2BEF-49BF-B6E9-F75186E712D9@anl.gov> References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> <3F1BD989-8516-4649-A385-5F94FD1A9470@petsc.dev> <0B7BA32F-03CE-44F2-A9A3-4584B2D7AB94@anl.gov> <9523EDF9-7C02-4872-9E0E-1DFCBCB28066@anl.gov> <88895B56-2BEF-49BF-B6E9-F75186E712D9@anl.gov> Message-ID: CN or BEular doesn?t work. They produce negative densities at the lower boundary even RHS functions are positive. So for TS, all equations must include udot? Thank you, Jiannan From: Zhang, Hong Sent: Monday, February 20, 2023 11:07 AM To: Tu, Jiannan Cc: Barry Smith; Hong Zhang; Constantinescu, Emil M.; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. If you have to include the boundary points, I would suggest starting from a fully implicit solver such as CN or BEuler with a finite-difference approximated Jacobian. When this works for a small scale setting, you can build up more functionalities such as IMEX and analytical Jacobians and extend the problem to a larger scale. But the udot issue needs to be fixed in the first place. Hong (Mr.) On Feb 19, 2023, at 9:23 PM, Tu, Jiannan wrote: It is the second order derivative of, say electron temperature = 0 at the boundary. I am not sure how I can exclude the boundary points because the values of unknowns must be specified at the boundary. Are there any other solvers, e.g., CN, good to solve the equation system? Thank you, Jiannan From: Zhang, Hong Sent: Sunday, February 19, 2023 4:48 PM To: Tu, Jiannan Cc: Barry Smith; Hong Zhang; Constantinescu, Emil M.; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. It is fine to drop udot for the boundary points, but you need to keep udot for all the other points. In addition, which boundary condition do you use in IFunction? The way you are treating the boundary points actually leads to a system of differential-algebraic equations, which could be difficult to solve with the ARKIMEX solver. Can you try to exclude the boundary points from the computational domain so that you will have just a system of ODEs? Hong (Mr.) On Feb 18, 2023, at 4:28 PM, Tu, Jiannan wrote: Thanks for the instruction. This is the boundary condition and there is no udot in the equation. I think this is the way to define IFunction at the boundary. Maybe I?m wrong? Or is there some way to introduce udot into the specification of the equation at the boundary from the aspect of the implementation for TS? Thank you, Jiannan From: Zhang, Hong Sent: Saturday, February 18, 2023 12:40 PM To: Tu, Jiannan Cc: Barry Smith; Hong Zhang; Constantinescu, Emil M.; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected You don't often get email from hongzhang at anl.gov. Learn why this is important CAUTION: This email was sent from outside the UMass Lowell network. On Feb 18, 2023, at 8:44 AM, Tu, Jiannan wrote: The RHS function at the bottom boundary is determined by the boundary condition, which is the second order derivative = 0, i.e. G(u) = 2*X[i=1] ? X[i=2]. Then in IFunction, F(u, udot) = X[i=0]. This might be the problem. Your F(u, udot) is missing udot according to your description. 
Take a simple ODE udot = f(u) + g(u) for example. One way to partition this ODE is to define F = udot - f(u) as the IFunction and G = g(u) as the RHSFunction. Hong (Mr.) Thank you, Jiannan From: Zhang, Hong Sent: Friday, February 17, 2023 11:54 PM To: Tu, Jiannan Cc: Barry Smith; Hong Zhang; Constantinescu, Emil M.; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected You don't often get email from hongzhang at anl.gov. Learn why this is important CAUTION: This email was sent from outside the UMass Lowell network. On Feb 17, 2023, at 6:19 PM, Tu, Jiannan wrote: I need to find out what causes negative temperature first. Following is the message with adaptivity turned off. The G(u) gives right-hand equation for electron temperature at bottom boundary. The F(u, u?) function is F(u, u?) = X = G(u) and the jacobian element is d F(u, u?) / dX =1. This looks strange. Can you elaborate a bit on your partitioned ODE? For example, how are your F(u,udot) (IFunction) and G(u) (RHSFunction) defined? A good IMEX example can be found at ts/tutorial/advection-diffusion-reaction/ex5.c (and reaction_diffusion.c). Hong (Mr.) The solution from TSStep is checked for positivity of densities and temperatures. >From the message below, it is seen that G(u) > 0 (I added output of right-hand equation for electron temperature). The solution for electron temperature X should be X * jacobian element = G(u) > 0 since jacobian element = 1. I don?t understand why it becomes negative. Is my understanding of TS formula incorrect? Thank you, Jiannan ---------------------------------- G(u) = 1.86534e-07 0 SNES Function norm 2.274473072183e+03 1 SNES Function norm 8.641749325070e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 G(u) = 1.86534e-07 0 SNES Function norm 8.716501970511e-02 1 SNES Function norm 2.213263548813e-04 2 SNES Function norm 2.779985176426e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 G(u) = 1.86534e-07 0 SNES Function norm 3.177195995186e-01 1 SNES Function norm 3.607702491344e-04 2 SNES Function norm 4.345809629121e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 G(u) = 1.86534e-07 TSAdapt none arkimex 0:3 step 0 accepted t=42960 + 2.189e-02 dt=2.189e-02 electron temperature = -3.6757e-15 at (i, j, k) = (0, 1, 0) From: Barry Smith Sent: Friday, February 17, 2023 3:45 PM To: Tu, Jiannan; Hong Zhang; Emil Constantinescu Cc: petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. On Feb 17, 2023, at 3:32 PM, Tu, Jiannan wrote: The ts_type arkimex is used. There is right hand-side function RHSFunction set by TSSetRHSFunction() and also stiff function set by TSSetIFunction(). With adaptivity shut off, TS can finish its first time step after the 3rd ?Nonlinear solve converged due to ??. The solution gives negative electron and neutral temperatures at the bottom boundary. I need to fix the negative temperatures and see how the code works. BTW, what is this ts_adapt? Is it by default on? It is default for some of the TSTypes (in particular, the better ones). It adapts the timestep to ensure some local error estimate is below a certain tolerance. As Matt notes normally as it tries smaller and smaller time steps the local error estimate would get smaller and smaller; this is not happening here, hence the error. Have you tried with the argument -ts_arkimex_fully_implicit ? 
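A side note on the ts_adapt controller being discussed here: besides the -ts_adapt_type none option mentioned in this thread, the adaptive step control can also be disabled programmatically. A minimal fragment, assuming an existing TS object named ts and a declared PetscErrorCode ierr (both placeholders, not taken from the poster's code):

    TSAdapt adapt;
    ierr = TSGetAdapt(ts, &adapt);CHKERRQ(ierr);
    ierr = TSAdaptSetType(adapt, TSADAPTNONE);CHKERRQ(ierr);  /* same effect as -ts_adapt_type none */

Running with -ts_monitor -ts_adapt_monitor, as Barry suggests elsewhere in this thread, remains the easier way to see why individual steps are being rejected.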
I am not an expert but my guess is something is "odd" about your functions, either the RHSFunction or the Function or both. Do you have a hierarchy of models for your problem? Could you try runs with fewer terms in your functions, that may be producing the difficulties? If you can determine what triggers the problem with the local error estimators, that might help the experts in ODE solution (not me) determine what could be going wrong. Barry Thank you, Jiannan From: Matthew Knepley Sent: Friday, February 17, 2023 3:15 PM To: Tu, Jiannan Cc: Barry Smith; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. I am not sure what TS you are using, but the estimate of the local truncation error is 91.4, and does not seem to change when you make the step smaller, so something is off. You can shut off the adaptivity using -ts_adapt_type none Thanks, Matt On Fri, Feb 17, 2023 at 3:01 PM Tu, Jiannan > wrote: These are what I got with the options you suggested. Thank you, Jiannan ------------------------------------------------------------------------------- 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 2.189e-02 dt=4.374e-03 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-03 dt=4.374e-04 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-04 dt=4.374e-05 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-05 dt=4.374e-06 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 
SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-06 dt=4.374e-07 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-07 dt=4.374e-08 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-08 dt=4.374e-09 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-09 dt=4.374e-10 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-10 dt=4.374e-11 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-11 dt=4.374e-12 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to 
CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-12 dt=4.374e-13 wlte= 91.4 wltea= -1 wlter= -1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 14:54:22 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 From: Barry Smith Sent: Friday, February 17, 2023 12:58 PM To: Tu, Jiannan Cc: petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. Can you please run with also the options -ts_monitor -ts_adapt_monitor ? The output is confusing because it prints that the Nonlinear solve has converged but then TSStep has failed due to DIVERGED_STEP_REJECTED which seems contradictory On Feb 17, 2023, at 12:09 PM, Tu, Jiannan > wrote: My code uses TS to solve a set of multi-fluid MHD equations. The jacobian is provided with function F(t, u, u'). Both linear and nonlinear solvers converge but snes repeats itself until gets "TSStep has failed due to diverged_step_rejected." Is it because I used TSStep rather than TSSolve? I have checked the condition number. The condition number with pc_type asm is about 1 (without precondition it is about 4x10^4). The maximum ratio of off-diagonal jacobian element over diagonal element is about 21. Could you help me to identify what is going wrong? Thank you very much! Jiannan --------------------------------------------------------------------------------------------------- Run command with options mpiexec -n $1 ./iditm3d -ts_type arkimex -snes_tyep ngmres -ksp_type gmres -pc_type asm \ -ts_rtol 1.0e-4 -ts_atol 1.0e-4 -snes_monitor -snes_rtol 1.0e-4 -snes_atol 1.0e-4 \ -snes_converged_reason The output message is Start time advancing ... 
0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES 
Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 11:59:43 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Tue Feb 21 21:21:56 2023 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Wed, 22 Feb 2023 03:21:56 +0000 Subject: [petsc-users] Installation of SuperLU_MT? In-Reply-To: References: Message-ID: PETSc does not have support for SuperLU_MT. It only supports superlu (sequential) and superlu_dist. Hong ________________________________ From: petsc-users on behalf of Salman Ahmad Sent: Tuesday, February 21, 2023 8:20 PM To: petsc-users at mcs.anl.gov Subject: [petsc-users] Installation of SuperLU_MT? Hello, I am trying to install SuperLU_MT on linux and only find the command line: brew install superlu_mt (https://openseespydoc.readthedocs.io/en/stable/src/compileqt.html) but this is not working and show me the error: Warning: No available formula with the name "superlu_mt". Did you mean superlu? ==> Searching for similarly named formulae... ==> Formulae superlu ? To install superlu ?, run: brew install superlu ? Any solution to install SuperLU_MT? Best REgards, Salman Ahmad -------------- next part -------------- An HTML attachment was scrubbed... 
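Following up on Hong's note that only SuperLU (sequential) and SuperLU_DIST are supported: both can be pulled in through PETSc's own configure. A minimal sketch, with the MPI path a placeholder for whatever the local installation uses:

    ./configure --with-mpi-dir=/path/to/mpi \
                --download-superlu \
                --download-superlu_dist \
                --download-metis --download-parmetis

SuperLU_DIST needs METIS/ParMETIS, hence the extra download options; at run time the direct solver is then selected with something like -pc_type lu -pc_factor_mat_solver_type superlu_dist.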
URL: From hongzhang at anl.gov Tue Feb 21 22:20:57 2023 From: hongzhang at anl.gov (Zhang, Hong) Date: Wed, 22 Feb 2023 04:20:57 +0000 Subject: [petsc-users] TS failed due to diverged_step_rejected In-Reply-To: References: <295E7E1A-1649-435F-AE65-F061F287513F@petsc.dev> <3F1BD989-8516-4649-A385-5F94FD1A9470@petsc.dev> <0B7BA32F-03CE-44F2-A9A3-4584B2D7AB94@anl.gov> <9523EDF9-7C02-4872-9E0E-1DFCBCB28066@anl.gov> <88895B56-2BEF-49BF-B6E9-F75186E712D9@anl.gov> Message-ID: On Feb 21, 2023, at 8:54 PM, Tu, Jiannan wrote: CN or BEular doesn?t work. They produce negative densities at the lower boundary even RHS functions are positive. So for TS, all equations must include udot? You have algebraic constraints only for the boundary points. For all the other points, you must have udot in IFunction. I recommend you to take a look at the example src/ts/tutorials/ex25.c Hong (Mr.) Thank you, Jiannan From: Zhang, Hong Sent: Monday, February 20, 2023 11:07 AM To: Tu, Jiannan Cc: Barry Smith; Hong Zhang; Constantinescu, Emil M.; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. If you have to include the boundary points, I would suggest starting from a fully implicit solver such as CN or BEuler with a finite-difference approximated Jacobian. When this works for a small scale setting, you can build up more functionalities such as IMEX and analytical Jacobians and extend the problem to a larger scale. But the udot issue needs to be fixed in the first place. Hong (Mr.) On Feb 19, 2023, at 9:23 PM, Tu, Jiannan wrote: It is the second order derivative of, say electron temperature = 0 at the boundary. I am not sure how I can exclude the boundary points because the values of unknowns must be specified at the boundary. Are there any other solvers, e.g., CN, good to solve the equation system? Thank you, Jiannan From: Zhang, Hong Sent: Sunday, February 19, 2023 4:48 PM To: Tu, Jiannan Cc: Barry Smith; Hong Zhang; Constantinescu, Emil M.; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. It is fine to drop udot for the boundary points, but you need to keep udot for all the other points. In addition, which boundary condition do you use in IFunction? The way you are treating the boundary points actually leads to a system of differential-algebraic equations, which could be difficult to solve with the ARKIMEX solver. Can you try to exclude the boundary points from the computational domain so that you will have just a system of ODEs? Hong (Mr.) On Feb 18, 2023, at 4:28 PM, Tu, Jiannan wrote: Thanks for the instruction. This is the boundary condition and there is no udot in the equation. I think this is the way to define IFunction at the boundary. Maybe I?m wrong? Or is there some way to introduce udot into the specification of the equation at the boundary from the aspect of the implementation for TS? Thank you, Jiannan From: Zhang, Hong Sent: Saturday, February 18, 2023 12:40 PM To: Tu, Jiannan Cc: Barry Smith; Hong Zhang; Constantinescu, Emil M.; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected You don't often get email from hongzhang at anl.gov. Learn why this is important CAUTION: This email was sent from outside the UMass Lowell network. 
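To make the suggestion at the top of Hong's message concrete, here is a small self-contained sketch of that structure. It is a toy serial problem, not the iditm3d equations, and every coefficient in it is arbitrary: interior rows keep udot in the IFunction (stiff part implicit, non-stiff part in the RHSFunction), while row 0 carries only an algebraic boundary constraint with no udot, which is what turns the system into a DAE as discussed above.

    #include <petscts.h>

    /* Toy DAE with the structure Hong describes: interior rows are
       udot - f(u) in the IFunction, g(u) goes in the RHSFunction, and
       row 0 is a purely algebraic boundary constraint. Serial only.   */

    static PetscErrorCode IFunction(TS ts, PetscReal t, Vec U, Vec Udot, Vec F, void *ctx)
    {
      const PetscScalar *u, *udot;
      PetscScalar       *f;
      PetscInt           i, n;
      PetscErrorCode     ierr;

      PetscFunctionBeginUser;
      ierr = VecGetLocalSize(U, &n);CHKERRQ(ierr);
      ierr = VecGetArrayRead(U, &u);CHKERRQ(ierr);
      ierr = VecGetArrayRead(Udot, &udot);CHKERRQ(ierr);
      ierr = VecGetArray(F, &f);CHKERRQ(ierr);
      f[0] = u[0] - (2.0*u[1] - u[2]);                    /* algebraic boundary row, no udot term */
      for (i = 1; i < n; i++) f[i] = udot[i] + 10.0*u[i]; /* udot - f(u), stiff f(u) = -10 u      */
      ierr = VecRestoreArrayRead(U, &u);CHKERRQ(ierr);
      ierr = VecRestoreArrayRead(Udot, &udot);CHKERRQ(ierr);
      ierr = VecRestoreArray(F, &f);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

    static PetscErrorCode RHSFunction(TS ts, PetscReal t, Vec U, Vec G, void *ctx)
    {
      PetscScalar   *g;
      PetscInt       i, n;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = VecGetLocalSize(G, &n);CHKERRQ(ierr);
      ierr = VecGetArray(G, &g);CHKERRQ(ierr);
      g[0] = 0.0;                           /* nothing explicit on the constraint row */
      for (i = 1; i < n; i++) g[i] = 1.0;   /* non-stiff source term g(u)             */
      ierr = VecRestoreArray(G, &g);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

    int main(int argc, char **argv)
    {
      TS             ts;
      Vec            u;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
      ierr = VecCreateSeq(PETSC_COMM_SELF, 8, &u);CHKERRQ(ierr);
      ierr = VecSet(u, 1.0);CHKERRQ(ierr);  /* consistent with u[0] = 2*u[1] - u[2] */
      ierr = TSCreate(PETSC_COMM_SELF, &ts);CHKERRQ(ierr);
      ierr = TSSetType(ts, TSARKIMEX);CHKERRQ(ierr);
      ierr = TSSetIFunction(ts, NULL, IFunction, NULL);CHKERRQ(ierr);
      ierr = TSSetRHSFunction(ts, NULL, RHSFunction, NULL);CHKERRQ(ierr);
      ierr = TSSetTimeStep(ts, 0.01);CHKERRQ(ierr);
      ierr = TSSetMaxTime(ts, 1.0);CHKERRQ(ierr);
      ierr = TSSetExactFinalTime(ts, TS_EXACTFINALTIME_MATCHSTEP);CHKERRQ(ierr);
      ierr = TSSetFromOptions(ts);CHKERRQ(ierr); /* e.g. -ts_arkimex_fully_implicit -snes_mf -ts_adapt_monitor */
      ierr = TSSolve(ts, u);CHKERRQ(ierr);
      ierr = TSDestroy(&ts);CHKERRQ(ierr);
      ierr = VecDestroy(&u);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

Because the boundary row has no udot, a quick test of a sketch like this would typically use -ts_arkimex_fully_implicit (or a fully implicit method such as -ts_type beuler with -snes_fd, as suggested earlier in the thread); a real code would supply an IJacobian rather than relying on -snes_mf.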
On Feb 18, 2023, at 8:44 AM, Tu, Jiannan wrote: The RHS function at the bottom boundary is determined by the boundary condition, which is the second order derivative = 0, i.e. G(u) = 2*X[i=1] ? X[i=2]. Then in IFunction, F(u, udot) = X[i=0]. This might be the problem. Your F(u, udot) is missing udot according to your description. Take a simple ODE udot = f(u) + g(u) for example. One way to partition this ODE is to define F = udot - f(u) as the IFunction and G = g(u) as the RHSFunction. Hong (Mr.) Thank you, Jiannan From: Zhang, Hong Sent: Friday, February 17, 2023 11:54 PM To: Tu, Jiannan Cc: Barry Smith; Hong Zhang; Constantinescu, Emil M.; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected You don't often get email from hongzhang at anl.gov. Learn why this is important CAUTION: This email was sent from outside the UMass Lowell network. On Feb 17, 2023, at 6:19 PM, Tu, Jiannan wrote: I need to find out what causes negative temperature first. Following is the message with adaptivity turned off. The G(u) gives right-hand equation for electron temperature at bottom boundary. The F(u, u?) function is F(u, u?) = X = G(u) and the jacobian element is d F(u, u?) / dX =1. This looks strange. Can you elaborate a bit on your partitioned ODE? For example, how are your F(u,udot) (IFunction) and G(u) (RHSFunction) defined? A good IMEX example can be found at ts/tutorial/advection-diffusion-reaction/ex5.c (and reaction_diffusion.c). Hong (Mr.) The solution from TSStep is checked for positivity of densities and temperatures. From the message below, it is seen that G(u) > 0 (I added output of right-hand equation for electron temperature). The solution for electron temperature X should be X * jacobian element = G(u) > 0 since jacobian element = 1. I don?t understand why it becomes negative. Is my understanding of TS formula incorrect? Thank you, Jiannan ---------------------------------- G(u) = 1.86534e-07 0 SNES Function norm 2.274473072183e+03 1 SNES Function norm 8.641749325070e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 G(u) = 1.86534e-07 0 SNES Function norm 8.716501970511e-02 1 SNES Function norm 2.213263548813e-04 2 SNES Function norm 2.779985176426e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 G(u) = 1.86534e-07 0 SNES Function norm 3.177195995186e-01 1 SNES Function norm 3.607702491344e-04 2 SNES Function norm 4.345809629121e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 G(u) = 1.86534e-07 TSAdapt none arkimex 0:3 step 0 accepted t=42960 + 2.189e-02 dt=2.189e-02 electron temperature = -3.6757e-15 at (i, j, k) = (0, 1, 0) From: Barry Smith Sent: Friday, February 17, 2023 3:45 PM To: Tu, Jiannan; Hong Zhang; Emil Constantinescu Cc: petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. On Feb 17, 2023, at 3:32 PM, Tu, Jiannan wrote: The ts_type arkimex is used. There is right hand-side function RHSFunction set by TSSetRHSFunction() and also stiff function set by TSSetIFunction(). With adaptivity shut off, TS can finish its first time step after the 3rd ?Nonlinear solve converged due to ??. The solution gives negative electron and neutral temperatures at the bottom boundary. I need to fix the negative temperatures and see how the code works. BTW, what is this ts_adapt? Is it by default on? It is default for some of the TSTypes (in particular, the better ones). 
It adapts the timestep to ensure some local error estimate is below a certain tolerance. As Matt notes normally as it tries smaller and smaller time steps the local error estimate would get smaller and smaller; this is not happening here, hence the error. Have you tried with the argument -ts_arkimex_fully_implicit ? I am not an expert but my guess is something is "odd" about your functions, either the RHSFunction or the Function or both. Do you have a hierarchy of models for your problem? Could you try runs with fewer terms in your functions, that may be producing the difficulties? If you can determine what triggers the problem with the local error estimators, that might help the experts in ODE solution (not me) determine what could be going wrong. Barry Thank you, Jiannan From: Matthew Knepley Sent: Friday, February 17, 2023 3:15 PM To: Tu, Jiannan Cc: Barry Smith; petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. I am not sure what TS you are using, but the estimate of the local truncation error is 91.4, and does not seem to change when you make the step smaller, so something is off. You can shut off the adaptivity using -ts_adapt_type none Thanks, Matt On Fri, Feb 17, 2023 at 3:01 PM Tu, Jiannan > wrote: These are what I got with the options you suggested. Thank you, Jiannan ------------------------------------------------------------------------------- 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 2.189e-02 dt=4.374e-03 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-03 dt=4.374e-04 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-04 dt=4.374e-05 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to 
CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-05 dt=4.374e-06 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-06 dt=4.374e-07 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-07 dt=4.374e-08 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-08 dt=4.374e-09 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-09 dt=4.374e-10 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-10 dt=4.374e-11 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-11 
dt=4.374e-12 wlte= 91.4 wltea= -1 wlter= -1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 TSAdapt basic arkimex 0:3 step 0 rejected t=42960 + 4.374e-12 dt=4.374e-13 wlte= 91.4 wltea= -1 wlter= -1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 14:54:22 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 From: Barry Smith Sent: Friday, February 17, 2023 12:58 PM To: Tu, Jiannan Cc: petsc-users Subject: Re: [petsc-users] TS failed due to diverged_step_rejected CAUTION: This email was sent from outside the UMass Lowell network. Can you please run with also the options -ts_monitor -ts_adapt_monitor ? The output is confusing because it prints that the Nonlinear solve has converged but then TSStep has failed due to DIVERGED_STEP_REJECTED which seems contradictory On Feb 17, 2023, at 12:09 PM, Tu, Jiannan > wrote: My code uses TS to solve a set of multi-fluid MHD equations. The jacobian is provided with function F(t, u, u'). Both linear and nonlinear solvers converge but snes repeats itself until gets "TSStep has failed due to diverged_step_rejected." Is it because I used TSStep rather than TSSolve? I have checked the condition number. The condition number with pc_type asm is about 1 (without precondition it is about 4x10^4). The maximum ratio of off-diagonal jacobian element over diagonal element is about 21. Could you help me to identify what is going wrong? Thank you very much! Jiannan --------------------------------------------------------------------------------------------------- Run command with options mpiexec -n $1 ./iditm3d -ts_type arkimex -snes_tyep ngmres -ksp_type gmres -pc_type asm \ -ts_rtol 1.0e-4 -ts_atol 1.0e-4 -snes_monitor -snes_rtol 1.0e-4 -snes_atol 1.0e-4 \ -snes_converged_reason The output message is Start time advancing ... 
0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.673091274668e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 8.715428433630e-02 1 SNES Function norm 4.995727626692e-04 2 SNES Function norm 5.498018152230e-08 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 3.237461568254e-01 1 SNES Function norm 7.988531005091e-04 2 SNES Function norm 1.280948196292e-07 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.881903203545e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 7.562592690785e-02 1 SNES Function norm 1.143078818923e-04 2 SNES Function norm 9.834547907735e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.683968949758e-01 1 SNES Function norm 1.838028436639e-04 2 SNES Function norm 9.470813523140e-09 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 2 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 1.821562431175e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 1.005443458812e-01 1 SNES Function norm 3.633336946661e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.515368382715e-01 1 SNES Function norm 3.389298316830e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 4.541003359206e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 1.713800906043e-01 1 SNES Function norm 1.179958172167e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.020265094117e-01 1 SNES Function norm 1.513971290464e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 6.090269704320e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.136603895703e-01 1 SNES Function norm 1.877474016012e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.127812462507e-01 1 SNES Function norm 2.713146825704e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.793512213059e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.205196267430e-01 1 SNES Function norm 2.572653773308e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.260057361977e-01 1 SNES Function norm 2.705816087598e-06 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 2.764855860446e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.212505522844e-01 1 SNES Function norm 2.958996472386e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 3.273222034162e-01 1 SNES Function norm 2.994512887620e-05 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 1 0 SNES Function norm 2.274473072186e+03 1 SNES Function norm 3.317240589134e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213246532918e-01 1 SNES Function norm 2.799468604767e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES 
Function norm 3.274570888397e-01 1 SNES Function norm 3.066048050994e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072189e+03 1 SNES Function norm 2.653507278572e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.213869585841e-01 1 SNES Function norm 2.177156902895e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.275136370365e-01 1 SNES Function norm 1.962849131557e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473072218e+03 1 SNES Function norm 5.664907315679e-03 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.223208399368e-01 1 SNES Function norm 5.688863091415e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.287121218919e-01 1 SNES Function norm 4.085338521320e-03 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 2.274473071968e+03 1 SNES Function norm 4.694691905235e-04 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 0 SNES Function norm 2.211786508657e-01 1 SNES Function norm 1.503497433939e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 0 SNES Function norm 3.272667798977e-01 1 SNES Function norm 2.176132327279e-04 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.16.6, Mar 30, 2022 [0]PETSC ERROR: ./iditm3d on a named office by jtu Fri Feb 17 11:59:43 2023 [0]PETSC ERROR: Configure options --prefix=/usr/local --with-mpi-dir=/usr/local --with-fc=0 --with-openmp --with-hdf5-dir=/usr/local --download-f2cblaslapack=1 [0]PETSC ERROR: #1 TSStep() at /home/jtu/Downloads/petsc-3.16.6/src/ts/interface/ts.c:3583 -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.grosse-bley at ziti.uni-heidelberg.de Wed Feb 22 10:15:36 2023 From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley) Date: Wed, 22 Feb 2023 17:15:36 +0100 Subject: [petsc-users] =?utf-8?q?MG_on_GPU=3A_Benchmarking_and_avoiding_v?= =?utf-8?q?ector_host-=3Edevice_copy?= In-Reply-To: <57A3624B-DAAA-4D4F-9EA0-7F4BEED7C9C4@petsc.dev> Message-ID: <147656-63f63f80-2b-54955c80@45250414> Hi Barry, after using VecCUDAGetArray to initialize the RHS, that kernel still gets called as part of KSPSolve instead of KSPSetup, but its runtime is way less significant than the cudaMemcpy before, so I guess I will leave it like this. Other than that I kept the code like in my first message in this thread (as you wrote, benchmark_ksp.c is not well suited for PCMG). The profiling results for PCMG and PCAMG look as I would expect them to look, i.e. one can nicely see the GPU load/kernel runtimes going down and up again for each V-cycle. I was wondering about -pc_mg_multiplicative_cycles as it does not seem to make any difference. 
I would have expected to be able to increase the number of V-cycles per KSP iteration, but I keep seeing just a single V-cycle when changing the option (using PCMG).

When using BoomerAMG from PCHYPRE, calling KSPSetComputeInitialGuess between bench iterations to reset the solution vector does not seem to work as the residual keeps shrinking. Is this a bug? Any advice for working around this?

The profile for BoomerAMG also doesn't really show the V-cycle behavior of the other implementations. Most of the runtime seems to go into calls to cusparseDcsrsv which might happen at the different levels, but the runtime of these kernels doesn't show the V-cycle pattern. According to the output with -pc_hypre_boomeramg_print_statistics it is doing the right thing though, so I guess it is alright (and if not, this is probably the wrong place to discuss it).

When using PCAMGX, I see two PCApply (each showing a nice V-cycle behavior) calls in KSPSolve (three for the very first KSPSolve) while expecting just one. Each KSPSolve should do a single preconditioned Richardson iteration. Why is the preconditioner applied multiple times here?

Thank you,
Paul Große-Bley

On Monday, February 06, 2023 20:05 CET, Barry Smith wrote:

   It should not crash, take a look at the test cases at the bottom of the file. You are likely correct: if the code, unfortunately, does use DMCreateMatrix(), it will not work out of the box for geometric multigrid. So it might be the wrong example for you.

   I don't know what you mean about clever. If you simply set the solution to zero at the beginning of the loop then it will just do the same solve multiple times. The setup should not do much of anything after the first solve. Though usually solves are big enough that one need not run solves multiple times to get a good understanding of their performance.

On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley wrote:

Hi Barry,

src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting point, thank you! Sadly I get a segfault when executing that example with PCMG and more than one level, i.e. with the minimal args:

$ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2
===========================================
Test: KSP performance - Poisson
    Input matrix: 27-pt finite difference stencil
    -n 100
    DoFs = 1000000
    Number of nonzeros = 26463592

Step1  - creating Vecs and Mat...
Step2a - running PCSetUp()...
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[0]PETSC ERROR: or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA systems to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

As the matrix is not created using DMDACreate3d I expected it to fail due to the missing geometric information, but I expected it to fail more gracefully than with a segfault.
I will try to combine bench_kspsolve.c with ex45.c to get easy MG preconditioning, especially since I am interested in the 7pt stencil for now.

Concerning my benchmarking loop from before: Is it generally discouraged to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the work when doing the same solve multiple times, or are the solves not iterated in bench_kspsolve.c (while the MatMults are with -matmult) just to keep the runtime short?

Thanks,
Paul

On Monday, February 06, 2023 17:04 CET, Barry Smith wrote:

   Paul,

   I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to be used for simple benchmarking.

   You can use VecCudaGetArray() to access the GPU memory of the vector and then call a CUDA kernel to compute the right hand side vector directly on the GPU.

   Barry

On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley wrote:

Hi,

I want to compare different implementations of multigrid solvers for Nvidia GPUs using the poisson problem (starting from ksp tutorial example ex45.c).
Therefore I am trying to get runtime results comparable to hpgmg-cuda (finite-volume), i.e. using multiple warmup and measurement solves and avoiding measuring setup time.
For now I am using -log_view with added stages:

PetscLogStageRegister("Solve Bench", &solve_bench_stage);
for (int i = 0; i < BENCH_SOLVES; i++) {
  PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); // reset x
  PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve
  PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that everything is done

  PetscLogStagePush(solve_bench_stage);
  PetscCall(KSPSolve(ksp, NULL, NULL));
  PetscLogStagePop();
}

This snippet is preceded by a similar loop for warmup.

When profiling this using Nsight Systems, I see that the very first solve is much slower, which mostly corresponds to H2D (host to device) copies and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs, but probably insignificant in this case). The following solves have some overhead at the start from a H2D copy of a vector (the RHS I guess, as the copy is preceded by a matrix-vector product) in the first MatResidual call (callchain: KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My interpretation of the profiling results (i.e. cuBLAS calls) is that that vector is overwritten with the residual in Daxpy and therefore has to be copied again for the next iteration.

Is there an elegant way of avoiding that H2D copy? I have seen some examples on constructing matrices directly on the GPU, but nothing about vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At the moment I am using jacobi as smoother, but I would like to have a CUDA implementation of SOR instead. Is there a good way of achieving that, e.g. using PCHYPRE's boomeramg with a single level and "SOR/Jacobi"-smoother as smoother in PCMG? Or is the overhead from constantly switching between PETSc and hypre too big?

Thanks,
Paul
-------------- next part --------------
An HTML attachment was scrubbed...
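A minimal sketch of what the VecCudaGetArray() suggestion above could look like in practice. This is not code from the thread: the kernel name fill_rhs, the helper FillRHSOnDevice, and the constant-one right-hand side are placeholders, the real values would come from the grid geometry in ex45.c, and the header that declares VecCUDAGetArray differs between PETSc versions (compile as CUDA, e.g. a .cu file).

  // Sketch only: fill a PETSc CUDA vector's device memory with a CUDA kernel,
  // so no host-side assembly and no host->device copy of the RHS is needed.
  #include <petscvec.h>

  __global__ void fill_rhs(PetscScalar *b, PetscInt n)
  {
    PetscInt i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] = 1.0; // placeholder value, not the ex45.c RHS
  }

  static PetscErrorCode FillRHSOnDevice(Vec b)
  {
    PetscScalar *barray;
    PetscInt     n;

    PetscFunctionBeginUser;
    PetscCall(VecGetLocalSize(b, &n));
    PetscCall(VecCUDAGetArray(b, &barray));        // device pointer; marks the GPU copy as current
    fill_rhs<<<(n + 255) / 256, 256>>>(barray, n); // launch on the default stream
    PetscCall(VecCUDARestoreArray(b, &barray));
    PetscFunctionReturn(0);
  }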
URL: From mfadams at lbl.gov Wed Feb 22 11:12:15 2023 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 22 Feb 2023 12:12:15 -0500 Subject: [petsc-users] MG on GPU: Benchmarking and avoiding vector host->device copy In-Reply-To: References: <1381a1-63e13c80-cf-417f7600@172794512> <57A3624B-DAAA-4D4F-9EA0-7F4BEED7C9C4@petsc.dev> Message-ID: On Tue, Feb 7, 2023 at 6:40 AM Matthew Knepley wrote: > On Tue, Feb 7, 2023 at 6:23 AM Mark Adams wrote: > >> I do one complete solve to get everything setup, to be safe. >> >> src/ts/tutorials/ex13.c does this and runs multiple solves, if you like >> but one solve is probably fine. >> > > I think that is SNES ex13 > Yes, it is src/snes/tests/ex13.c > > Matt > > >> This was designed as a benchmark and is nice because it can do any order >> FE solve of Poisson (uses DM/PetscFE, slow). >> src/ksp/ksp/tutorials/ex56.c is old school, hardwired for elasticity but >> is simpler and the setup is faster if you are doing large problems per MPI >> process. >> >> Mark >> >> On Mon, Feb 6, 2023 at 2:06 PM Barry Smith wrote: >> >>> >>> It should not crash, take a look at the test cases at the bottom of the >>> file. You are likely correct if the code, unfortunately, does use >>> DMCreateMatrix() it will not work out of the box for geometric multigrid. >>> So it might be the wrong example for you. >>> >>> I don't know what you mean about clever. If you simply set the >>> solution to zero at the beginning of the loop then it will just do the same >>> solve multiple times. The setup should not do much of anything after the >>> first solver. Thought usually solves are big enough that one need not run >>> solves multiple times to get a good understanding of their performance. >>> >>> >>> >>> >>> >>> >>> On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley < >>> paul.grosse-bley at ziti.uni-heidelberg.de> wrote: >>> >>> Hi Barry, >>> >>> src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting >>> point, thank you! Sadly I get a segfault when executing that example with >>> PCMG and more than one level, i.e. with the minimal args: >>> >>> $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 >>> =========================================== >>> Test: KSP performance - Poisson >>> Input matrix: 27-pt finite difference stencil >>> -n 100 >>> DoFs = 1000000 >>> Number of nonzeros = 26463592 >>> >>> Step1 - creating Vecs and Mat... >>> Step2a - running PCSetUp()... >>> [0]PETSC ERROR: >>> ------------------------------------------------------------------------ >>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >>> probably memory access out of range >>> [0]PETSC ERROR: Try option -start_in_debugger or >>> -on_error_attach_debugger >>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and >>> https://petsc.org/release/faq/ >>> [0]PETSC ERROR: or try >>> https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA >>> systems to find memory corruption errors >>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, >>> and run >>> [0]PETSC ERROR: to get more information on the crash. >>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is >>> causing the crash. >>> >>> -------------------------------------------------------------------------- >>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD >>> with errorcode 59. >>> >>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. 
>>> You may or may not see output from other processes, depending on >>> exactly when Open MPI kills them. >>> >>> -------------------------------------------------------------------------- >>> >>> As the matrix is not created using DMDACreate3d I expected it to fail >>> due to the missing geometric information, but I expected it to fail more >>> gracefully than with a segfault. >>> I will try to combine bench_kspsolve.c with ex45.c to get easy MG >>> preconditioning, especially since I am interested in the 7pt stencil for >>> now. >>> >>> Concerning my benchmarking loop from before: Is it generally discouraged >>> to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the >>> work when doing the same solve multiple times or are the solves not >>> iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to >>> keep the runtime short? >>> >>> Thanks, >>> Paul >>> >>> On Monday, February 06, 2023 17:04 CET, Barry Smith >>> wrote: >>> >>> >>> >>> >>> >>> Paul, >>> >>> I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to >>> be used for simple benchmarking. >>> >>> You can use VecCudaGetArray() to access the GPU memory of the vector >>> and then call a CUDA kernel to compute the right hand side vector directly >>> on the GPU. >>> >>> Barry >>> >>> >>> >>> On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley < >>> paul.grosse-bley at ziti.uni-heidelberg.de> wrote: >>> >>> Hi, >>> >>> I want to compare different implementations of multigrid solvers for >>> Nvidia GPUs using the poisson problem (starting from ksp tutorial example >>> ex45.c). >>> Therefore I am trying to get runtime results comparable to hpgmg-cuda >>> >>> (finite-volume), i.e. using multiple warmup and measurement solves and >>> avoiding measuring setup time. >>> For now I am using -log_view with added stages: >>> >>> PetscLogStageRegister("Solve Bench", &solve_bench_stage); >>> for (int i = 0; i < BENCH_SOLVES; i++) { >>> PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, >>> NULL)); // reset x >>> PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve >>> PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that >>> everything is done >>> >>> PetscLogStagePush(solve_bench_stage); >>> PetscCall(KSPSolve(ksp, NULL, NULL)); >>> PetscLogStagePop(); >>> } >>> >>> This snippet is preceded by a similar loop for warmup. >>> >>> When profiling this using Nsight Systems, I see that the very first >>> solve is much slower which mostly correspods to H2D (host to device) copies >>> and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the >>> docs >>> , >>> but probably insignificant in this case). The following solves have some >>> overhead at the start from a H2D copy of a vector (the RHS I guess, as the >>> copy is preceeded by a matrix-vector product) in the first MatResidual call >>> (callchain: >>> KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My >>> interpretation of the profiling results (i.e. cuBLAS calls) is that that >>> vector is overwritten with the residual in Daxpy and therefore has to be >>> copied again for the next iteration. >>> >>> Is there an elegant way of avoiding that H2D copy? I have seen some >>> examples on constructing matrices directly on the GPU, but nothing about >>> vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At >>> the moment I am using jacobi as smoother, but I would like to have a CUDA >>> implementation of SOR instead. Is there a good way of achieving that, e.g. 
>>> using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother as >>> smoother in PCMG? Or is the overhead from constantly switching between >>> PETSc and hypre too big? >>> >>> Thanks, >>> Paul >>> >>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Feb 22 11:24:07 2023 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 22 Feb 2023 12:24:07 -0500 Subject: [petsc-users] MG on GPU: Benchmarking and avoiding vector host->device copy In-Reply-To: <147656-63f63f80-2b-54955c80@45250414> References: <57A3624B-DAAA-4D4F-9EA0-7F4BEED7C9C4@petsc.dev> <147656-63f63f80-2b-54955c80@45250414> Message-ID: On Wed, Feb 22, 2023 at 11:15 AM Paul Grosse-Bley < paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > Hi Barry, > > after using VecCUDAGetArray to initialize the RHS, that kernel still gets > called as part of KSPSolve instead of KSPSetup, but its runtime is way less > significant than the cudaMemcpy before, so I guess I will leave it like > this. Other than that I kept the code like in my first message in this > thread (as you wrote, benchmark_ksp.c is not well suited for PCMG). > > The profiling results for PCMG and PCAMG look as I would expect them to > look, i.e. one can nicely see the GPU load/kernel runtimes going down and > up again for each V-cycle. > > I was wondering about -pc_mg_multiplicative_cycles as it does not seem to > make any difference. I would have expected to be able to increase the > number of V-cycles per KSP iteration, but I keep seeing just a single > V-cycle when changing the option (using PCMG). > How are you seeing this? You might try -log_trace to see if you get two V cycles. > > When using BoomerAMG from PCHYPRE, calling KSPSetComputeInitialGuess > between bench iterations to reset the solution vector does not seem to work > as the residual keeps shrinking. Is this a bug? Any advice for working > around this? > > Looking at the doc https://petsc.org/release/docs/manualpages/KSP/KSPSetComputeInitialGuess/ you use this with KSPSetComputeRHS. In src/snes/tests/ex13.c I just zero out the solution vector. > The profile for BoomerAMG also doesn't really show the V-cycle behavior of > the other implementations. Most of the runtime seems to go into calls to > cusparseDcsrsv which might happen at the different levels, but the runtime > of these kernels doesn't show the V-cycle pattern. According to the output > with -pc_hypre_boomeramg_print_statistics it is doing the right thing > though, so I guess it is alright (and if not, this is probably the wrong > place to discuss it). > When using PCAMGX, I see two PCApply (each showing a nice V-cycle > behavior) calls in KSPSolve (three for the very first KSPSolve) while > expecting just one. Each KSPSolve should do a single preconditioned > Richardson iteration. Why is the preconditioner applied multiple times here? > > Again, not sure what "see" is, but PCAMGX is pretty new and has not been used much. Note some KSP methods apply to the PC before the iteration. Mark > Thank you, > Paul Gro?e-Bley > > > On Monday, February 06, 2023 20:05 CET, Barry Smith > wrote: > > > > > > It should not crash, take a look at the test cases at the bottom of the > file. 
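A minimal sketch of the "just zero out the solution vector" approach mentioned above, assuming the benchmark keeps explicit x and b vectors and a registered log stage (the names nsolves, x, b, ksp and stage are illustrative, not taken from ex13.c):

  // Sketch: repeat the same solve without KSPReset() by zeroing the
  // solution vector before each timed KSPSolve(); setup from the first
  // solve is reused by the later ones.
  for (PetscInt i = 0; i < nsolves; i++) {
    PetscCall(VecZeroEntries(x));      // reset the initial guess to zero
    PetscCall(PetscLogStagePush(stage));
    PetscCall(KSPSolve(ksp, b, x));
    PetscCall(PetscLogStagePop());
  }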
You are likely correct if the code, unfortunately, does use > DMCreateMatrix() it will not work out of the box for geometric multigrid. > So it might be the wrong example for you. > > I don't know what you mean about clever. If you simply set the solution > to zero at the beginning of the loop then it will just do the same solve > multiple times. The setup should not do much of anything after the first > solver. Thought usually solves are big enough that one need not run solves > multiple times to get a good understanding of their performance. > > > > > > > > On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley < > paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > > Hi Barry, > > src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting > point, thank you! Sadly I get a segfault when executing that example with > PCMG and more than one level, i.e. with the minimal args: > > $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 > =========================================== > Test: KSP performance - Poisson > Input matrix: 27-pt finite difference stencil > -n 100 > DoFs = 1000000 > Number of nonzeros = 26463592 > > Step1 - creating Vecs and Mat... > Step2a - running PCSetUp()... > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and > https://petsc.org/release/faq/ > [0]PETSC ERROR: or try > https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA > systems to find memory corruption errors > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and > run > [0]PETSC ERROR: to get more information on the crash. > [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is > causing the crash. > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD > with errorcode 59. > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > > As the matrix is not created using DMDACreate3d I expected it to fail due > to the missing geometric information, but I expected it to fail more > gracefully than with a segfault. > I will try to combine bench_kspsolve.c with ex45.c to get easy MG > preconditioning, especially since I am interested in the 7pt stencil for > now. > > Concerning my benchmarking loop from before: Is it generally discouraged > to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the > work when doing the same solve multiple times or are the solves not > iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to > keep the runtime short? > > Thanks, > Paul > > On Monday, February 06, 2023 17:04 CET, Barry Smith > wrote: > > > > > > Paul, > > I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to > be used for simple benchmarking. > > You can use VecCudaGetArray() to access the GPU memory of the vector > and then call a CUDA kernel to compute the right hand side vector directly > on the GPU. 
> > Barry > > > > On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley < > paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > > Hi, > > I want to compare different implementations of multigrid solvers for > Nvidia GPUs using the poisson problem (starting from ksp tutorial example > ex45.c). > Therefore I am trying to get runtime results comparable to hpgmg-cuda > > (finite-volume), i.e. using multiple warmup and measurement solves and > avoiding measuring setup time. > For now I am using -log_view with added stages: > > PetscLogStageRegister("Solve Bench", &solve_bench_stage); > for (int i = 0; i < BENCH_SOLVES; i++) { > PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); > // reset x > PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve > PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that > everything is done > > PetscLogStagePush(solve_bench_stage); > PetscCall(KSPSolve(ksp, NULL, NULL)); > PetscLogStagePop(); > } > > This snippet is preceded by a similar loop for warmup. > > When profiling this using Nsight Systems, I see that the very first solve > is much slower which mostly correspods to H2D (host to device) copies and > e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs > , > but probably insignificant in this case). The following solves have some > overhead at the start from a H2D copy of a vector (the RHS I guess, as the > copy is preceeded by a matrix-vector product) in the first MatResidual call > (callchain: > KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My > interpretation of the profiling results (i.e. cuBLAS calls) is that that > vector is overwritten with the residual in Daxpy and therefore has to be > copied again for the next iteration. > > Is there an elegant way of avoiding that H2D copy? I have seen some > examples on constructing matrices directly on the GPU, but nothing about > vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At > the moment I am using jacobi as smoother, but I would like to have a CUDA > implementation of SOR instead. Is there a good way of achieving that, e.g. > using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother as > smoother in PCMG? Or is the overhead from constantly switching between > PETSc and hypre too big? > > Thanks, > Paul > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbarnafi at cmm.uchile.cl Wed Feb 22 11:54:29 2023 From: nbarnafi at cmm.uchile.cl (Nicolas Barnafi) Date: Wed, 22 Feb 2023 18:54:29 +0100 Subject: [petsc-users] Problem setting Fieldsplit fields In-Reply-To: References: <15fd4b93-6e77-b9ef-95f6-1f3c1ed45162@cmm.uchile.cl> Message-ID: <33456f00-6519-fb04-b7ca-6f54d5eb5ee5@cmm.uchile.cl> Hi Matt, Sorry for the late answer, it was holiday time. > Just to clarify, if you call SetIS() 3 times, and then give > > ? -pc_fieldsplit_0_fields 0,2 > > then we should reduce the number of fields to two by calling > ISConcatenate() on the first and last ISes? Exactly > I think this should not be hard. It will work exactly as it does on > the DM case, except the ISes will come from > the PC, not the DM. One complication is that you will have to hold the > new ISes until the end, and then set them. > > ? ?Thanks, > > ? ? ?Matt Nice, then it is exactly what I want. I will work on it, and create a PR when things are starting to fit in. Best, NB -------------- next part -------------- An HTML attachment was scrubbed... 
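A rough sketch of the field-merging step being discussed above, purely illustrative and not the eventual PR: is0, is1, is2 stand for the three ISes already handed to the PC, and the split names are arbitrary.

  // Sketch: merge two of three index sets into a single split, mimicking
  // what -pc_fieldsplit_0_fields 0,2 is meant to do.
  IS list[2] = {is0, is2}, merged;

  PetscCall(ISConcatenate(PETSC_COMM_WORLD, 2, list, &merged));
  PetscCall(ISSort(merged));                     // optional: keep the merged IS sorted
  PetscCall(PCFieldSplitSetIS(pc, "0", merged)); // fields 0 and 2 become split 0
  PetscCall(PCFieldSplitSetIS(pc, "1", is1));    // the remaining field becomes split 1
  PetscCall(ISDestroy(&merged));                 // the PC keeps its own reference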
URL: 

From paul.grosse-bley at ziti.uni-heidelberg.de  Wed Feb 22 12:10:32 2023
From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley)
Date: Wed, 22 Feb 2023 19:10:32 +0100
Subject: [petsc-users] MG on GPU: Benchmarking and avoiding vector host->device copy
In-Reply-To: 
Message-ID: <31e670-63f65a80-167-1328be20@181111108>

Hi Mark,

I use Nvidia Nsight Systems with --trace cuda,nvtx,osrt,cublas-verbose,cusparse-verbose together with the NVTX markers that come with -log_view. I.e. I get a nice view of all cuBLAS and cuSPARSE calls (in addition to the actual kernels, which are not always easy to attribute). For PCMG and PCGAMG I also use -pc_mg_log for even more detailed NVTX markers.

The "signature" of a V-cycle in PCMG, PCGAMG and PCAMGX is pretty clear because kernel runtimes on coarser levels are much shorter. At the coarsest level, there normally isn't even enough work for the GPU (Nvidia A100) to be fully occupied, which is also visible in Nsight Systems.

I run only a single MPI rank with a single GPU, so profiling is straightforward.

Best,
Paul Große-Bley

On Wednesday, February 22, 2023 18:24 CET, Mark Adams wrote:

On Wed, Feb 22, 2023 at 11:15 AM Paul Grosse-Bley wrote:

Hi Barry,

after using VecCUDAGetArray to initialize the RHS, that kernel still gets called as part of KSPSolve instead of KSPSetup, but its runtime is way less significant than the cudaMemcpy before, so I guess I will leave it like this. Other than that I kept the code like in my first message in this thread (as you wrote, benchmark_ksp.c is not well suited for PCMG).

The profiling results for PCMG and PCAMG look as I would expect them to look, i.e. one can nicely see the GPU load/kernel runtimes going down and up again for each V-cycle.

I was wondering about -pc_mg_multiplicative_cycles as it does not seem to make any difference. I would have expected to be able to increase the number of V-cycles per KSP iteration, but I keep seeing just a single V-cycle when changing the option (using PCMG).

How are you seeing this?
You might try -log_trace to see if you get two V cycles.

When using BoomerAMG from PCHYPRE, calling KSPSetComputeInitialGuess between bench iterations to reset the solution vector does not seem to work as the residual keeps shrinking. Is this a bug? Any advice for working around this?

Looking at the doc https://petsc.org/release/docs/manualpages/KSP/KSPSetComputeInitialGuess/ you use this with KSPSetComputeRHS.

In src/snes/tests/ex13.c I just zero out the solution vector.

The profile for BoomerAMG also doesn't really show the V-cycle behavior of the other implementations. Most of the runtime seems to go into calls to cusparseDcsrsv which might happen at the different levels, but the runtime of these kernels doesn't show the V-cycle pattern.
??Again, not sure what "see" is, but PCAMGX is pretty new and has not been used much.Note some KSP methods apply to the PC before the iteration.?Mark??Thank you, Paul Gro?e-Bley On Monday, February 06, 2023 20:05 CET, Barry Smith wrote: ????It should not crash, take a look at the test cases at the bottom of the file. You are likely correct if the code, unfortunately, does use DMCreateMatrix() it will not work out of the box for geometric multigrid. So it might be the wrong example for you.?? I don't know what you mean about clever. If you simply set the solution to zero at the beginning of the loop then it will just do the same solve multiple times. The setup should not do much of anything after the first solver.? Thought usually solves are big enough that one need not run solves multiple times to get a good understanding of their performance.???????On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley wrote:?Hi Barry, src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting point, thank you! Sadly I get a segfault when executing that example with PCMG and more than one level, i.e. with the minimal args: $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 =========================================== Test: KSP performance - Poisson ?? ?Input matrix: 27-pt finite difference stencil ?? ?-n 100 ?? ?DoFs = 1000000 ?? ?Number of nonzeros = 26463592 Step1? - creating Vecs and Mat... Step2a - running PCSetUp()... [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ [0]PETSC ERROR: or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA systems to find memory corruption errors [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 59. NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. -------------------------------------------------------------------------- As the matrix is not created using DMDACreate3d I expected it to fail due to the missing geometric information, but I expected it to fail more gracefully than with a segfault. I will try to combine bench_kspsolve.c with ex45.c to get easy MG preconditioning, especially since I am interested in the 7pt stencil for now. Concerning my benchmarking loop from before: Is it generally discouraged to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the work when doing the same solve multiple times or are the solves not iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to keep the runtime short? Thanks, Paul On Monday, February 06, 2023 17:04 CET, Barry Smith wrote: ???? Paul,?? ?I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to be used for simple benchmarking.??? 
?You can use VecCudaGetArray() to access the GPU memory of the vector and then call a CUDA kernel to compute the right hand side vector directly on the GPU.?? Barry??On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley wrote:?Hi, I want to compare different implementations of multigrid solvers for Nvidia GPUs using the poisson problem (starting from ksp tutorial example ex45.c). Therefore I am trying to get runtime results comparable to hpgmg-cuda (finite-volume), i.e. using multiple warmup and measurement solves and avoiding measuring setup time. For now I am using -log_view with added stages: PetscLogStageRegister("Solve Bench", &solve_bench_stage); ? for (int i = 0; i < BENCH_SOLVES; i++) { ??? PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); // reset x ??? PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve ??? PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that everything is done ??? PetscLogStagePush(solve_bench_stage); ??? PetscCall(KSPSolve(ksp, NULL, NULL)); ??? PetscLogStagePop(); ? } This snippet is preceded by a similar loop for warmup. When profiling this using Nsight Systems, I see that the very first solve is much slower which mostly correspods to H2D (host to device) copies and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs, but probably insignificant in this case). The following solves have some overhead at the start from a H2D copy of a vector (the RHS I guess, as the copy is preceeded by a matrix-vector product) in the first MatResidual call (callchain: KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My interpretation of the profiling results (i.e. cuBLAS calls) is that that vector is overwritten with the residual in Daxpy and therefore has to be copied again for the next iteration. Is there an elegant way of avoiding that H2D copy? I have seen some examples on constructing matrices directly on the GPU, but nothing about vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At the moment I am using jacobi as smoother, but I would like to have a CUDA implementation of SOR instead. Is there a good way of achieving that, e.g. using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother? as smoother in PCMG? Or is the overhead from constantly switching between PETSc and hypre too big? Thanks, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Feb 22 12:46:19 2023 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 22 Feb 2023 13:46:19 -0500 Subject: [petsc-users] MG on GPU: Benchmarking and avoiding vector host->device copy In-Reply-To: <31e670-63f65a80-167-1328be20@181111108> References: <31e670-63f65a80-167-1328be20@181111108> Message-ID: OK, Nsight Systems is a good way to see what is going on. So all three of your solvers are not traversing the MG hierching with the correct logic. I don't know about hypre but PCMG and AMGx are pretty simple and AMGx dives into the AMGx library directly from out interface. Some things to try: * Use -options_left to make sure your options are being used (eg, spelling mistakes) * Use -ksp_view to see a human readable list of your solver parameters. * Use -log_trace to see if the correct methods are called. - PCMG calls PCMGMCycle_Private for each of the cycle in code like: for (i = 0; i < mg->cyclesperpcapply; i++) PetscCall(PCMGMCycle_Private(pc, mglevels + levels - 1, transpose, matapp, NULL)); - AMGx is called PCApply_AMGX and then it dives into the library. 
See where these three calls to AMGx are called from. Mark On Wed, Feb 22, 2023 at 1:10 PM Paul Grosse-Bley < paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > Hi Mark, > > I use Nvidia Nsight Systems with --trace > cuda,nvtx,osrt,cublas-verbose,cusparse-verbose together with the NVTX > markers that come with -log_view. I.e. I get a nice view of all cuBLAS and > cuSPARSE calls (in addition to the actual kernels which are not always easy > to attribute). For PCMG and PCGAMG I also use -pc_mg_log for even more > detailed NVTX markers. > > The "signature" of a V-cycle in PCMG, PCGAMG and PCAMGX is pretty clear > because kernel runtimes on coarser levels are much shorter. At the coarsest > level, there normally isn't even enough work for the GPU (Nvidia A100) to > be fully occupied which is also visible in Nsight Systems. > > I run only a single MPI rank with a single GPU, so profiling is > straighforward. > > Best, > Paul Gro?e-Bley > > On Wednesday, February 22, 2023 18:24 CET, Mark Adams > wrote: > > > > > On Wed, Feb 22, 2023 at 11:15 AM Paul Grosse-Bley < > paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > >> Hi Barry, >> >> after using VecCUDAGetArray to initialize the RHS, that kernel still gets >> called as part of KSPSolve instead of KSPSetup, but its runtime is way less >> significant than the cudaMemcpy before, so I guess I will leave it like >> this. Other than that I kept the code like in my first message in this >> thread (as you wrote, benchmark_ksp.c is not well suited for PCMG). >> >> The profiling results for PCMG and PCAMG look as I would expect them to >> look, i.e. one can nicely see the GPU load/kernel runtimes going down and >> up again for each V-cycle. >> >> I was wondering about -pc_mg_multiplicative_cycles as it does not seem to >> make any difference. I would have expected to be able to increase the >> number of V-cycles per KSP iteration, but I keep seeing just a single >> V-cycle when changing the option (using PCMG). > > > How are you seeing this? > You might try -log_trace to see if you get two V cycles. > > >> >> When using BoomerAMG from PCHYPRE, calling KSPSetComputeInitialGuess >> between bench iterations to reset the solution vector does not seem to work >> as the residual keeps shrinking. Is this a bug? Any advice for working >> around this? >> > > > Looking at the doc > https://petsc.org/release/docs/manualpages/KSP/KSPSetComputeInitialGuess/ > you use this with KSPSetComputeRHS. > > In src/snes/tests/ex13.c I just zero out the solution vector. > > >> The profile for BoomerAMG also doesn't really show the V-cycle behavior >> of the other implementations. Most of the runtime seems to go into calls to >> cusparseDcsrsv which might happen at the different levels, but the runtime >> of these kernels doesn't show the V-cycle pattern. According to the output >> with -pc_hypre_boomeramg_print_statistics it is doing the right thing >> though, so I guess it is alright (and if not, this is probably the wrong >> place to discuss it). > > >> When using PCAMGX, I see two PCApply (each showing a nice V-cycle >> behavior) calls in KSPSolve (three for the very first KSPSolve) while >> expecting just one. Each KSPSolve should do a single preconditioned >> Richardson iteration. Why is the preconditioner applied multiple times here? >> > > > Again, not sure what "see" is, but PCAMGX is pretty new and has not been > used much. > Note some KSP methods apply to the PC before the iteration. 
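Coming back to the -pc_mg_multiplicative_cycles question quoted above, a small sketch of requesting the extra cycles from code and checking that they are actually used. This is a sketch from memory, not code from the thread; ksp and nlevels are assumed to exist, and the call names should be checked against the installed PETSc version.

  // Sketch: request two V-cycles per PCApply with PCMG and verify the setting.
  PC pc;

  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCMG));
  PetscCall(PCMGSetLevels(pc, nlevels, NULL));
  PetscCall(PCMGMultiplicativeSetCycles(pc, 2)); // same effect as -pc_mg_multiplicative_cycles 2
  PetscCall(KSPSetFromOptions(ksp));             // command-line options can still override this
  PetscCall(KSPSetUp(ksp));
  // -ksp_view should then report the cycles per PCApply, which is an easy way
  // to confirm the option actually reached the PC.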
> > Mark > > >> Thank you, >> Paul Gro?e-Bley >> >> >> On Monday, February 06, 2023 20:05 CET, Barry Smith >> wrote: >> >> >> >> >> >> It should not crash, take a look at the test cases at the bottom of the >> file. You are likely correct if the code, unfortunately, does use >> DMCreateMatrix() it will not work out of the box for geometric multigrid. >> So it might be the wrong example for you. >> >> I don't know what you mean about clever. If you simply set the solution >> to zero at the beginning of the loop then it will just do the same solve >> multiple times. The setup should not do much of anything after the first >> solver. Thought usually solves are big enough that one need not run solves >> multiple times to get a good understanding of their performance. >> >> >> >> >> >> >> >> On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley < >> paul.grosse-bley at ziti.uni-heidelberg.de> wrote: >> >> Hi Barry, >> >> src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting >> point, thank you! Sadly I get a segfault when executing that example with >> PCMG and more than one level, i.e. with the minimal args: >> >> $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 >> =========================================== >> Test: KSP performance - Poisson >> Input matrix: 27-pt finite difference stencil >> -n 100 >> DoFs = 1000000 >> Number of nonzeros = 26463592 >> >> Step1 - creating Vecs and Mat... >> Step2a - running PCSetUp()... >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >> probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and >> https://petsc.org/release/faq/ >> [0]PETSC ERROR: or try >> https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA >> systems to find memory corruption errors >> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, >> and run >> [0]PETSC ERROR: to get more information on the crash. >> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is >> causing the crash. >> -------------------------------------------------------------------------- >> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD >> with errorcode 59. >> >> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. >> You may or may not see output from other processes, depending on >> exactly when Open MPI kills them. >> -------------------------------------------------------------------------- >> >> As the matrix is not created using DMDACreate3d I expected it to fail due >> to the missing geometric information, but I expected it to fail more >> gracefully than with a segfault. >> I will try to combine bench_kspsolve.c with ex45.c to get easy MG >> preconditioning, especially since I am interested in the 7pt stencil for >> now. >> >> Concerning my benchmarking loop from before: Is it generally discouraged >> to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the >> work when doing the same solve multiple times or are the solves not >> iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to >> keep the runtime short? 
>> >> Thanks, >> Paul >> >> On Monday, February 06, 2023 17:04 CET, Barry Smith >> wrote: >> >> >> >> >> >> Paul, >> >> I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to >> be used for simple benchmarking. >> >> You can use VecCudaGetArray() to access the GPU memory of the vector >> and then call a CUDA kernel to compute the right hand side vector directly >> on the GPU. >> >> Barry >> >> >> >> On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley < >> paul.grosse-bley at ziti.uni-heidelberg.de> wrote: >> >> Hi, >> >> I want to compare different implementations of multigrid solvers for >> Nvidia GPUs using the poisson problem (starting from ksp tutorial example >> ex45.c). >> Therefore I am trying to get runtime results comparable to hpgmg-cuda >> >> (finite-volume), i.e. using multiple warmup and measurement solves and >> avoiding measuring setup time. >> For now I am using -log_view with added stages: >> >> PetscLogStageRegister("Solve Bench", &solve_bench_stage); >> for (int i = 0; i < BENCH_SOLVES; i++) { >> PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); >> // reset x >> PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve >> PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that >> everything is done >> >> PetscLogStagePush(solve_bench_stage); >> PetscCall(KSPSolve(ksp, NULL, NULL)); >> PetscLogStagePop(); >> } >> >> This snippet is preceded by a similar loop for warmup. >> >> When profiling this using Nsight Systems, I see that the very first solve >> is much slower which mostly correspods to H2D (host to device) copies and >> e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs >> , >> but probably insignificant in this case). The following solves have some >> overhead at the start from a H2D copy of a vector (the RHS I guess, as the >> copy is preceeded by a matrix-vector product) in the first MatResidual call >> (callchain: >> KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My >> interpretation of the profiling results (i.e. cuBLAS calls) is that that >> vector is overwritten with the residual in Daxpy and therefore has to be >> copied again for the next iteration. >> >> Is there an elegant way of avoiding that H2D copy? I have seen some >> examples on constructing matrices directly on the GPU, but nothing about >> vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At >> the moment I am using jacobi as smoother, but I would like to have a CUDA >> implementation of SOR instead. Is there a good way of achieving that, e.g. >> using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother as >> smoother in PCMG? Or is the overhead from constantly switching between >> PETSc and hypre too big? >> >> Thanks, >> Paul >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Feb 22 13:15:54 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 22 Feb 2023 14:15:54 -0500 Subject: [petsc-users] MG on GPU: Benchmarking and avoiding vector host->device copy In-Reply-To: <31e670-63f65a80-167-1328be20@181111108> References: <31e670-63f65a80-167-1328be20@181111108> Message-ID: <75069434-EECF-48FD-B84C-D89854A5CF8C@petsc.dev> > On Feb 22, 2023, at 1:10 PM, Paul Grosse-Bley wrote: > > Hi Mark, > > I use Nvidia Nsight Systems with --trace cuda,nvtx,osrt,cublas-verbose,cusparse-verbose together with the NVTX markers that come with -log_view. I.e. 
I get a nice view of all cuBLAS and cuSPARSE calls (in addition to the actual kernels which are not always easy to attribute). For PCMG and PCGAMG I also use -pc_mg_log for even more detailed NVTX markers. > > The "signature" of a V-cycle in PCMG, PCGAMG and PCAMGX is pretty clear because kernel runtimes on coarser levels are much shorter. At the coarsest level, there normally isn't even enough work for the GPU (Nvidia A100) to be fully occupied which is also visible in Nsight Systems. Hmm, I run an example with -pc_mg_multiplicative_cycles 2 and most definitely it changes the run. I am not understanding why it would not work for you. If you use and don't use the option are the exact same counts listed for all the events in the -log_view ? > > > I run only a single MPI rank with a single GPU, so profiling is straighforward. > > Best, > Paul Gro?e-Bley > > On Wednesday, February 22, 2023 18:24 CET, Mark Adams wrote: > >> >> >> >> On Wed, Feb 22, 2023 at 11:15 AM Paul Grosse-Bley > wrote: >>> Hi Barry, >>> >>> after using VecCUDAGetArray to initialize the RHS, that kernel still gets called as part of KSPSolve instead of KSPSetup, but its runtime is way less significant than the cudaMemcpy before, so I guess I will leave it like this. Other than that I kept the code like in my first message in this thread (as you wrote, benchmark_ksp.c is not well suited for PCMG). >>> >>> The profiling results for PCMG and PCAMG look as I would expect them to look, i.e. one can nicely see the GPU load/kernel runtimes going down and up again for each V-cycle. >>> >>> I was wondering about -pc_mg_multiplicative_cycles as it does not seem to make any difference. I would have expected to be able to increase the number of V-cycles per KSP iteration, but I keep seeing just a single V-cycle when changing the option (using PCMG). >> >> How are you seeing this? >> You might try -log_trace to see if you get two V cycles. >> >>> >>> When using BoomerAMG from PCHYPRE, calling KSPSetComputeInitialGuess between bench iterations to reset the solution vector does not seem to work as the residual keeps shrinking. Is this a bug? Any advice for working around this? >>> >> >> Looking at the doc https://petsc.org/release/docs/manualpages/KSP/KSPSetComputeInitialGuess/ you use this with KSPSetComputeRHS. >> >> In src/snes/tests/ex13.c I just zero out the solution vector. >> >>> The profile for BoomerAMG also doesn't really show the V-cycle behavior of the other implementations. Most of the runtime seems to go into calls to cusparseDcsrsv which might happen at the different levels, but the runtime of these kernels doesn't show the V-cycle pattern. According to the output with -pc_hypre_boomeramg_print_statistics it is doing the right thing though, so I guess it is alright (and if not, this is probably the wrong place to discuss it). >>> >>> When using PCAMGX, I see two PCApply (each showing a nice V-cycle behavior) calls in KSPSolve (three for the very first KSPSolve) while expecting just one. Each KSPSolve should do a single preconditioned Richardson iteration. Why is the preconditioner applied multiple times here? >>> >> >> Again, not sure what "see" is, but PCAMGX is pretty new and has not been used much. >> Note some KSP methods apply to the PC before the iteration. >> >> Mark >> >>> Thank you, >>> Paul Gro?e-Bley >>> >>> >>> On Monday, February 06, 2023 20:05 CET, Barry Smith > wrote: >>> >>>> >>>> >>> >>> It should not crash, take a look at the test cases at the bottom of the file. 
You are likely correct if the code, unfortunately, does use DMCreateMatrix() it will not work out of the box for geometric multigrid. So it might be the wrong example for you. >>> >>> I don't know what you mean about clever. If you simply set the solution to zero at the beginning of the loop then it will just do the same solve multiple times. The setup should not do much of anything after the first solver. Thought usually solves are big enough that one need not run solves multiple times to get a good understanding of their performance. >>> >>> >>> >>> >>> >>> >>>> >>>> On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley > wrote: >>>> >>>> Hi Barry, >>>> >>>> src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting point, thank you! Sadly I get a segfault when executing that example with PCMG and more than one level, i.e. with the minimal args: >>>> >>>> $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 >>>> =========================================== >>>> Test: KSP performance - Poisson >>>> Input matrix: 27-pt finite difference stencil >>>> -n 100 >>>> DoFs = 1000000 >>>> Number of nonzeros = 26463592 >>>> >>>> Step1 - creating Vecs and Mat... >>>> Step2a - running PCSetUp()... >>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>> [0]PETSC ERROR: or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA systems to find memory corruption errors >>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>> [0]PETSC ERROR: to get more information on the crash. >>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. >>>> -------------------------------------------------------------------------- >>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD >>>> with errorcode 59. >>>> >>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. >>>> You may or may not see output from other processes, depending on >>>> exactly when Open MPI kills them. >>>> -------------------------------------------------------------------------- >>>> >>>> As the matrix is not created using DMDACreate3d I expected it to fail due to the missing geometric information, but I expected it to fail more gracefully than with a segfault. >>>> I will try to combine bench_kspsolve.c with ex45.c to get easy MG preconditioning, especially since I am interested in the 7pt stencil for now. >>>> >>>> Concerning my benchmarking loop from before: Is it generally discouraged to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the work when doing the same solve multiple times or are the solves not iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to keep the runtime short? >>>> >>>> Thanks, >>>> Paul >>>> >>>> On Monday, February 06, 2023 17:04 CET, Barry Smith > wrote: >>>> >>>>> >>>>> >>>> >>>> Paul, >>>> >>>> I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to be used for simple benchmarking. >>>> >>>> You can use VecCudaGetArray() to access the GPU memory of the vector and then call a CUDA kernel to compute the right hand side vector directly on the GPU. 
>>>> >>>> Barry >>>> >>>> >>>>> >>>>> On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley > wrote: >>>>> >>>>> Hi, >>>>> >>>>> I want to compare different implementations of multigrid solvers for Nvidia GPUs using the poisson problem (starting from ksp tutorial example ex45.c). >>>>> Therefore I am trying to get runtime results comparable to hpgmg-cuda (finite-volume), i.e. using multiple warmup and measurement solves and avoiding measuring setup time. >>>>> For now I am using -log_view with added stages: >>>>> >>>>> PetscLogStageRegister("Solve Bench", &solve_bench_stage); >>>>> for (int i = 0; i < BENCH_SOLVES; i++) { >>>>> PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); // reset x >>>>> PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve >>>>> PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that everything is done >>>>> >>>>> PetscLogStagePush(solve_bench_stage); >>>>> PetscCall(KSPSolve(ksp, NULL, NULL)); >>>>> PetscLogStagePop(); >>>>> } >>>>> >>>>> This snippet is preceded by a similar loop for warmup. >>>>> >>>>> When profiling this using Nsight Systems, I see that the very first solve is much slower which mostly correspods to H2D (host to device) copies and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs , but probably insignificant in this case). The following solves have some overhead at the start from a H2D copy of a vector (the RHS I guess, as the copy is preceeded by a matrix-vector product) in the first MatResidual call (callchain: KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My interpretation of the profiling results (i.e. cuBLAS calls) is that that vector is overwritten with the residual in Daxpy and therefore has to be copied again for the next iteration. >>>>> >>>>> Is there an elegant way of avoiding that H2D copy? I have seen some examples on constructing matrices directly on the GPU, but nothing about vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At the moment I am using jacobi as smoother, but I would like to have a CUDA implementation of SOR instead. Is there a good way of achieving that, e.g. using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother as smoother in PCMG? Or is the overhead from constantly switching between PETSc and hypre too big? >>>>> >>>>> Thanks, >>>>> Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.grosse-bley at ziti.uni-heidelberg.de Wed Feb 22 13:19:11 2023 From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley) Date: Wed, 22 Feb 2023 20:19:11 +0100 Subject: [petsc-users] =?utf-8?q?MG_on_GPU=3A_Benchmarking_and_avoiding_v?= =?utf-8?q?ector_host-=3Edevice_copy?= In-Reply-To: Message-ID: <3272be-63f66a80-1bd-59383a80@204691431> Hi again, after checking with -ksp_monitor for PCMG, it seems my assumption that I could reset the solution by calling KSPSetComputeInitialGuess and then KSPSetupwas generally wrong and BoomerAMG was just the only preconditioner that cleverly stops doing work when the residual is already converged (which then caused me to find the shrinking residual and to think that it was doing something different to the other methods). So, how can I get KSP to use the function given through KSPSetComputeInitialGuess to reset the solution vector (without calling KSPReset which would add a lot of overhead, I assume)? 
Best, Paul Große-Bley On Wednesday, February 22, 2023 19:46 CET, Mark Adams wrote: OK, Nsight Systems is a good way to see what is going on. So all three of your solvers are not traversing the MG hierarchy with the correct logic. I don't know about hypre but PCMG and AMGx are pretty simple and AMGx dives into the AMGx library directly from our interface. Some things to try: * Use -options_left to make sure your options are being used (e.g., spelling mistakes) * Use -ksp_view to see a human readable list of your solver parameters. * Use -log_trace to see if the correct methods are called. - PCMG calls PCMGMCycle_Private for each of the cycles in code like: for (i = 0; i < mg->cyclesperpcapply; i++) PetscCall(PCMGMCycle_Private(pc, mglevels + levels - 1, transpose, matapp, NULL)); - AMGx is called PCApply_AMGX and then it dives into the library. See where these three calls to AMGx are called from. Mark On Wed, Feb 22, 2023 at 1:10 PM Paul Grosse-Bley wrote: Hi Mark, I use Nvidia Nsight Systems with --trace cuda,nvtx,osrt,cublas-verbose,cusparse-verbose together with the NVTX markers that come with -log_view. I.e. I get a nice view of all cuBLAS and cuSPARSE calls (in addition to the actual kernels which are not always easy to attribute). For PCMG and PCGAMG I also use -pc_mg_log for even more detailed NVTX markers. The "signature" of a V-cycle in PCMG, PCGAMG and PCAMGX is pretty clear because kernel runtimes on coarser levels are much shorter. At the coarsest level, there normally isn't even enough work for the GPU (Nvidia A100) to be fully occupied which is also visible in Nsight Systems. I run only a single MPI rank with a single GPU, so profiling is straightforward. Best, Paul Große-Bley On Wednesday, February 22, 2023 18:24 CET, Mark Adams wrote: On Wed, Feb 22, 2023 at 11:15 AM Paul Grosse-Bley wrote: Hi Barry, after using VecCUDAGetArray to initialize the RHS, that kernel still gets called as part of KSPSolve instead of KSPSetUp, but its runtime is way less significant than the cudaMemcpy before, so I guess I will leave it like this. Other than that I kept the code like in my first message in this thread (as you wrote, benchmark_ksp.c is not well suited for PCMG). The profiling results for PCMG and PCAMG look as I would expect them to look, i.e. one can nicely see the GPU load/kernel runtimes going down and up again for each V-cycle. I was wondering about -pc_mg_multiplicative_cycles as it does not seem to make any difference. I would have expected to be able to increase the number of V-cycles per KSP iteration, but I keep seeing just a single V-cycle when changing the option (using PCMG). How are you seeing this? You might try -log_trace to see if you get two V cycles. When using BoomerAMG from PCHYPRE, calling KSPSetComputeInitialGuess between bench iterations to reset the solution vector does not seem to work as the residual keeps shrinking. Is this a bug? Any advice for working around this? Looking at the doc https://petsc.org/release/docs/manualpages/KSP/KSPSetComputeInitialGuess/ you use this with KSPSetComputeRHS. In src/snes/tests/ex13.c I just zero out the solution vector. The profile for BoomerAMG also doesn't really show the V-cycle behavior of the other implementations. Most of the runtime seems to go into calls to cusparseDcsrsv which might happen at the different levels, but the runtime of these kernels doesn't show the V-cycle pattern. 
According to the output with -pc_hypre_boomeramg_print_statistics it is doing the right thing though, so I guess it is alright (and if not, this is probably the wrong place to discuss it). When using PCAMGX, I see two PCApply (each showing a nice V-cycle behavior) calls in KSPSolve (three for the very first KSPSolve) while expecting just one. Each KSPSolve should do a single preconditioned Richardson iteration. Why is the preconditioner applied multiple times here? ??Again, not sure what "see" is, but PCAMGX is pretty new and has not been used much.Note some KSP methods apply to the PC before the iteration.?Mark??Thank you, Paul Gro?e-Bley On Monday, February 06, 2023 20:05 CET, Barry Smith wrote: ????It should not crash, take a look at the test cases at the bottom of the file. You are likely correct if the code, unfortunately, does use DMCreateMatrix() it will not work out of the box for geometric multigrid. So it might be the wrong example for you.?? I don't know what you mean about clever. If you simply set the solution to zero at the beginning of the loop then it will just do the same solve multiple times. The setup should not do much of anything after the first solver.? Thought usually solves are big enough that one need not run solves multiple times to get a good understanding of their performance.???????On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley wrote:?Hi Barry, src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting point, thank you! Sadly I get a segfault when executing that example with PCMG and more than one level, i.e. with the minimal args: $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 =========================================== Test: KSP performance - Poisson ?? ?Input matrix: 27-pt finite difference stencil ?? ?-n 100 ?? ?DoFs = 1000000 ?? ?Number of nonzeros = 26463592 Step1? - creating Vecs and Mat... Step2a - running PCSetUp()... [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ [0]PETSC ERROR: or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA systems to find memory corruption errors [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 59. NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. -------------------------------------------------------------------------- As the matrix is not created using DMDACreate3d I expected it to fail due to the missing geometric information, but I expected it to fail more gracefully than with a segfault. I will try to combine bench_kspsolve.c with ex45.c to get easy MG preconditioning, especially since I am interested in the 7pt stencil for now. 
Concerning my benchmarking loop from before: Is it generally discouraged to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the work when doing the same solve multiple times or are the solves not iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to keep the runtime short? Thanks, Paul On Monday, February 06, 2023 17:04 CET, Barry Smith wrote: ???? Paul,?? ?I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to be used for simple benchmarking.??? ?You can use VecCudaGetArray() to access the GPU memory of the vector and then call a CUDA kernel to compute the right hand side vector directly on the GPU.?? Barry??On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley wrote:?Hi, I want to compare different implementations of multigrid solvers for Nvidia GPUs using the poisson problem (starting from ksp tutorial example ex45.c). Therefore I am trying to get runtime results comparable to hpgmg-cuda (finite-volume), i.e. using multiple warmup and measurement solves and avoiding measuring setup time. For now I am using -log_view with added stages: PetscLogStageRegister("Solve Bench", &solve_bench_stage); ? for (int i = 0; i < BENCH_SOLVES; i++) { ??? PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); // reset x ??? PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve ??? PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that everything is done ??? PetscLogStagePush(solve_bench_stage); ??? PetscCall(KSPSolve(ksp, NULL, NULL)); ??? PetscLogStagePop(); ? } This snippet is preceded by a similar loop for warmup. When profiling this using Nsight Systems, I see that the very first solve is much slower which mostly correspods to H2D (host to device) copies and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs, but probably insignificant in this case). The following solves have some overhead at the start from a H2D copy of a vector (the RHS I guess, as the copy is preceeded by a matrix-vector product) in the first MatResidual call (callchain: KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My interpretation of the profiling results (i.e. cuBLAS calls) is that that vector is overwritten with the residual in Daxpy and therefore has to be copied again for the next iteration. Is there an elegant way of avoiding that H2D copy? I have seen some examples on constructing matrices directly on the GPU, but nothing about vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At the moment I am using jacobi as smoother, but I would like to have a CUDA implementation of SOR instead. Is there a good way of achieving that, e.g. using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother? as smoother in PCMG? Or is the overhead from constantly switching between PETSc and hypre too big? Thanks, Paul -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Wed Feb 22 13:46:44 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 22 Feb 2023 14:46:44 -0500 Subject: [petsc-users] MG on GPU: Benchmarking and avoiding vector host->device copy In-Reply-To: <3272be-63f66a80-1bd-59383a80@204691431> References: <3272be-63f66a80-1bd-59383a80@204691431> Message-ID: > On Feb 22, 2023, at 2:19 PM, Paul Grosse-Bley wrote: > > Hi again, > > after checking with -ksp_monitor for PCMG, it seems my assumption that I could reset the solution by calling KSPSetComputeInitialGuess and then KSPSetupwas generally wrong and BoomerAMG was just the only preconditioner that cleverly stops doing work when the residual is already converged (which then caused me to find the shrinking residual and to think that it was doing something different to the other methods). Depending on your example the KSPSetComputeInitialGuess usage might not work. I suggest not using it for your benchmarking. But if you just call KSPSolve() multiple times it will zero the solution at the beginning of each KSPSolve() (unless you use KSPSetInitialGuessNonzero()). So if you want to run multiple solves for testing you should not need to do anything. This will be true for any use of KSPSolve() whether the PC is from PETSc, hypre or NVIDIA. > > So, how can I get KSP to use the function given through KSPSetComputeInitialGuess to reset the solution vector (without calling KSPReset which would add a lot of overhead, I assume)? > > Best, > Paul Gro?e-Bley > > > On Wednesday, February 22, 2023 19:46 CET, Mark Adams wrote: > >> >> OK, Nsight Systems is a good way to see what is going on. >> >> So all three of your solvers are not traversing the MG hierching with the correct logic. >> I don't know about hypre but PCMG and AMGx are pretty simple and AMGx dives into the AMGx library directly from out interface. >> Some things to try: >> * Use -options_left to make sure your options are being used (eg, spelling mistakes) >> * Use -ksp_view to see a human readable list of your solver parameters. >> * Use -log_trace to see if the correct methods are called. >> - PCMG calls PCMGMCycle_Private for each of the cycle in code like: >> for (i = 0; i < mg->cyclesperpcapply; i++) PetscCall(PCMGMCycle_Private(pc, mglevels + levels - 1, transpose, matapp, NULL)); >> - AMGx is called PCApply_AMGX and then it dives into the library. See where these three calls to AMGx are called from. >> >> Mark >> >> On Wed, Feb 22, 2023 at 1:10 PM Paul Grosse-Bley > wrote: >>> Hi Mark, >>> >>> I use Nvidia Nsight Systems with --trace cuda,nvtx,osrt,cublas-verbose,cusparse-verbose together with the NVTX markers that come with -log_view. I.e. I get a nice view of all cuBLAS and cuSPARSE calls (in addition to the actual kernels which are not always easy to attribute). For PCMG and PCGAMG I also use -pc_mg_log for even more detailed NVTX markers. >>> >>> The "signature" of a V-cycle in PCMG, PCGAMG and PCAMGX is pretty clear because kernel runtimes on coarser levels are much shorter. At the coarsest level, there normally isn't even enough work for the GPU (Nvidia A100) to be fully occupied which is also visible in Nsight Systems. >>> >>> I run only a single MPI rank with a single GPU, so profiling is straighforward. 
>>> >>> Best, >>> Paul Gro?e-Bley >>> >>> On Wednesday, February 22, 2023 18:24 CET, Mark Adams > wrote: >>> >>>> >>>> >>>> >>>> On Wed, Feb 22, 2023 at 11:15 AM Paul Grosse-Bley > wrote: >>>>> Hi Barry, >>>>> >>>>> after using VecCUDAGetArray to initialize the RHS, that kernel still gets called as part of KSPSolve instead of KSPSetup, but its runtime is way less significant than the cudaMemcpy before, so I guess I will leave it like this. Other than that I kept the code like in my first message in this thread (as you wrote, benchmark_ksp.c is not well suited for PCMG). >>>>> >>>>> The profiling results for PCMG and PCAMG look as I would expect them to look, i.e. one can nicely see the GPU load/kernel runtimes going down and up again for each V-cycle. >>>>> >>>>> I was wondering about -pc_mg_multiplicative_cycles as it does not seem to make any difference. I would have expected to be able to increase the number of V-cycles per KSP iteration, but I keep seeing just a single V-cycle when changing the option (using PCMG). >>>> >>>> How are you seeing this? >>>> You might try -log_trace to see if you get two V cycles. >>>> >>>>> >>>>> When using BoomerAMG from PCHYPRE, calling KSPSetComputeInitialGuess between bench iterations to reset the solution vector does not seem to work as the residual keeps shrinking. Is this a bug? Any advice for working around this? >>>>> >>>> >>>> Looking at the doc https://petsc.org/release/docs/manualpages/KSP/KSPSetComputeInitialGuess/ you use this with KSPSetComputeRHS. >>>> >>>> In src/snes/tests/ex13.c I just zero out the solution vector. >>>> >>>>> The profile for BoomerAMG also doesn't really show the V-cycle behavior of the other implementations. Most of the runtime seems to go into calls to cusparseDcsrsv which might happen at the different levels, but the runtime of these kernels doesn't show the V-cycle pattern. According to the output with -pc_hypre_boomeramg_print_statistics it is doing the right thing though, so I guess it is alright (and if not, this is probably the wrong place to discuss it). >>>>> >>>>> When using PCAMGX, I see two PCApply (each showing a nice V-cycle behavior) calls in KSPSolve (three for the very first KSPSolve) while expecting just one. Each KSPSolve should do a single preconditioned Richardson iteration. Why is the preconditioner applied multiple times here? >>>>> >>>> >>>> Again, not sure what "see" is, but PCAMGX is pretty new and has not been used much. >>>> Note some KSP methods apply to the PC before the iteration. >>>> >>>> Mark >>>> >>>>> Thank you, >>>>> Paul Gro?e-Bley >>>>> >>>>> >>>>> On Monday, February 06, 2023 20:05 CET, Barry Smith > wrote: >>>>> >>>>>> >>>>>> >>>>> >>>>> It should not crash, take a look at the test cases at the bottom of the file. You are likely correct if the code, unfortunately, does use DMCreateMatrix() it will not work out of the box for geometric multigrid. So it might be the wrong example for you. >>>>> >>>>> I don't know what you mean about clever. If you simply set the solution to zero at the beginning of the loop then it will just do the same solve multiple times. The setup should not do much of anything after the first solver. Thought usually solves are big enough that one need not run solves multiple times to get a good understanding of their performance. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> >>>>>> On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley > wrote: >>>>>> >>>>>> Hi Barry, >>>>>> >>>>>> src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting point, thank you! 
Sadly I get a segfault when executing that example with PCMG and more than one level, i.e. with the minimal args: >>>>>> >>>>>> $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 >>>>>> =========================================== >>>>>> Test: KSP performance - Poisson >>>>>> Input matrix: 27-pt finite difference stencil >>>>>> -n 100 >>>>>> DoFs = 1000000 >>>>>> Number of nonzeros = 26463592 >>>>>> >>>>>> Step1 - creating Vecs and Mat... >>>>>> Step2a - running PCSetUp()... >>>>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>>>> [0]PETSC ERROR: or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA systems to find memory corruption errors >>>>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>>>> [0]PETSC ERROR: to get more information on the crash. >>>>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. >>>>>> -------------------------------------------------------------------------- >>>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD >>>>>> with errorcode 59. >>>>>> >>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. >>>>>> You may or may not see output from other processes, depending on >>>>>> exactly when Open MPI kills them. >>>>>> -------------------------------------------------------------------------- >>>>>> >>>>>> As the matrix is not created using DMDACreate3d I expected it to fail due to the missing geometric information, but I expected it to fail more gracefully than with a segfault. >>>>>> I will try to combine bench_kspsolve.c with ex45.c to get easy MG preconditioning, especially since I am interested in the 7pt stencil for now. >>>>>> >>>>>> Concerning my benchmarking loop from before: Is it generally discouraged to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the work when doing the same solve multiple times or are the solves not iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to keep the runtime short? >>>>>> >>>>>> Thanks, >>>>>> Paul >>>>>> >>>>>> On Monday, February 06, 2023 17:04 CET, Barry Smith > wrote: >>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> Paul, >>>>>> >>>>>> I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to be used for simple benchmarking. >>>>>> >>>>>> You can use VecCudaGetArray() to access the GPU memory of the vector and then call a CUDA kernel to compute the right hand side vector directly on the GPU. >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>>> >>>>>>> On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley > wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I want to compare different implementations of multigrid solvers for Nvidia GPUs using the poisson problem (starting from ksp tutorial example ex45.c). >>>>>>> Therefore I am trying to get runtime results comparable to hpgmg-cuda (finite-volume), i.e. using multiple warmup and measurement solves and avoiding measuring setup time. 
>>>>>>> For now I am using -log_view with added stages: >>>>>>> >>>>>>> PetscLogStageRegister("Solve Bench", &solve_bench_stage); >>>>>>> for (int i = 0; i < BENCH_SOLVES; i++) { >>>>>>> PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); // reset x >>>>>>> PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve >>>>>>> PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that everything is done >>>>>>> >>>>>>> PetscLogStagePush(solve_bench_stage); >>>>>>> PetscCall(KSPSolve(ksp, NULL, NULL)); >>>>>>> PetscLogStagePop(); >>>>>>> } >>>>>>> >>>>>>> This snippet is preceded by a similar loop for warmup. >>>>>>> >>>>>>> When profiling this using Nsight Systems, I see that the very first solve is much slower which mostly correspods to H2D (host to device) copies and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs , but probably insignificant in this case). The following solves have some overhead at the start from a H2D copy of a vector (the RHS I guess, as the copy is preceeded by a matrix-vector product) in the first MatResidual call (callchain: KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My interpretation of the profiling results (i.e. cuBLAS calls) is that that vector is overwritten with the residual in Daxpy and therefore has to be copied again for the next iteration. >>>>>>> >>>>>>> Is there an elegant way of avoiding that H2D copy? I have seen some examples on constructing matrices directly on the GPU, but nothing about vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At the moment I am using jacobi as smoother, but I would like to have a CUDA implementation of SOR instead. Is there a good way of achieving that, e.g. using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother as smoother in PCMG? Or is the overhead from constantly switching between PETSc and hypre too big? >>>>>>> >>>>>>> Thanks, >>>>>>> Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.grosse-bley at ziti.uni-heidelberg.de Wed Feb 22 13:56:57 2023 From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley) Date: Wed, 22 Feb 2023 20:56:57 +0100 Subject: [petsc-users] =?utf-8?q?MG_on_GPU=3A_Benchmarking_and_avoiding_v?= =?utf-8?q?ector_host-=3Edevice_copy?= Message-ID: <13d4cf-63f67380-bb-44586380@162865271> Hi Barry, I think most of my "weird" observations came from the fact that I looked at iterations of KSPSolve where the residual was already converged. PCMG and PCGAMG do one V-cycle before even taking a look at the residual and then independent of pc_mg_multiplicative_cycles stop if it is converged. Looking at iterations that are not converged with PCMG, pc_mg_multiplicative_cycles works fine. At these iterations I also see the multiple calls to PCApply in a single KSPSolve iteration which were throwing me off with PCAMGX before. The reason for these multiple applications of the preconditioner (tested for both PCMG and PCAMGX) is that I had set maxits to 1 instead of 0. This could be better documented, I think. Best, Paul Gro?e-Bley On Wednesday, February 22, 2023 20:15 CET, Barry Smith wrote: ???On Feb 22, 2023, at 1:10 PM, Paul Grosse-Bley wrote:?Hi Mark, I use Nvidia Nsight Systems with --trace cuda,nvtx,osrt,cublas-verbose,cusparse-verbose together with the NVTX markers that come with -log_view. I.e. I get a nice view of all cuBLAS and cuSPARSE calls (in addition to the actual kernels which are not always easy to attribute). 
For PCMG and PCGAMG I also use -pc_mg_log for even more detailed NVTX markers. The "signature" of a V-cycle in PCMG, PCGAMG and PCAMGX is pretty clear because kernel runtimes on coarser levels are much shorter. At the coarsest level, there normally isn't even enough work for the GPU (Nvidia A100) to be fully occupied which is also visible in Nsight Systems.?? Hmm, I run an example with -pc_mg_multiplicative_cycles 2 and most definitely it changes the run. I am not understanding why it would not work for you. If you use and don't use the option are the exact same counts listed for all the events in the -log_view ?? I run only a single MPI rank with a single GPU, so profiling is straighforward. Best, Paul Gro?e-Bley On Wednesday, February 22, 2023 18:24 CET, Mark Adams wrote: ???On Wed, Feb 22, 2023 at 11:15 AM Paul Grosse-Bley wrote:Hi Barry, after using VecCUDAGetArray to initialize the RHS, that kernel still gets called as part of KSPSolve instead of KSPSetup, but its runtime is way less significant than the cudaMemcpy before, so I guess I will leave it like this. Other than that I kept the code like in my first message in this thread (as you wrote, benchmark_ksp.c is not well suited for PCMG). The profiling results for PCMG and PCAMG look as I would expect them to look, i.e. one can nicely see the GPU load/kernel runtimes going down and up again for each V-cycle. I was wondering about -pc_mg_multiplicative_cycles as it does not seem to make any difference. I would have expected to be able to increase the number of V-cycles per KSP iteration, but I keep seeing just a single V-cycle when changing the option (using PCMG).?How are you seeing?this??You might try -log_trace to see if you get two V cycles.? When using BoomerAMG from PCHYPRE, calling KSPSetComputeInitialGuess between bench iterations to reset the solution vector does not seem to work as the residual keeps shrinking. Is this a bug? Any advice for working around this? ??Looking at the doc?https://petsc.org/release/docs/manualpages/KSP/KSPSetComputeInitialGuess/ you use this with??KSPSetComputeRHS.?In src/snes/tests/ex13.c I just zero out the solution vector.??The profile for BoomerAMG also doesn't really show the V-cycle behavior of the other implementations. Most of the runtime seems to go into calls to cusparseDcsrsv which might happen at the different levels, but the runtime of these kernels doesn't show the V-cycle pattern. According to the output with -pc_hypre_boomeramg_print_statistics it is doing the right thing though, so I guess it is alright (and if not, this is probably the wrong place to discuss it). When using PCAMGX, I see two PCApply (each showing a nice V-cycle behavior) calls in KSPSolve (three for the very first KSPSolve) while expecting just one. Each KSPSolve should do a single preconditioned Richardson iteration. Why is the preconditioner applied multiple times here? ??Again, not sure what "see" is, but PCAMGX is pretty new and has not been used much.Note some KSP methods apply to the PC before the iteration.?Mark??Thank you, Paul Gro?e-Bley On Monday, February 06, 2023 20:05 CET, Barry Smith wrote: ????It should not crash, take a look at the test cases at the bottom of the file. You are likely correct if the code, unfortunately, does use DMCreateMatrix() it will not work out of the box for geometric multigrid. So it might be the wrong example for you.?? I don't know what you mean about clever. 
If you simply set the solution to zero at the beginning of the loop then it will just do the same solve multiple times. The setup should not do much of anything after the first solver.? Thought usually solves are big enough that one need not run solves multiple times to get a good understanding of their performance.???????On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley wrote:?Hi Barry, src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting point, thank you! Sadly I get a segfault when executing that example with PCMG and more than one level, i.e. with the minimal args: $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 =========================================== Test: KSP performance - Poisson ?? ?Input matrix: 27-pt finite difference stencil ?? ?-n 100 ?? ?DoFs = 1000000 ?? ?Number of nonzeros = 26463592 Step1? - creating Vecs and Mat... Step2a - running PCSetUp()... [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ [0]PETSC ERROR: or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA systems to find memory corruption errors [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 59. NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. -------------------------------------------------------------------------- As the matrix is not created using DMDACreate3d I expected it to fail due to the missing geometric information, but I expected it to fail more gracefully than with a segfault. I will try to combine bench_kspsolve.c with ex45.c to get easy MG preconditioning, especially since I am interested in the 7pt stencil for now. Concerning my benchmarking loop from before: Is it generally discouraged to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the work when doing the same solve multiple times or are the solves not iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to keep the runtime short? Thanks, Paul On Monday, February 06, 2023 17:04 CET, Barry Smith wrote: ???? Paul,?? ?I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to be used for simple benchmarking.??? ?You can use VecCudaGetArray() to access the GPU memory of the vector and then call a CUDA kernel to compute the right hand side vector directly on the GPU.?? Barry??On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley wrote:?Hi, I want to compare different implementations of multigrid solvers for Nvidia GPUs using the poisson problem (starting from ksp tutorial example ex45.c). Therefore I am trying to get runtime results comparable to hpgmg-cuda (finite-volume), i.e. using multiple warmup and measurement solves and avoiding measuring setup time. For now I am using -log_view with added stages: PetscLogStageRegister("Solve Bench", &solve_bench_stage); ? 
for (int i = 0; i < BENCH_SOLVES; i++) { ??? PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); // reset x ??? PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve ??? PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that everything is done ??? PetscLogStagePush(solve_bench_stage); ??? PetscCall(KSPSolve(ksp, NULL, NULL)); ??? PetscLogStagePop(); ? } This snippet is preceded by a similar loop for warmup. When profiling this using Nsight Systems, I see that the very first solve is much slower which mostly correspods to H2D (host to device) copies and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs, but probably insignificant in this case). The following solves have some overhead at the start from a H2D copy of a vector (the RHS I guess, as the copy is preceeded by a matrix-vector product) in the first MatResidual call (callchain: KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My interpretation of the profiling results (i.e. cuBLAS calls) is that that vector is overwritten with the residual in Daxpy and therefore has to be copied again for the next iteration. Is there an elegant way of avoiding that H2D copy? I have seen some examples on constructing matrices directly on the GPU, but nothing about vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At the moment I am using jacobi as smoother, but I would like to have a CUDA implementation of SOR instead. Is there a good way of achieving that, e.g. using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother? as smoother in PCMG? Or is the overhead from constantly switching between PETSc and hypre too big? Thanks, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Feb 22 14:03:31 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 22 Feb 2023 15:03:31 -0500 Subject: [petsc-users] MG on GPU: Benchmarking and avoiding vector host->device copy In-Reply-To: <13d4cf-63f67380-bb-44586380@162865271> References: <13d4cf-63f67380-bb-44586380@162865271> Message-ID: > On Feb 22, 2023, at 2:56 PM, Paul Grosse-Bley wrote: > > Hi Barry, > > I think most of my "weird" observations came from the fact that I looked at iterations of KSPSolve where the residual was already converged. PCMG and PCGAMG do one V-cycle before even taking a look at the residual and then independent of pc_mg_multiplicative_cycles stop if it is converged. > > Looking at iterations that are not converged with PCMG, pc_mg_multiplicative_cycles works fine. > > At these iterations I also see the multiple calls to PCApply in a single KSPSolve iteration which were throwing me off with PCAMGX before. > > The reason for these multiple applications of the preconditioner (tested for both PCMG and PCAMGX) is that I had set maxits to 1 instead of 0. This could be better documented, I think. I do not understand what you are talking about with regard to maxits of 1 instead of 0. For KSP maxits of 1 means one iteration, 0 is kind of meaningless. The reason that there is a PCApply at the start of the solve is because by default the KSPType is KSPGMRES which by default uses left preconditioner which means the right hand side needs to be scaled by the preconditioner before the KSP process starts. So in this configuration one KSP iteration results in 2 PCApply. You can use -ksp_pc_side right to use right preconditioning and then the number of PCApply will match the number of KSP iterations. 
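For illustration, a minimal sketch of making the same choice from code rather than from the options database (an illustrative sketch only, assuming ksp is the already-created KSP object; the runtime option -ksp_pc_side right is equivalent):

    PetscCall(KSPSetType(ksp, KSPGMRES));   /* the default KSP type */
    PetscCall(KSPSetPCSide(ksp, PC_RIGHT)); /* one PCApply per KSP iteration */
    PetscCall(KSPSetFromOptions(ksp));      /* or simply pass -ksp_pc_side right */
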
> > Best, > Paul Gro?e-Bley > > > > On Wednesday, February 22, 2023 20:15 CET, Barry Smith wrote: > >> >> > >> >> On Feb 22, 2023, at 1:10 PM, Paul Grosse-Bley wrote: >> >> Hi Mark, >> >> I use Nvidia Nsight Systems with --trace cuda,nvtx,osrt,cublas-verbose,cusparse-verbose together with the NVTX markers that come with -log_view. I.e. I get a nice view of all cuBLAS and cuSPARSE calls (in addition to the actual kernels which are not always easy to attribute). For PCMG and PCGAMG I also use -pc_mg_log for even more detailed NVTX markers. >> >> The "signature" of a V-cycle in PCMG, PCGAMG and PCAMGX is pretty clear because kernel runtimes on coarser levels are much shorter. At the coarsest level, there normally isn't even enough work for the GPU (Nvidia A100) to be fully occupied which is also visible in Nsight Systems. > > Hmm, I run an example with -pc_mg_multiplicative_cycles 2 and most definitely it changes the run. I am not understanding why it would not work for you. If you use and don't use the option are the exact same counts listed for all the events in the -log_view ? >> >> >> I run only a single MPI rank with a single GPU, so profiling is straighforward. >> >> Best, >> Paul Gro?e-Bley >> >> On Wednesday, February 22, 2023 18:24 CET, Mark Adams wrote: >> >>> >>> >>> >>> On Wed, Feb 22, 2023 at 11:15 AM Paul Grosse-Bley > wrote: >>>> Hi Barry, >>>> >>>> after using VecCUDAGetArray to initialize the RHS, that kernel still gets called as part of KSPSolve instead of KSPSetup, but its runtime is way less significant than the cudaMemcpy before, so I guess I will leave it like this. Other than that I kept the code like in my first message in this thread (as you wrote, benchmark_ksp.c is not well suited for PCMG). >>>> >>>> The profiling results for PCMG and PCAMG look as I would expect them to look, i.e. one can nicely see the GPU load/kernel runtimes going down and up again for each V-cycle. >>>> >>>> I was wondering about -pc_mg_multiplicative_cycles as it does not seem to make any difference. I would have expected to be able to increase the number of V-cycles per KSP iteration, but I keep seeing just a single V-cycle when changing the option (using PCMG). >>> >>> How are you seeing this? >>> You might try -log_trace to see if you get two V cycles. >>> >>>> >>>> When using BoomerAMG from PCHYPRE, calling KSPSetComputeInitialGuess between bench iterations to reset the solution vector does not seem to work as the residual keeps shrinking. Is this a bug? Any advice for working around this? >>>> >>> >>> Looking at the doc https://petsc.org/release/docs/manualpages/KSP/KSPSetComputeInitialGuess/ you use this with KSPSetComputeRHS. >>> >>> In src/snes/tests/ex13.c I just zero out the solution vector. >>> >>>> The profile for BoomerAMG also doesn't really show the V-cycle behavior of the other implementations. Most of the runtime seems to go into calls to cusparseDcsrsv which might happen at the different levels, but the runtime of these kernels doesn't show the V-cycle pattern. According to the output with -pc_hypre_boomeramg_print_statistics it is doing the right thing though, so I guess it is alright (and if not, this is probably the wrong place to discuss it). >>>> >>>> When using PCAMGX, I see two PCApply (each showing a nice V-cycle behavior) calls in KSPSolve (three for the very first KSPSolve) while expecting just one. Each KSPSolve should do a single preconditioned Richardson iteration. Why is the preconditioner applied multiple times here? 
>>>> >>> >>> Again, not sure what "see" is, but PCAMGX is pretty new and has not been used much. >>> Note some KSP methods apply to the PC before the iteration. >>> >>> Mark >>> >>>> Thank you, >>>> Paul Gro?e-Bley >>>> >>>> >>>> On Monday, February 06, 2023 20:05 CET, Barry Smith > wrote: >>>> >>>>> >>>>> >>>> >>>> It should not crash, take a look at the test cases at the bottom of the file. You are likely correct if the code, unfortunately, does use DMCreateMatrix() it will not work out of the box for geometric multigrid. So it might be the wrong example for you. >>>> >>>> I don't know what you mean about clever. If you simply set the solution to zero at the beginning of the loop then it will just do the same solve multiple times. The setup should not do much of anything after the first solver. Thought usually solves are big enough that one need not run solves multiple times to get a good understanding of their performance. >>>> >>>> >>>> >>>> >>>> >>>> >>>>> >>>>> On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley > wrote: >>>>> >>>>> Hi Barry, >>>>> >>>>> src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting point, thank you! Sadly I get a segfault when executing that example with PCMG and more than one level, i.e. with the minimal args: >>>>> >>>>> $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 >>>>> =========================================== >>>>> Test: KSP performance - Poisson >>>>> Input matrix: 27-pt finite difference stencil >>>>> -n 100 >>>>> DoFs = 1000000 >>>>> Number of nonzeros = 26463592 >>>>> >>>>> Step1 - creating Vecs and Mat... >>>>> Step2a - running PCSetUp()... >>>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>>> [0]PETSC ERROR: or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA systems to find memory corruption errors >>>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>>> [0]PETSC ERROR: to get more information on the crash. >>>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. >>>>> -------------------------------------------------------------------------- >>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD >>>>> with errorcode 59. >>>>> >>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. >>>>> You may or may not see output from other processes, depending on >>>>> exactly when Open MPI kills them. >>>>> -------------------------------------------------------------------------- >>>>> >>>>> As the matrix is not created using DMDACreate3d I expected it to fail due to the missing geometric information, but I expected it to fail more gracefully than with a segfault. >>>>> I will try to combine bench_kspsolve.c with ex45.c to get easy MG preconditioning, especially since I am interested in the 7pt stencil for now. >>>>> >>>>> Concerning my benchmarking loop from before: Is it generally discouraged to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the work when doing the same solve multiple times or are the solves not iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to keep the runtime short? 
>>>>> >>>>> Thanks, >>>>> Paul >>>>> >>>>> On Monday, February 06, 2023 17:04 CET, Barry Smith > wrote: >>>>> >>>>>> >>>>>> >>>>> >>>>> Paul, >>>>> >>>>> I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to be used for simple benchmarking. >>>>> >>>>> You can use VecCudaGetArray() to access the GPU memory of the vector and then call a CUDA kernel to compute the right hand side vector directly on the GPU. >>>>> >>>>> Barry >>>>> >>>>> >>>>>> >>>>>> On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley > wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I want to compare different implementations of multigrid solvers for Nvidia GPUs using the poisson problem (starting from ksp tutorial example ex45.c). >>>>>> Therefore I am trying to get runtime results comparable to hpgmg-cuda (finite-volume), i.e. using multiple warmup and measurement solves and avoiding measuring setup time. >>>>>> For now I am using -log_view with added stages: >>>>>> >>>>>> PetscLogStageRegister("Solve Bench", &solve_bench_stage); >>>>>> for (int i = 0; i < BENCH_SOLVES; i++) { >>>>>> PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); // reset x >>>>>> PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve >>>>>> PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that everything is done >>>>>> >>>>>> PetscLogStagePush(solve_bench_stage); >>>>>> PetscCall(KSPSolve(ksp, NULL, NULL)); >>>>>> PetscLogStagePop(); >>>>>> } >>>>>> >>>>>> This snippet is preceded by a similar loop for warmup. >>>>>> >>>>>> When profiling this using Nsight Systems, I see that the very first solve is much slower which mostly correspods to H2D (host to device) copies and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs , but probably insignificant in this case). The following solves have some overhead at the start from a H2D copy of a vector (the RHS I guess, as the copy is preceeded by a matrix-vector product) in the first MatResidual call (callchain: KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My interpretation of the profiling results (i.e. cuBLAS calls) is that that vector is overwritten with the residual in Daxpy and therefore has to be copied again for the next iteration. >>>>>> >>>>>> Is there an elegant way of avoiding that H2D copy? I have seen some examples on constructing matrices directly on the GPU, but nothing about vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At the moment I am using jacobi as smoother, but I would like to have a CUDA implementation of SOR instead. Is there a good way of achieving that, e.g. using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother as smoother in PCMG? Or is the overhead from constantly switching between PETSc and hypre too big? >>>>>> >>>>>> Thanks, >>>>>> Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.grosse-bley at ziti.uni-heidelberg.de Wed Feb 22 14:21:36 2023 From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley) Date: Wed, 22 Feb 2023 21:21:36 +0100 Subject: [petsc-users] =?utf-8?q?MG_on_GPU=3A_Benchmarking_and_avoiding_v?= =?utf-8?q?ector_host-=3Edevice_copy?= In-Reply-To: Message-ID: <15c14c-63f67980-3-7a07ce00@144192736> Hi Barry, the picture keeps getting clearer. I did not use KSPSetInitialGuessNonzero or the corresponding option, but using KSPSetComputeInitialGuess probably sets it automatically (without telling one in the output of -help). 
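A minimal sketch of the simplified benchmarking loop that follows from this (an illustration only, assuming ksp is configured as in the earlier snippet with KSPSetComputeRHS/KSPSetComputeOperators, and that BENCH_SOLVES, solve_bench_stage and dctx exist as before; with neither KSPSetComputeInitialGuess nor KSPSetInitialGuessNonzero set, KSPSolve() starts each call from a zero initial guess):

    PetscCall(KSPSetUp(ksp));                         // setup once, outside the timed region
    for (int i = 0; i < BENCH_SOLVES; i++) {
      // optional explicit reset of the solution vector (Mark mentions zeroing it, as in src/snes/tests/ex13.c):
      // Vec x; PetscCall(KSPGetSolution(ksp, &x)); PetscCall(VecZeroEntries(x));
      PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure previous work is done
      PetscCall(PetscLogStagePush(solve_bench_stage));
      PetscCall(KSPSolve(ksp, NULL, NULL));
      PetscCall(PetscLogStagePop());
    }
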
I was also confused by the preonly KSP type not working which is also caused by this. I think ex45 should be changed to use a non-zero initial guess (plus maybe a comment mentioning that one should avoid KSPSetComputeInitialGuess when using the zero initial guess). Thank you for asking the right questions, Paul Gro?e-Bley On Wednesday, February 22, 2023 20:46 CET, Barry Smith wrote: ???On Feb 22, 2023, at 2:19 PM, Paul Grosse-Bley wrote:?Hi again, after checking with -ksp_monitor for PCMG, it seems my assumption that I could reset the solution by calling KSPSetComputeInitialGuess and then KSPSetupwas generally wrong and BoomerAMG was just the only preconditioner that cleverly stops doing work when the residual is already converged (which then caused me to find the shrinking residual and to think that it was doing something different to the other methods).?? ?Depending on your example the KSPSetComputeInitialGuess usage might not work. I suggest not using it for your benchmarking.?? ?But if you just call KSPSolve() multiple times it will zero the solution at the beginning of each KSPSolve() (unless you use KSPSetInitialGuessNonzero()). So if you want to run multiple solves for testing you should not need to do anything. This will be true for any use of KSPSolve() whether the PC is from PETSc, hypre or NVIDIA.??? So, how can I get KSP to use the function given through KSPSetComputeInitialGuess to reset the solution vector (without calling KSPReset which would add a lot of overhead, I assume)? Best, Paul Gro?e-Bley On Wednesday, February 22, 2023 19:46 CET, Mark Adams wrote: ?OK, Nsight Systems is a good way to see what?is going on.?So all three of your solvers are not traversing the MG hierching with the correct logic.I don't know about hypre but PCMG and AMGx are pretty?simple and AMGx dives into the AMGx library directly from out interface.Some things to try:* Use -options_left to make sure your options are being used (eg, spelling mistakes)* Use -ksp_view to see a human readable list of your solver parameters.* Use -log_trace to see if the correct methods are called.?- PCMG calls?PCMGMCycle_Private for each of the cycle in code like:? ?for (i = 0; i < mg->cyclesperpcapply; i++) PetscCall(PCMGMCycle_Private(pc, mglevels + levels - 1, transpose, matapp, NULL));- AMGx is called?PCApply_AMGX and then it dives into the library. See where these three calls to AMGx are called from.?Mark?On Wed, Feb 22, 2023 at 1:10 PM Paul Grosse-Bley wrote:Hi Mark, I use Nvidia Nsight Systems with --trace cuda,nvtx,osrt,cublas-verbose,cusparse-verbose together with the NVTX markers that come with -log_view. I.e. I get a nice view of all cuBLAS and cuSPARSE calls (in addition to the actual kernels which are not always easy to attribute). For PCMG and PCGAMG I also use -pc_mg_log for even more detailed NVTX markers. The "signature" of a V-cycle in PCMG, PCGAMG and PCAMGX is pretty clear because kernel runtimes on coarser levels are much shorter. At the coarsest level, there normally isn't even enough work for the GPU (Nvidia A100) to be fully occupied which is also visible in Nsight Systems. I run only a single MPI rank with a single GPU, so profiling is straighforward. 
Best, Paul Gro?e-Bley On Wednesday, February 22, 2023 18:24 CET, Mark Adams wrote: ???On Wed, Feb 22, 2023 at 11:15 AM Paul Grosse-Bley wrote:Hi Barry, after using VecCUDAGetArray to initialize the RHS, that kernel still gets called as part of KSPSolve instead of KSPSetup, but its runtime is way less significant than the cudaMemcpy before, so I guess I will leave it like this. Other than that I kept the code like in my first message in this thread (as you wrote, benchmark_ksp.c is not well suited for PCMG). The profiling results for PCMG and PCAMG look as I would expect them to look, i.e. one can nicely see the GPU load/kernel runtimes going down and up again for each V-cycle. I was wondering about -pc_mg_multiplicative_cycles as it does not seem to make any difference. I would have expected to be able to increase the number of V-cycles per KSP iteration, but I keep seeing just a single V-cycle when changing the option (using PCMG).?How are you seeing?this??You might try -log_trace to see if you get two V cycles.? When using BoomerAMG from PCHYPRE, calling KSPSetComputeInitialGuess between bench iterations to reset the solution vector does not seem to work as the residual keeps shrinking. Is this a bug? Any advice for working around this? ??Looking at the doc?https://petsc.org/release/docs/manualpages/KSP/KSPSetComputeInitialGuess/ you use this with??KSPSetComputeRHS.?In src/snes/tests/ex13.c I just zero out the solution vector.??The profile for BoomerAMG also doesn't really show the V-cycle behavior of the other implementations. Most of the runtime seems to go into calls to cusparseDcsrsv which might happen at the different levels, but the runtime of these kernels doesn't show the V-cycle pattern. According to the output with -pc_hypre_boomeramg_print_statistics it is doing the right thing though, so I guess it is alright (and if not, this is probably the wrong place to discuss it). When using PCAMGX, I see two PCApply (each showing a nice V-cycle behavior) calls in KSPSolve (three for the very first KSPSolve) while expecting just one. Each KSPSolve should do a single preconditioned Richardson iteration. Why is the preconditioner applied multiple times here? ??Again, not sure what "see" is, but PCAMGX is pretty new and has not been used much.Note some KSP methods apply to the PC before the iteration.?Mark??Thank you, Paul Gro?e-Bley On Monday, February 06, 2023 20:05 CET, Barry Smith wrote: ????It should not crash, take a look at the test cases at the bottom of the file. You are likely correct if the code, unfortunately, does use DMCreateMatrix() it will not work out of the box for geometric multigrid. So it might be the wrong example for you.?? I don't know what you mean about clever. If you simply set the solution to zero at the beginning of the loop then it will just do the same solve multiple times. The setup should not do much of anything after the first solver.? Thought usually solves are big enough that one need not run solves multiple times to get a good understanding of their performance.???????On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley wrote:?Hi Barry, src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting point, thank you! Sadly I get a segfault when executing that example with PCMG and more than one level, i.e. with the minimal args: $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 =========================================== Test: KSP performance - Poisson ?? ?Input matrix: 27-pt finite difference stencil ?? ?-n 100 ?? 
?DoFs = 1000000 ?? ?Number of nonzeros = 26463592 Step1? - creating Vecs and Mat... Step2a - running PCSetUp()... [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ [0]PETSC ERROR: or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA systems to find memory corruption errors [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 59. NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. -------------------------------------------------------------------------- As the matrix is not created using DMDACreate3d I expected it to fail due to the missing geometric information, but I expected it to fail more gracefully than with a segfault. I will try to combine bench_kspsolve.c with ex45.c to get easy MG preconditioning, especially since I am interested in the 7pt stencil for now. Concerning my benchmarking loop from before: Is it generally discouraged to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the work when doing the same solve multiple times or are the solves not iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to keep the runtime short? Thanks, Paul On Monday, February 06, 2023 17:04 CET, Barry Smith wrote: ???? Paul,?? ?I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to be used for simple benchmarking.??? ?You can use VecCudaGetArray() to access the GPU memory of the vector and then call a CUDA kernel to compute the right hand side vector directly on the GPU.?? Barry??On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley wrote:?Hi, I want to compare different implementations of multigrid solvers for Nvidia GPUs using the poisson problem (starting from ksp tutorial example ex45.c). Therefore I am trying to get runtime results comparable to hpgmg-cuda (finite-volume), i.e. using multiple warmup and measurement solves and avoiding measuring setup time. For now I am using -log_view with added stages: PetscLogStageRegister("Solve Bench", &solve_bench_stage); ? for (int i = 0; i < BENCH_SOLVES; i++) { ??? PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); // reset x ??? PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve ??? PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that everything is done ??? PetscLogStagePush(solve_bench_stage); ??? PetscCall(KSPSolve(ksp, NULL, NULL)); ??? PetscLogStagePop(); ? } This snippet is preceded by a similar loop for warmup. When profiling this using Nsight Systems, I see that the very first solve is much slower which mostly correspods to H2D (host to device) copies and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs, but probably insignificant in this case). 
The following solves have some overhead at the start from a H2D copy of a vector (the RHS I guess, as the copy is preceded by a matrix-vector product) in the first MatResidual call (callchain: KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My interpretation of the profiling results (i.e. cuBLAS calls) is that that vector is overwritten with the residual in Daxpy and therefore has to be copied again for the next iteration. Is there an elegant way of avoiding that H2D copy? I have seen some examples on constructing matrices directly on the GPU, but nothing about vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At the moment I am using jacobi as smoother, but I would like to have a CUDA implementation of SOR instead. Is there a good way of achieving that, e.g. using PCHYPRE's boomeramg with a single level and "SOR/Jacobi"-smoother as smoother in PCMG? Or is the overhead from constantly switching between PETSc and hypre too big? Thanks, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.grosse-bley at ziti.uni-heidelberg.de Wed Feb 22 14:26:32 2023 From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley) Date: Wed, 22 Feb 2023 21:26:32 +0100 Subject: [petsc-users] =?utf-8?q?MG_on_GPU=3A_Benchmarking_and_avoiding_v?= =?utf-8?q?ector_host-=3Edevice_copy?= In-Reply-To: Message-ID: <31e670-63f67a80-215-1328be20@159715817> I was using the Richardson KSP type which I guess has the same behavior as GMRES here? I got rid of KSPSetComputeInitialGuess completely and will use preonly from now on, where maxits=1 does what I want it to do. Even BoomerAMG now shows the "v-cycle signature" I was looking for, so I think all my problems are resolved for now. Thank you very much, Barry and Mark! Best, Paul Große-Bley On Wednesday, February 22, 2023 21:03 CET, Barry Smith wrote: On Feb 22, 2023, at 2:56 PM, Paul Grosse-Bley wrote: Hi Barry, I think most of my "weird" observations came from the fact that I looked at iterations of KSPSolve where the residual was already converged. PCMG and PCGAMG do one V-cycle before even taking a look at the residual and then independent of pc_mg_multiplicative_cycles stop if it is converged. Looking at iterations that are not converged with PCMG, pc_mg_multiplicative_cycles works fine. At these iterations I also see the multiple calls to PCApply in a single KSPSolve iteration which were throwing me off with PCAMGX before. The reason for these multiple applications of the preconditioner (tested for both PCMG and PCAMGX) is that I had set maxits to 1 instead of 0. This could be better documented, I think. I do not understand what you are talking about with regard to maxits of 1 instead of 0. For KSP maxits of 1 means one iteration, 0 is kind of meaningless. The reason that there is a PCApply at the start of the solve is because by default the KSPType is KSPGMRES which by default uses left preconditioner which means the right hand side needs to be scaled by the preconditioner before the KSP process starts. So in this configuration one KSP iteration results in 2 PCApply. You can use -ksp_pc_side right to use right preconditioning and then the number of PCApply will match the number of KSP iterations. 
?You can use VecCudaGetArray() to access the GPU memory of the vector and then call a CUDA kernel to compute the right hand side vector directly on the GPU.?? Barry??On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley wrote:?Hi, I want to compare different implementations of multigrid solvers for Nvidia GPUs using the poisson problem (starting from ksp tutorial example ex45.c). Therefore I am trying to get runtime results comparable to hpgmg-cuda (finite-volume), i.e. using multiple warmup and measurement solves and avoiding measuring setup time. For now I am using -log_view with added stages: PetscLogStageRegister("Solve Bench", &solve_bench_stage); ? for (int i = 0; i < BENCH_SOLVES; i++) { ??? PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); // reset x ??? PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve ??? PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that everything is done ??? PetscLogStagePush(solve_bench_stage); ??? PetscCall(KSPSolve(ksp, NULL, NULL)); ??? PetscLogStagePop(); ? } This snippet is preceded by a similar loop for warmup. When profiling this using Nsight Systems, I see that the very first solve is much slower which mostly correspods to H2D (host to device) copies and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs, but probably insignificant in this case). The following solves have some overhead at the start from a H2D copy of a vector (the RHS I guess, as the copy is preceeded by a matrix-vector product) in the first MatResidual call (callchain: KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My interpretation of the profiling results (i.e. cuBLAS calls) is that that vector is overwritten with the residual in Daxpy and therefore has to be copied again for the next iteration. Is there an elegant way of avoiding that H2D copy? I have seen some examples on constructing matrices directly on the GPU, but nothing about vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At the moment I am using jacobi as smoother, but I would like to have a CUDA implementation of SOR instead. Is there a good way of achieving that, e.g. using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother? as smoother in PCMG? Or is the overhead from constantly switching between PETSc and hypre too big? Thanks, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.grosse-bley at ziti.uni-heidelberg.de Wed Feb 22 15:57:07 2023 From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley) Date: Wed, 22 Feb 2023 22:57:07 +0100 Subject: [petsc-users] =?utf-8?q?MG_on_GPU=3A_Benchmarking_and_avoiding_v?= =?utf-8?q?ector_host-=3Edevice_copy?= In-Reply-To: <31e670-63f67a80-215-1328be20@159715817> Message-ID: <147656-63f68f80-d1-54955c80@45430132> Hi again, I now found out that 1. preonly ignores -ksp_pc_side right (makes sense, I guess). 2. richardson is incompatible with -ksp_pc_side right. 3. preonly gives less output for -log_view -pc_mg_log than richardson. 4. preonly also ignores -ksp_rtol etc.. 5. preonly causes -log_view to measure incorrect timings for custom stages, i.e. the time for the stage (219us) is significantly shorter than the time for the KSPSolve inside the stage (~40ms). Number 4 will be problematic as I want to benchmark number of V-cycles and runtime for a given rtol. At the same time I want to avoid richardson now because of number 2 and the additional work of scaling the RHS. 
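A possible variant of the timing loop quoted above that avoids KSPSetComputeInitialGuess altogether and instead resets the solution vector directly (only a sketch: ksp, b, x, dctx, solve_bench_stage and BENCH_SOLVES are assumed to be set up exactly as in the original snippet; zeroing x follows the suggestion made earlier in the thread for src/snes/tests/ex13.c):

  for (int i = 0; i < BENCH_SOLVES; i++) {
    PetscCall(VecZeroEntries(x));                   /* reset the initial guess between solves */
    PetscCall(KSPSetUp(ksp));                       /* keep setup work out of the timed stage */
    PetscCall(PetscDeviceContextSynchronize(dctx)); /* make sure previous GPU work is done */
    PetscCall(PetscLogStagePush(solve_bench_stage));
    PetscCall(KSPSolve(ksp, b, x));                 /* explicit b and x instead of NULL, NULL */
    PetscCall(PetscLogStagePop());
  }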
Is there any good way of just using MG V-cycles as a solver, i.e. without interference from an outer Krylov solver and still iterate until convergence? Or will I just have to accept the additional V-cycle due to the left application of th PC with richardson? I guess I could also manually change -pc_mg_multiplicative_cycles until the residual gets low enough (using preonly), but that seems very inefficient. Best, Paul Gro?e-Bley On Wednesday, February 22, 2023 21:26 CET, "Paul Grosse-Bley" wrote: ?I was using the Richardson KSP type which I guess has the same behavior as GMRES here? I got rid of KSPSetComputeInitialGuess completely and will use preonly from now on, where maxits=1 does what I want it to do. Even BoomerAMG now shows the "v-cycle signature" I was looking for, so I think for now all my problems are resolved for now. Thank you very much, Barry and Mark! Best, Paul Gro?e-Bley On Wednesday, February 22, 2023 21:03 CET, Barry Smith wrote: ???On Feb 22, 2023, at 2:56 PM, Paul Grosse-Bley wrote:?Hi Barry, I think most of my "weird" observations came from the fact that I looked at iterations of KSPSolve where the residual was already converged. PCMG and PCGAMG do one V-cycle before even taking a look at the residual and then independent of pc_mg_multiplicative_cycles stop if it is converged. Looking at iterations that are not converged with PCMG, pc_mg_multiplicative_cycles works fine. At these iterations I also see the multiple calls to PCApply in a single KSPSolve iteration which were throwing me off with PCAMGX before. The reason for these multiple applications of the preconditioner (tested for both PCMG and PCAMGX) is that I had set maxits to 1 instead of 0. This could be better documented, I think.?? ?I do not understand what you are talking about with regard to maxits of 1 instead of 0. For KSP maxits of 1 means one iteration, 0 is kind of meaningless.?? ?The reason that there is a PCApply at the start of the solve is because by default the KSPType is KSPGMRES which by default uses left preconditioner which means the right hand side needs to be scaled by the preconditioner before the KSP process starts. So in this configuration one KSP iteration results in 2 PCApply. ?You can use -ksp_pc_side right to use right preconditioning and then the number of PCApply will match the number of KSP iterations. Best, Paul Gro?e-Bley On Wednesday, February 22, 2023 20:15 CET, Barry Smith wrote: ???On Feb 22, 2023, at 1:10 PM, Paul Grosse-Bley wrote:?Hi Mark, I use Nvidia Nsight Systems with --trace cuda,nvtx,osrt,cublas-verbose,cusparse-verbose together with the NVTX markers that come with -log_view. I.e. I get a nice view of all cuBLAS and cuSPARSE calls (in addition to the actual kernels which are not always easy to attribute). For PCMG and PCGAMG I also use -pc_mg_log for even more detailed NVTX markers. The "signature" of a V-cycle in PCMG, PCGAMG and PCAMGX is pretty clear because kernel runtimes on coarser levels are much shorter. At the coarsest level, there normally isn't even enough work for the GPU (Nvidia A100) to be fully occupied which is also visible in Nsight Systems.?? Hmm, I run an example with -pc_mg_multiplicative_cycles 2 and most definitely it changes the run. I am not understanding why it would not work for you. If you use and don't use the option are the exact same counts listed for all the events in the -log_view ?? I run only a single MPI rank with a single GPU, so profiling is straighforward. 
Best, Paul Gro?e-Bley On Wednesday, February 22, 2023 18:24 CET, Mark Adams wrote: ???On Wed, Feb 22, 2023 at 11:15 AM Paul Grosse-Bley wrote:Hi Barry, after using VecCUDAGetArray to initialize the RHS, that kernel still gets called as part of KSPSolve instead of KSPSetup, but its runtime is way less significant than the cudaMemcpy before, so I guess I will leave it like this. Other than that I kept the code like in my first message in this thread (as you wrote, benchmark_ksp.c is not well suited for PCMG). The profiling results for PCMG and PCAMG look as I would expect them to look, i.e. one can nicely see the GPU load/kernel runtimes going down and up again for each V-cycle. I was wondering about -pc_mg_multiplicative_cycles as it does not seem to make any difference. I would have expected to be able to increase the number of V-cycles per KSP iteration, but I keep seeing just a single V-cycle when changing the option (using PCMG).?How are you seeing?this??You might try -log_trace to see if you get two V cycles.? When using BoomerAMG from PCHYPRE, calling KSPSetComputeInitialGuess between bench iterations to reset the solution vector does not seem to work as the residual keeps shrinking. Is this a bug? Any advice for working around this? ??Looking at the doc?https://petsc.org/release/docs/manualpages/KSP/KSPSetComputeInitialGuess/ you use this with??KSPSetComputeRHS.?In src/snes/tests/ex13.c I just zero out the solution vector.??The profile for BoomerAMG also doesn't really show the V-cycle behavior of the other implementations. Most of the runtime seems to go into calls to cusparseDcsrsv which might happen at the different levels, but the runtime of these kernels doesn't show the V-cycle pattern. According to the output with -pc_hypre_boomeramg_print_statistics it is doing the right thing though, so I guess it is alright (and if not, this is probably the wrong place to discuss it). When using PCAMGX, I see two PCApply (each showing a nice V-cycle behavior) calls in KSPSolve (three for the very first KSPSolve) while expecting just one. Each KSPSolve should do a single preconditioned Richardson iteration. Why is the preconditioner applied multiple times here? ??Again, not sure what "see" is, but PCAMGX is pretty new and has not been used much.Note some KSP methods apply to the PC before the iteration.?Mark??Thank you, Paul Gro?e-Bley On Monday, February 06, 2023 20:05 CET, Barry Smith wrote: ????It should not crash, take a look at the test cases at the bottom of the file. You are likely correct if the code, unfortunately, does use DMCreateMatrix() it will not work out of the box for geometric multigrid. So it might be the wrong example for you.?? I don't know what you mean about clever. If you simply set the solution to zero at the beginning of the loop then it will just do the same solve multiple times. The setup should not do much of anything after the first solver.? Thought usually solves are big enough that one need not run solves multiple times to get a good understanding of their performance.???????On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley wrote:?Hi Barry, src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting point, thank you! Sadly I get a segfault when executing that example with PCMG and more than one level, i.e. with the minimal args: $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 =========================================== Test: KSP performance - Poisson ?? ?Input matrix: 27-pt finite difference stencil ?? ?-n 100 ?? 
?DoFs = 1000000 ?? ?Number of nonzeros = 26463592 Step1? - creating Vecs and Mat... Step2a - running PCSetUp()... [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ [0]PETSC ERROR: or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA systems to find memory corruption errors [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 59. NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. -------------------------------------------------------------------------- As the matrix is not created using DMDACreate3d I expected it to fail due to the missing geometric information, but I expected it to fail more gracefully than with a segfault. I will try to combine bench_kspsolve.c with ex45.c to get easy MG preconditioning, especially since I am interested in the 7pt stencil for now. Concerning my benchmarking loop from before: Is it generally discouraged to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the work when doing the same solve multiple times or are the solves not iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to keep the runtime short? Thanks, Paul On Monday, February 06, 2023 17:04 CET, Barry Smith wrote: ???? Paul,?? ?I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to be used for simple benchmarking.??? ?You can use VecCudaGetArray() to access the GPU memory of the vector and then call a CUDA kernel to compute the right hand side vector directly on the GPU.?? Barry??On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley wrote:?Hi, I want to compare different implementations of multigrid solvers for Nvidia GPUs using the poisson problem (starting from ksp tutorial example ex45.c). Therefore I am trying to get runtime results comparable to hpgmg-cuda (finite-volume), i.e. using multiple warmup and measurement solves and avoiding measuring setup time. For now I am using -log_view with added stages: PetscLogStageRegister("Solve Bench", &solve_bench_stage); ? for (int i = 0; i < BENCH_SOLVES; i++) { ??? PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); // reset x ??? PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve ??? PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that everything is done ??? PetscLogStagePush(solve_bench_stage); ??? PetscCall(KSPSolve(ksp, NULL, NULL)); ??? PetscLogStagePop(); ? } This snippet is preceded by a similar loop for warmup. When profiling this using Nsight Systems, I see that the very first solve is much slower which mostly correspods to H2D (host to device) copies and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs, but probably insignificant in this case). 
The following solves have some overhead at the start from a H2D copy of a vector (the RHS I guess, as the copy is preceeded by a matrix-vector product) in the first MatResidual call (callchain: KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My interpretation of the profiling results (i.e. cuBLAS calls) is that that vector is overwritten with the residual in Daxpy and therefore has to be copied again for the next iteration. Is there an elegant way of avoiding that H2D copy? I have seen some examples on constructing matrices directly on the GPU, but nothing about vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At the moment I am using jacobi as smoother, but I would like to have a CUDA implementation of SOR instead. Is there a good way of achieving that, e.g. using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother? as smoother in PCMG? Or is the overhead from constantly switching between PETSc and hypre too big? Thanks, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 22 16:16:22 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 22 Feb 2023 17:16:22 -0500 Subject: [petsc-users] MG on GPU: Benchmarking and avoiding vector host->device copy In-Reply-To: <147656-63f68f80-d1-54955c80@45430132> References: <31e670-63f67a80-215-1328be20@159715817> <147656-63f68f80-d1-54955c80@45430132> Message-ID: On Wed, Feb 22, 2023 at 4:57 PM Paul Grosse-Bley < paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > Hi again, > > I now found out that > > 1. preonly ignores -ksp_pc_side right (makes sense, I guess). > 2. richardson is incompatible with -ksp_pc_side right. > 3. preonly gives less output for -log_view -pc_mg_log than richardson. > 4. preonly also ignores -ksp_rtol etc.. > 5. preonly causes -log_view to measure incorrect timings for custom > stages, i.e. the time for the stage (219us) is significantly shorter than > the time for the KSPSolve inside the stage (~40ms). > I think there is a misunderstanding about KSPPREONLY. This applies the preconditioner once and does nothing else. That is why it ignores the tolerances, and viewers, etc. This is normally used with an exact factorization like LU to remove any Krylov overhead. If you want several iterations of the preconditioner, then you want Richardson. This is just steepest descent on the preconditioned operator. In this case, the initial application of a V-cycle is not "extra" in that you use that residual as the descent direction, and it is the one you want. Thanks, Matt > Number 4 will be problematic as I want to benchmark number of V-cycles and > runtime for a given rtol. At the same time I want to avoid richardson now > because of number 2 and the additional work of scaling the RHS. > > Is there any good way of just using MG V-cycles as a solver, i.e. without > interference from an outer Krylov solver and still iterate until > convergence? > Or will I just have to accept the additional V-cycle due to the left > application of th PC with richardson? > > I guess I could also manually change -pc_mg_multiplicative_cycles until > the residual gets low enough (using preonly), but that seems very > inefficient. > > Best, > Paul Gro?e-Bley > > > > On Wednesday, February 22, 2023 21:26 CET, "Paul Grosse-Bley" < > paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > > > I was using the Richardson KSP type which I guess has the same behavior as > GMRES here? 
I got rid of KSPSetComputeInitialGuess completely and will use > preonly from now on, where maxits=1 does what I want it to do. > > Even BoomerAMG now shows the "v-cycle signature" I was looking for, so I > think for now all my problems are resolved for now. Thank you very much, > Barry and Mark! > > Best, > Paul Gro?e-Bley > > > > On Wednesday, February 22, 2023 21:03 CET, Barry Smith > wrote: > > > > > > > On Feb 22, 2023, at 2:56 PM, Paul Grosse-Bley < > paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > > Hi Barry, > > I think most of my "weird" observations came from the fact that I looked > at iterations of KSPSolve where the residual was already converged. PCMG > and PCGAMG do one V-cycle before even taking a look at the residual and > then independent of pc_mg_multiplicative_cycles stop if it is converged. > > Looking at iterations that are not converged with PCMG, > pc_mg_multiplicative_cycles works fine. > > At these iterations I also see the multiple calls to PCApply in a single > KSPSolve iteration which were throwing me off with PCAMGX before. > > The reason for these multiple applications of the preconditioner (tested > for both PCMG and PCAMGX) is that I had set maxits to 1 instead of 0. This > could be better documented, I think. > > > I do not understand what you are talking about with regard to maxits of > 1 instead of 0. For KSP maxits of 1 means one iteration, 0 is kind of > meaningless. > > The reason that there is a PCApply at the start of the solve is because > by default the KSPType is KSPGMRES which by default uses left > preconditioner which means the right hand side needs to be scaled by the > preconditioner before the KSP process starts. So in this configuration one > KSP iteration results in 2 PCApply. You can use -ksp_pc_side right to use > right preconditioning and then the number of PCApply will match the number > of KSP iterations. > > > Best, > Paul Gro?e-Bley > > > > On Wednesday, February 22, 2023 20:15 CET, Barry Smith > wrote: > > > > > > > On Feb 22, 2023, at 1:10 PM, Paul Grosse-Bley < > paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > > Hi Mark, > > I use Nvidia Nsight Systems with --trace > cuda,nvtx,osrt,cublas-verbose,cusparse-verbose together with the NVTX > markers that come with -log_view. I.e. I get a nice view of all cuBLAS and > cuSPARSE calls (in addition to the actual kernels which are not always easy > to attribute). For PCMG and PCGAMG I also use -pc_mg_log for even more > detailed NVTX markers. > > The "signature" of a V-cycle in PCMG, PCGAMG and PCAMGX is pretty clear > because kernel runtimes on coarser levels are much shorter. At the coarsest > level, there normally isn't even enough work for the GPU (Nvidia A100) to > be fully occupied which is also visible in Nsight Systems. > > > Hmm, I run an example with -pc_mg_multiplicative_cycles 2 and most > definitely it changes the run. I am not understanding why it would not work > for you. If you use and don't use the option are the exact same counts > listed for all the events in the -log_view ? > > > I run only a single MPI rank with a single GPU, so profiling is > straighforward. 
> > Best, > Paul Gro?e-Bley > > On Wednesday, February 22, 2023 18:24 CET, Mark Adams > wrote: > > > > > On Wed, Feb 22, 2023 at 11:15 AM Paul Grosse-Bley < > paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > >> Hi Barry, >> >> after using VecCUDAGetArray to initialize the RHS, that kernel still gets >> called as part of KSPSolve instead of KSPSetup, but its runtime is way less >> significant than the cudaMemcpy before, so I guess I will leave it like >> this. Other than that I kept the code like in my first message in this >> thread (as you wrote, benchmark_ksp.c is not well suited for PCMG). >> >> The profiling results for PCMG and PCAMG look as I would expect them to >> look, i.e. one can nicely see the GPU load/kernel runtimes going down and >> up again for each V-cycle. >> >> I was wondering about -pc_mg_multiplicative_cycles as it does not seem to >> make any difference. I would have expected to be able to increase the >> number of V-cycles per KSP iteration, but I keep seeing just a single >> V-cycle when changing the option (using PCMG). > > > How are you seeing this? > You might try -log_trace to see if you get two V cycles. > > >> >> When using BoomerAMG from PCHYPRE, calling KSPSetComputeInitialGuess >> between bench iterations to reset the solution vector does not seem to work >> as the residual keeps shrinking. Is this a bug? Any advice for working >> around this? >> > > > Looking at the doc > https://petsc.org/release/docs/manualpages/KSP/KSPSetComputeInitialGuess/ > you use this with KSPSetComputeRHS. > > In src/snes/tests/ex13.c I just zero out the solution vector. > > >> The profile for BoomerAMG also doesn't really show the V-cycle behavior >> of the other implementations. Most of the runtime seems to go into calls to >> cusparseDcsrsv which might happen at the different levels, but the runtime >> of these kernels doesn't show the V-cycle pattern. According to the output >> with -pc_hypre_boomeramg_print_statistics it is doing the right thing >> though, so I guess it is alright (and if not, this is probably the wrong >> place to discuss it). > > >> When using PCAMGX, I see two PCApply (each showing a nice V-cycle >> behavior) calls in KSPSolve (three for the very first KSPSolve) while >> expecting just one. Each KSPSolve should do a single preconditioned >> Richardson iteration. Why is the preconditioner applied multiple times here? >> > > > Again, not sure what "see" is, but PCAMGX is pretty new and has not been > used much. > Note some KSP methods apply to the PC before the iteration. > > Mark > > >> Thank you, >> Paul Gro?e-Bley >> >> >> On Monday, February 06, 2023 20:05 CET, Barry Smith >> wrote: >> >> >> >> >> >> It should not crash, take a look at the test cases at the bottom of the >> file. You are likely correct if the code, unfortunately, does use >> DMCreateMatrix() it will not work out of the box for geometric multigrid. >> So it might be the wrong example for you. >> >> I don't know what you mean about clever. If you simply set the solution >> to zero at the beginning of the loop then it will just do the same solve >> multiple times. The setup should not do much of anything after the first >> solver. Thought usually solves are big enough that one need not run solves >> multiple times to get a good understanding of their performance. 
>> >> >> >> >> >> >> >> On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley < >> paul.grosse-bley at ziti.uni-heidelberg.de> wrote: >> >> Hi Barry, >> >> src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting >> point, thank you! Sadly I get a segfault when executing that example with >> PCMG and more than one level, i.e. with the minimal args: >> >> $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 >> =========================================== >> Test: KSP performance - Poisson >> Input matrix: 27-pt finite difference stencil >> -n 100 >> DoFs = 1000000 >> Number of nonzeros = 26463592 >> >> Step1 - creating Vecs and Mat... >> Step2a - running PCSetUp()... >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >> probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and >> https://petsc.org/release/faq/ >> [0]PETSC ERROR: or try >> https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA >> systems to find memory corruption errors >> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, >> and run >> [0]PETSC ERROR: to get more information on the crash. >> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is >> causing the crash. >> -------------------------------------------------------------------------- >> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD >> with errorcode 59. >> >> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. >> You may or may not see output from other processes, depending on >> exactly when Open MPI kills them. >> -------------------------------------------------------------------------- >> >> As the matrix is not created using DMDACreate3d I expected it to fail due >> to the missing geometric information, but I expected it to fail more >> gracefully than with a segfault. >> I will try to combine bench_kspsolve.c with ex45.c to get easy MG >> preconditioning, especially since I am interested in the 7pt stencil for >> now. >> >> Concerning my benchmarking loop from before: Is it generally discouraged >> to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the >> work when doing the same solve multiple times or are the solves not >> iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to >> keep the runtime short? >> >> Thanks, >> Paul >> >> On Monday, February 06, 2023 17:04 CET, Barry Smith >> wrote: >> >> >> >> >> >> Paul, >> >> I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to >> be used for simple benchmarking. >> >> You can use VecCudaGetArray() to access the GPU memory of the vector >> and then call a CUDA kernel to compute the right hand side vector directly >> on the GPU. >> >> Barry >> >> >> >> On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley < >> paul.grosse-bley at ziti.uni-heidelberg.de> wrote: >> >> Hi, >> >> I want to compare different implementations of multigrid solvers for >> Nvidia GPUs using the poisson problem (starting from ksp tutorial example >> ex45.c). >> Therefore I am trying to get runtime results comparable to hpgmg-cuda >> >> (finite-volume), i.e. using multiple warmup and measurement solves and >> avoiding measuring setup time. 
>> For now I am using -log_view with added stages: >> >> PetscLogStageRegister("Solve Bench", &solve_bench_stage); >> for (int i = 0; i < BENCH_SOLVES; i++) { >> PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); >> // reset x >> PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve >> PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that >> everything is done >> >> PetscLogStagePush(solve_bench_stage); >> PetscCall(KSPSolve(ksp, NULL, NULL)); >> PetscLogStagePop(); >> } >> >> This snippet is preceded by a similar loop for warmup. >> >> When profiling this using Nsight Systems, I see that the very first solve >> is much slower which mostly correspods to H2D (host to device) copies and >> e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs >> , >> but probably insignificant in this case). The following solves have some >> overhead at the start from a H2D copy of a vector (the RHS I guess, as the >> copy is preceeded by a matrix-vector product) in the first MatResidual call >> (callchain: >> KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My >> interpretation of the profiling results (i.e. cuBLAS calls) is that that >> vector is overwritten with the residual in Daxpy and therefore has to be >> copied again for the next iteration. >> >> Is there an elegant way of avoiding that H2D copy? I have seen some >> examples on constructing matrices directly on the GPU, but nothing about >> vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At >> the moment I am using jacobi as smoother, but I would like to have a CUDA >> implementation of SOR instead. Is there a good way of achieving that, e.g. >> using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother as >> smoother in PCMG? Or is the overhead from constantly switching between >> PETSc and hypre too big? >> >> Thanks, >> Paul >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Feb 22 16:20:26 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 22 Feb 2023 17:20:26 -0500 Subject: [petsc-users] MG on GPU: Benchmarking and avoiding vector host->device copy In-Reply-To: <147656-63f68f80-d1-54955c80@45430132> References: <147656-63f68f80-d1-54955c80@45430132> Message-ID: Preonly means exactly one application of the PC so it will never converge by itself unless the PC is a full solver. Note there is a PCApplyRichardson_MG() that gets used automatically with KSPRICHARSON. This does not have an"extra" application of the preconditioner so 2 iterations of Richardson with MG will use 2 applications of the V-cycle. So it is exactly "multigrid as a solver, without a Krylov method", no extra work. So I don't think you need to make any "compromises". Barry > On Feb 22, 2023, at 4:57 PM, Paul Grosse-Bley wrote: > > Hi again, > > I now found out that > > 1. preonly ignores -ksp_pc_side right (makes sense, I guess). > 2. richardson is incompatible with -ksp_pc_side right. > 3. preonly gives less output for -log_view -pc_mg_log than richardson. > 4. preonly also ignores -ksp_rtol etc.. > 5. preonly causes -log_view to measure incorrect timings for custom stages, i.e. 
the time for the stage (219us) is significantly shorter than the time for the KSPSolve inside the stage (~40ms). > > Number 4 will be problematic as I want to benchmark number of V-cycles and runtime for a given rtol. At the same time I want to avoid richardson now because of number 2 and the additional work of scaling the RHS. > > Is there any good way of just using MG V-cycles as a solver, i.e. without interference from an outer Krylov solver and still iterate until convergence? > Or will I just have to accept the additional V-cycle due to the left application of th PC with richardson? > > I guess I could also manually change -pc_mg_multiplicative_cycles until the residual gets low enough (using preonly), but that seems very inefficient. > > Best, > Paul Gro?e-Bley > > > > On Wednesday, February 22, 2023 21:26 CET, "Paul Grosse-Bley" wrote: > >> >> I was using the Richardson KSP type which I guess has the same behavior as GMRES here? I got rid of KSPSetComputeInitialGuess completely and will use preonly from now on, where maxits=1 does what I want it to do. >> >> Even BoomerAMG now shows the "v-cycle signature" I was looking for, so I think for now all my problems are resolved for now. Thank you very much, Barry and Mark! >> >> Best, >> Paul Gro?e-Bley >> >> >> >> On Wednesday, February 22, 2023 21:03 CET, Barry Smith wrote: >> >>> >>> >> >>> >>> On Feb 22, 2023, at 2:56 PM, Paul Grosse-Bley wrote: >>> >>> Hi Barry, >>> >>> I think most of my "weird" observations came from the fact that I looked at iterations of KSPSolve where the residual was already converged. PCMG and PCGAMG do one V-cycle before even taking a look at the residual and then independent of pc_mg_multiplicative_cycles stop if it is converged. >>> >>> Looking at iterations that are not converged with PCMG, pc_mg_multiplicative_cycles works fine. >>> >>> At these iterations I also see the multiple calls to PCApply in a single KSPSolve iteration which were throwing me off with PCAMGX before. >>> >>> The reason for these multiple applications of the preconditioner (tested for both PCMG and PCAMGX) is that I had set maxits to 1 instead of 0. This could be better documented, I think. >> >> I do not understand what you are talking about with regard to maxits of 1 instead of 0. For KSP maxits of 1 means one iteration, 0 is kind of meaningless. >> >> The reason that there is a PCApply at the start of the solve is because by default the KSPType is KSPGMRES which by default uses left preconditioner which means the right hand side needs to be scaled by the preconditioner before the KSP process starts. So in this configuration one KSP iteration results in 2 PCApply. You can use -ksp_pc_side right to use right preconditioning and then the number of PCApply will match the number of KSP iterations. >>> >>> >>> Best, >>> Paul Gro?e-Bley >>> >>> >>> >>> On Wednesday, February 22, 2023 20:15 CET, Barry Smith wrote: >>> >>>> >>>> >>> >>>> >>>> On Feb 22, 2023, at 1:10 PM, Paul Grosse-Bley wrote: >>>> >>>> Hi Mark, >>>> >>>> I use Nvidia Nsight Systems with --trace cuda,nvtx,osrt,cublas-verbose,cusparse-verbose together with the NVTX markers that come with -log_view. I.e. I get a nice view of all cuBLAS and cuSPARSE calls (in addition to the actual kernels which are not always easy to attribute). For PCMG and PCGAMG I also use -pc_mg_log for even more detailed NVTX markers. >>>> >>>> The "signature" of a V-cycle in PCMG, PCGAMG and PCAMGX is pretty clear because kernel runtimes on coarser levels are much shorter. 
At the coarsest level, there normally isn't even enough work for the GPU (Nvidia A100) to be fully occupied which is also visible in Nsight Systems. >>> >>> Hmm, I run an example with -pc_mg_multiplicative_cycles 2 and most definitely it changes the run. I am not understanding why it would not work for you. If you use and don't use the option are the exact same counts listed for all the events in the -log_view ? >>>> >>>> >>>> I run only a single MPI rank with a single GPU, so profiling is straighforward. >>>> >>>> Best, >>>> Paul Gro?e-Bley >>>> >>>> On Wednesday, February 22, 2023 18:24 CET, Mark Adams wrote: >>>> >>>>> >>>>> >>>>> >>>>> On Wed, Feb 22, 2023 at 11:15 AM Paul Grosse-Bley > wrote: >>>>>> Hi Barry, >>>>>> >>>>>> after using VecCUDAGetArray to initialize the RHS, that kernel still gets called as part of KSPSolve instead of KSPSetup, but its runtime is way less significant than the cudaMemcpy before, so I guess I will leave it like this. Other than that I kept the code like in my first message in this thread (as you wrote, benchmark_ksp.c is not well suited for PCMG). >>>>>> >>>>>> The profiling results for PCMG and PCAMG look as I would expect them to look, i.e. one can nicely see the GPU load/kernel runtimes going down and up again for each V-cycle. >>>>>> >>>>>> I was wondering about -pc_mg_multiplicative_cycles as it does not seem to make any difference. I would have expected to be able to increase the number of V-cycles per KSP iteration, but I keep seeing just a single V-cycle when changing the option (using PCMG). >>>>> >>>>> How are you seeing this? >>>>> You might try -log_trace to see if you get two V cycles. >>>>> >>>>>> >>>>>> When using BoomerAMG from PCHYPRE, calling KSPSetComputeInitialGuess between bench iterations to reset the solution vector does not seem to work as the residual keeps shrinking. Is this a bug? Any advice for working around this? >>>>>> >>>>> >>>>> Looking at the doc https://petsc.org/release/docs/manualpages/KSP/KSPSetComputeInitialGuess/ you use this with KSPSetComputeRHS. >>>>> >>>>> In src/snes/tests/ex13.c I just zero out the solution vector. >>>>> >>>>>> The profile for BoomerAMG also doesn't really show the V-cycle behavior of the other implementations. Most of the runtime seems to go into calls to cusparseDcsrsv which might happen at the different levels, but the runtime of these kernels doesn't show the V-cycle pattern. According to the output with -pc_hypre_boomeramg_print_statistics it is doing the right thing though, so I guess it is alright (and if not, this is probably the wrong place to discuss it). >>>>>> >>>>>> When using PCAMGX, I see two PCApply (each showing a nice V-cycle behavior) calls in KSPSolve (three for the very first KSPSolve) while expecting just one. Each KSPSolve should do a single preconditioned Richardson iteration. Why is the preconditioner applied multiple times here? >>>>>> >>>>> >>>>> Again, not sure what "see" is, but PCAMGX is pretty new and has not been used much. >>>>> Note some KSP methods apply to the PC before the iteration. >>>>> >>>>> Mark >>>>> >>>>>> Thank you, >>>>>> Paul Gro?e-Bley >>>>>> >>>>>> >>>>>> On Monday, February 06, 2023 20:05 CET, Barry Smith > wrote: >>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> It should not crash, take a look at the test cases at the bottom of the file. You are likely correct if the code, unfortunately, does use DMCreateMatrix() it will not work out of the box for geometric multigrid. So it might be the wrong example for you. 
>>>>>> >>>>>> I don't know what you mean about clever. If you simply set the solution to zero at the beginning of the loop then it will just do the same solve multiple times. The setup should not do much of anything after the first solver. Thought usually solves are big enough that one need not run solves multiple times to get a good understanding of their performance. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley > wrote: >>>>>>> >>>>>>> Hi Barry, >>>>>>> >>>>>>> src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting point, thank you! Sadly I get a segfault when executing that example with PCMG and more than one level, i.e. with the minimal args: >>>>>>> >>>>>>> $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 >>>>>>> =========================================== >>>>>>> Test: KSP performance - Poisson >>>>>>> Input matrix: 27-pt finite difference stencil >>>>>>> -n 100 >>>>>>> DoFs = 1000000 >>>>>>> Number of nonzeros = 26463592 >>>>>>> >>>>>>> Step1 - creating Vecs and Mat... >>>>>>> Step2a - running PCSetUp()... >>>>>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>>>>> [0]PETSC ERROR: or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA systems to find memory corruption errors >>>>>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>>>>> [0]PETSC ERROR: to get more information on the crash. >>>>>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. >>>>>>> -------------------------------------------------------------------------- >>>>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD >>>>>>> with errorcode 59. >>>>>>> >>>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. >>>>>>> You may or may not see output from other processes, depending on >>>>>>> exactly when Open MPI kills them. >>>>>>> -------------------------------------------------------------------------- >>>>>>> >>>>>>> As the matrix is not created using DMDACreate3d I expected it to fail due to the missing geometric information, but I expected it to fail more gracefully than with a segfault. >>>>>>> I will try to combine bench_kspsolve.c with ex45.c to get easy MG preconditioning, especially since I am interested in the 7pt stencil for now. >>>>>>> >>>>>>> Concerning my benchmarking loop from before: Is it generally discouraged to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the work when doing the same solve multiple times or are the solves not iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to keep the runtime short? >>>>>>> >>>>>>> Thanks, >>>>>>> Paul >>>>>>> >>>>>>> On Monday, February 06, 2023 17:04 CET, Barry Smith > wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> Paul, >>>>>>> >>>>>>> I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to be used for simple benchmarking. >>>>>>> >>>>>>> You can use VecCudaGetArray() to access the GPU memory of the vector and then call a CUDA kernel to compute the right hand side vector directly on the GPU. 
>>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley > wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I want to compare different implementations of multigrid solvers for Nvidia GPUs using the poisson problem (starting from ksp tutorial example ex45.c). >>>>>>>> Therefore I am trying to get runtime results comparable to hpgmg-cuda (finite-volume), i.e. using multiple warmup and measurement solves and avoiding measuring setup time. >>>>>>>> For now I am using -log_view with added stages: >>>>>>>> >>>>>>>> PetscLogStageRegister("Solve Bench", &solve_bench_stage); >>>>>>>> for (int i = 0; i < BENCH_SOLVES; i++) { >>>>>>>> PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); // reset x >>>>>>>> PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve >>>>>>>> PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that everything is done >>>>>>>> >>>>>>>> PetscLogStagePush(solve_bench_stage); >>>>>>>> PetscCall(KSPSolve(ksp, NULL, NULL)); >>>>>>>> PetscLogStagePop(); >>>>>>>> } >>>>>>>> >>>>>>>> This snippet is preceded by a similar loop for warmup. >>>>>>>> >>>>>>>> When profiling this using Nsight Systems, I see that the very first solve is much slower which mostly correspods to H2D (host to device) copies and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs , but probably insignificant in this case). The following solves have some overhead at the start from a H2D copy of a vector (the RHS I guess, as the copy is preceeded by a matrix-vector product) in the first MatResidual call (callchain: KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My interpretation of the profiling results (i.e. cuBLAS calls) is that that vector is overwritten with the residual in Daxpy and therefore has to be copied again for the next iteration. >>>>>>>> >>>>>>>> Is there an elegant way of avoiding that H2D copy? I have seen some examples on constructing matrices directly on the GPU, but nothing about vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At the moment I am using jacobi as smoother, but I would like to have a CUDA implementation of SOR instead. Is there a good way of achieving that, e.g. using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother as smoother in PCMG? Or is the overhead from constantly switching between PETSc and hypre too big? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.grosse-bley at ziti.uni-heidelberg.de Wed Feb 22 16:44:42 2023 From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley) Date: Wed, 22 Feb 2023 23:44:42 +0100 Subject: [petsc-users] =?utf-8?q?MG_on_GPU=3A_Benchmarking_and_avoiding_v?= =?utf-8?q?ector_host-=3Edevice_copy?= In-Reply-To: Message-ID: <105a5c-63f69b00-22f-cc49830@105147364> I thought to have observed the number of cycles with?-pc_mg_multiplicative_cycles to be dependent on rtol. But I might have seen this with maxits=0 which would explain my missunderstanding of richardson. I guess PCAMGX does not use this PCApplyRichardson_MG (yet?). Because I still see the multiple PCApplys there with maxits=1 and richardson, while they are gone for PCMG (due to getting rid of? KSPSetComputeInitialGuess?). Best, Paul Grosse-Bley On Wednesday, February 22, 2023 23:20 CET, Barry Smith wrote: ???? ?Preonly means exactly one application of the PC so it will never converge by itself unless the PC is a full solver.?? 
?Note there is a?PCApplyRichardson_MG() that gets used automatically with KSPRICHARSON. This does not have an"extra" application of the preconditioner so 2 iterations of Richardson with MG will use 2 applications of the V-cycle. So it is exactly "multigrid as a solver, without a Krylov method", no extra work. So I don't think you need to make any "compromises".??? Barry???On Feb 22, 2023, at 4:57 PM, Paul Grosse-Bley wrote:?Hi again, I now found out that 1. preonly ignores -ksp_pc_side right (makes sense, I guess). 2. richardson is incompatible with -ksp_pc_side right. 3. preonly gives less output for -log_view -pc_mg_log than richardson. 4. preonly also ignores -ksp_rtol etc.. 5. preonly causes -log_view to measure incorrect timings for custom stages, i.e. the time for the stage (219us) is significantly shorter than the time for the KSPSolve inside the stage (~40ms). Number 4 will be problematic as I want to benchmark number of V-cycles and runtime for a given rtol. At the same time I want to avoid richardson now because of number 2 and the additional work of scaling the RHS. Is there any good way of just using MG V-cycles as a solver, i.e. without interference from an outer Krylov solver and still iterate until convergence? Or will I just have to accept the additional V-cycle due to the left application of th PC with richardson? I guess I could also manually change -pc_mg_multiplicative_cycles until the residual gets low enough (using preonly), but that seems very inefficient. Best, Paul Gro?e-Bley On Wednesday, February 22, 2023 21:26 CET, "Paul Grosse-Bley" wrote: ?I was using the Richardson KSP type which I guess has the same behavior as GMRES here? I got rid of KSPSetComputeInitialGuess completely and will use preonly from now on, where maxits=1 does what I want it to do. Even BoomerAMG now shows the "v-cycle signature" I was looking for, so I think for now all my problems are resolved for now. Thank you very much, Barry and Mark! Best, Paul Gro?e-Bley On Wednesday, February 22, 2023 21:03 CET, Barry Smith wrote: ???On Feb 22, 2023, at 2:56 PM, Paul Grosse-Bley wrote:?Hi Barry, I think most of my "weird" observations came from the fact that I looked at iterations of KSPSolve where the residual was already converged. PCMG and PCGAMG do one V-cycle before even taking a look at the residual and then independent of pc_mg_multiplicative_cycles stop if it is converged. Looking at iterations that are not converged with PCMG, pc_mg_multiplicative_cycles works fine. At these iterations I also see the multiple calls to PCApply in a single KSPSolve iteration which were throwing me off with PCAMGX before. The reason for these multiple applications of the preconditioner (tested for both PCMG and PCAMGX) is that I had set maxits to 1 instead of 0. This could be better documented, I think.?? ?I do not understand what you are talking about with regard to maxits of 1 instead of 0. For KSP maxits of 1 means one iteration, 0 is kind of meaningless.?? ?The reason that there is a PCApply at the start of the solve is because by default the KSPType is KSPGMRES which by default uses left preconditioner which means the right hand side needs to be scaled by the preconditioner before the KSP process starts. So in this configuration one KSP iteration results in 2 PCApply. ?You can use -ksp_pc_side right to use right preconditioning and then the number of PCApply will match the number of KSP iterations. 
Best, Paul Gro?e-Bley On Wednesday, February 22, 2023 20:15 CET, Barry Smith wrote: ???On Feb 22, 2023, at 1:10 PM, Paul Grosse-Bley wrote:?Hi Mark, I use Nvidia Nsight Systems with --trace cuda,nvtx,osrt,cublas-verbose,cusparse-verbose together with the NVTX markers that come with -log_view. I.e. I get a nice view of all cuBLAS and cuSPARSE calls (in addition to the actual kernels which are not always easy to attribute). For PCMG and PCGAMG I also use -pc_mg_log for even more detailed NVTX markers. The "signature" of a V-cycle in PCMG, PCGAMG and PCAMGX is pretty clear because kernel runtimes on coarser levels are much shorter. At the coarsest level, there normally isn't even enough work for the GPU (Nvidia A100) to be fully occupied which is also visible in Nsight Systems.?? Hmm, I run an example with -pc_mg_multiplicative_cycles 2 and most definitely it changes the run. I am not understanding why it would not work for you. If you use and don't use the option are the exact same counts listed for all the events in the -log_view ?? I run only a single MPI rank with a single GPU, so profiling is straighforward. Best, Paul Gro?e-Bley On Wednesday, February 22, 2023 18:24 CET, Mark Adams wrote: ???On Wed, Feb 22, 2023 at 11:15 AM Paul Grosse-Bley wrote:Hi Barry, after using VecCUDAGetArray to initialize the RHS, that kernel still gets called as part of KSPSolve instead of KSPSetup, but its runtime is way less significant than the cudaMemcpy before, so I guess I will leave it like this. Other than that I kept the code like in my first message in this thread (as you wrote, benchmark_ksp.c is not well suited for PCMG). The profiling results for PCMG and PCAMG look as I would expect them to look, i.e. one can nicely see the GPU load/kernel runtimes going down and up again for each V-cycle. I was wondering about -pc_mg_multiplicative_cycles as it does not seem to make any difference. I would have expected to be able to increase the number of V-cycles per KSP iteration, but I keep seeing just a single V-cycle when changing the option (using PCMG).?How are you seeing?this??You might try -log_trace to see if you get two V cycles.? When using BoomerAMG from PCHYPRE, calling KSPSetComputeInitialGuess between bench iterations to reset the solution vector does not seem to work as the residual keeps shrinking. Is this a bug? Any advice for working around this? ??Looking at the doc?https://petsc.org/release/docs/manualpages/KSP/KSPSetComputeInitialGuess/ you use this with??KSPSetComputeRHS.?In src/snes/tests/ex13.c I just zero out the solution vector.??The profile for BoomerAMG also doesn't really show the V-cycle behavior of the other implementations. Most of the runtime seems to go into calls to cusparseDcsrsv which might happen at the different levels, but the runtime of these kernels doesn't show the V-cycle pattern. According to the output with -pc_hypre_boomeramg_print_statistics it is doing the right thing though, so I guess it is alright (and if not, this is probably the wrong place to discuss it). When using PCAMGX, I see two PCApply (each showing a nice V-cycle behavior) calls in KSPSolve (three for the very first KSPSolve) while expecting just one. Each KSPSolve should do a single preconditioned Richardson iteration. Why is the preconditioner applied multiple times here? 
Thank you,
Paul Große-Bley

On Monday, February 06, 2023 20:05 CET, Barry Smith wrote:

   It should not crash, take a look at the test cases at the bottom of the file. You are likely correct: if the code, unfortunately, does use DMCreateMatrix() it will not work out of the box for geometric multigrid. So it might be the wrong example for you.

   I don't know what you mean about clever. If you simply set the solution to zero at the beginning of the loop then it will just do the same solve multiple times. The setup should not do much of anything after the first solve. Though usually solves are big enough that one need not run solves multiple times to get a good understanding of their performance.

On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley wrote:

Hi Barry,

src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting point, thank you! Sadly I get a segfault when executing that example with PCMG and more than one level, i.e. with the minimal args:

$ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2
===========================================
Test: KSP performance - Poisson
    Input matrix: 27-pt finite difference stencil
    -n 100
    DoFs = 1000000
    Number of nonzeros = 26463592

Step1  - creating Vecs and Mat...
Step2a - running PCSetUp()...
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[0]PETSC ERROR: or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA systems to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

As the matrix is not created using DMDACreate3d I expected it to fail due to the missing geometric information, but I expected it to fail more gracefully than with a segfault.
I will try to combine bench_kspsolve.c with ex45.c to get easy MG preconditioning, especially since I am interested in the 7pt stencil for now.

Concerning my benchmarking loop from before: Is it generally discouraged to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the work when doing the same solve multiple times, or are the solves not iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to keep the runtime short?

Thanks,
Paul

On Monday, February 06, 2023 17:04 CET, Barry Smith wrote:

   Paul,

   I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to be used for simple benchmarking.

   You can use VecCudaGetArray() to access the GPU memory of the vector and then call a CUDA kernel to compute the right hand side vector directly on the GPU.

   Barry
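A minimal sketch of Barry's suggestion just above: obtain the device pointer of the right-hand side with VecCUDAGetArray() and fill it on the GPU, avoiding the host-to-device copy. The wrapper launch_fill_rhs() is a hypothetical user routine (a CUDA kernel compiled separately with nvcc); b is assumed to be a VECCUDA vector:

  extern void launch_fill_rhs(PetscScalar *d_b, PetscInt n); /* hypothetical CUDA kernel wrapper */

  PetscScalar *d_b;
  PetscInt     n;

  PetscCall(VecGetLocalSize(b, &n));
  PetscCall(VecCUDAGetArray(b, &d_b));     /* device pointer; no host<->device traffic */
  launch_fill_rhs(d_b, n);                 /* compute the RHS directly in GPU memory */
  PetscCall(VecCUDARestoreArray(b, &d_b)); /* marks the device values as up to date */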
On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley wrote:

Hi,

I want to compare different implementations of multigrid solvers for Nvidia GPUs using the poisson problem (starting from ksp tutorial example ex45.c). Therefore I am trying to get runtime results comparable to hpgmg-cuda (finite-volume), i.e. using multiple warmup and measurement solves and avoiding measuring setup time. For now I am using -log_view with added stages:

PetscLogStageRegister("Solve Bench", &solve_bench_stage);
for (int i = 0; i < BENCH_SOLVES; i++) {
  PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); // reset x
  PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve
  PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that everything is done

  PetscLogStagePush(solve_bench_stage);
  PetscCall(KSPSolve(ksp, NULL, NULL));
  PetscLogStagePop();
}

This snippet is preceded by a similar loop for warmup.

When profiling this using Nsight Systems, I see that the very first solve is much slower, which mostly corresponds to H2D (host to device) copies and e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs, but probably insignificant in this case). The following solves have some overhead at the start from a H2D copy of a vector (the RHS I guess, as the copy is preceded by a matrix-vector product) in the first MatResidual call (callchain: KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My interpretation of the profiling results (i.e. cuBLAS calls) is that that vector is overwritten with the residual in Daxpy and therefore has to be copied again for the next iteration.

Is there an elegant way of avoiding that H2D copy? I have seen some examples on constructing matrices directly on the GPU, but nothing about vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At the moment I am using jacobi as smoother, but I would like to have a CUDA implementation of SOR instead. Is there a good way of achieving that, e.g. using PCHYPRE's boomeramg with a single level and "SOR/Jacobi"-smoother as smoother in PCMG? Or is the overhead from constantly switching between PETSc and hypre too big?

Thanks,
Paul

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From knepley at gmail.com Wed Feb 22 16:47:06 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 22 Feb 2023 17:47:06 -0500
Subject: [petsc-users] MG on GPU: Benchmarking and avoiding vector host->device copy
In-Reply-To: <105a5c-63f69b00-22f-cc49830@105147364>
References: <105a5c-63f69b00-22f-cc49830@105147364>
Message-ID:

On Wed, Feb 22, 2023 at 5:45 PM Paul Grosse-Bley <paul.grosse-bley at ziti.uni-heidelberg.de> wrote:

> I thought I had observed the number of cycles
> with -pc_mg_multiplicative_cycles to be dependent on rtol. But I might have
> seen this with maxits=0 which would explain my misunderstanding of
> richardson.
>
> I guess PCAMGX does not use this PCApplyRichardson_MG (yet?). Because I
> still see the multiple PCApplys there with maxits=1 and richardson, while
> they are gone for PCMG (due to getting rid of KSPSetComputeInitialGuess?).
>

Yes, that is true. AMGX is a black box, so we cannot look inside the V-cycle.
Thanks, Matt > Best, > Paul Grosse-Bley > > > On Wednesday, February 22, 2023 23:20 CET, Barry Smith > wrote: > > > > > > Preonly means exactly one application of the PC so it will never > converge by itself unless the PC is a full solver. > > Note there is a PCApplyRichardson_MG() that gets used automatically > with KSPRICHARSON. This does not have an"extra" application of the > preconditioner so 2 iterations of Richardson with MG will use 2 > applications of the V-cycle. So it is exactly "multigrid as a solver, > without a Krylov method", no extra work. So I don't think you need to make > any "compromises". > > Barry > > > > > On Feb 22, 2023, at 4:57 PM, Paul Grosse-Bley < > paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > > Hi again, > > I now found out that > > 1. preonly ignores -ksp_pc_side right (makes sense, I guess). > 2. richardson is incompatible with -ksp_pc_side right. > 3. preonly gives less output for -log_view -pc_mg_log than richardson. > 4. preonly also ignores -ksp_rtol etc.. > 5. preonly causes -log_view to measure incorrect timings for custom > stages, i.e. the time for the stage (219us) is significantly shorter than > the time for the KSPSolve inside the stage (~40ms). > > Number 4 will be problematic as I want to benchmark number of V-cycles and > runtime for a given rtol. At the same time I want to avoid richardson now > because of number 2 and the additional work of scaling the RHS. > > Is there any good way of just using MG V-cycles as a solver, i.e. without > interference from an outer Krylov solver and still iterate until > convergence? > Or will I just have to accept the additional V-cycle due to the left > application of th PC with richardson? > > I guess I could also manually change -pc_mg_multiplicative_cycles until > the residual gets low enough (using preonly), but that seems very > inefficient. > > Best, > Paul Gro?e-Bley > > > > On Wednesday, February 22, 2023 21:26 CET, "Paul Grosse-Bley" < > paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > > > I was using the Richardson KSP type which I guess has the same behavior as > GMRES here? I got rid of KSPSetComputeInitialGuess completely and will use > preonly from now on, where maxits=1 does what I want it to do. > > Even BoomerAMG now shows the "v-cycle signature" I was looking for, so I > think for now all my problems are resolved for now. Thank you very much, > Barry and Mark! > > Best, > Paul Gro?e-Bley > > > > On Wednesday, February 22, 2023 21:03 CET, Barry Smith > wrote: > > > > > > > On Feb 22, 2023, at 2:56 PM, Paul Grosse-Bley < > paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > > Hi Barry, > > I think most of my "weird" observations came from the fact that I looked > at iterations of KSPSolve where the residual was already converged. PCMG > and PCGAMG do one V-cycle before even taking a look at the residual and > then independent of pc_mg_multiplicative_cycles stop if it is converged. > > Looking at iterations that are not converged with PCMG, > pc_mg_multiplicative_cycles works fine. > > At these iterations I also see the multiple calls to PCApply in a single > KSPSolve iteration which were throwing me off with PCAMGX before. > > The reason for these multiple applications of the preconditioner (tested > for both PCMG and PCAMGX) is that I had set maxits to 1 instead of 0. This > could be better documented, I think. > > > I do not understand what you are talking about with regard to maxits of > 1 instead of 0. For KSP maxits of 1 means one iteration, 0 is kind of > meaningless. 
> > The reason that there is a PCApply at the start of the solve is because > by default the KSPType is KSPGMRES which by default uses left > preconditioner which means the right hand side needs to be scaled by the > preconditioner before the KSP process starts. So in this configuration one > KSP iteration results in 2 PCApply. You can use -ksp_pc_side right to use > right preconditioning and then the number of PCApply will match the number > of KSP iterations. > > > Best, > Paul Gro?e-Bley > > > > On Wednesday, February 22, 2023 20:15 CET, Barry Smith > wrote: > > > > > > > On Feb 22, 2023, at 1:10 PM, Paul Grosse-Bley < > paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > > Hi Mark, > > I use Nvidia Nsight Systems with --trace > cuda,nvtx,osrt,cublas-verbose,cusparse-verbose together with the NVTX > markers that come with -log_view. I.e. I get a nice view of all cuBLAS and > cuSPARSE calls (in addition to the actual kernels which are not always easy > to attribute). For PCMG and PCGAMG I also use -pc_mg_log for even more > detailed NVTX markers. > > The "signature" of a V-cycle in PCMG, PCGAMG and PCAMGX is pretty clear > because kernel runtimes on coarser levels are much shorter. At the coarsest > level, there normally isn't even enough work for the GPU (Nvidia A100) to > be fully occupied which is also visible in Nsight Systems. > > > Hmm, I run an example with -pc_mg_multiplicative_cycles 2 and most > definitely it changes the run. I am not understanding why it would not work > for you. If you use and don't use the option are the exact same counts > listed for all the events in the -log_view ? > > > I run only a single MPI rank with a single GPU, so profiling is > straighforward. > > Best, > Paul Gro?e-Bley > > On Wednesday, February 22, 2023 18:24 CET, Mark Adams > wrote: > > > > > On Wed, Feb 22, 2023 at 11:15 AM Paul Grosse-Bley < > paul.grosse-bley at ziti.uni-heidelberg.de> wrote: > >> Hi Barry, >> >> after using VecCUDAGetArray to initialize the RHS, that kernel still gets >> called as part of KSPSolve instead of KSPSetup, but its runtime is way less >> significant than the cudaMemcpy before, so I guess I will leave it like >> this. Other than that I kept the code like in my first message in this >> thread (as you wrote, benchmark_ksp.c is not well suited for PCMG). >> >> The profiling results for PCMG and PCAMG look as I would expect them to >> look, i.e. one can nicely see the GPU load/kernel runtimes going down and >> up again for each V-cycle. >> >> I was wondering about -pc_mg_multiplicative_cycles as it does not seem to >> make any difference. I would have expected to be able to increase the >> number of V-cycles per KSP iteration, but I keep seeing just a single >> V-cycle when changing the option (using PCMG). > > > How are you seeing this? > You might try -log_trace to see if you get two V cycles. > > >> >> When using BoomerAMG from PCHYPRE, calling KSPSetComputeInitialGuess >> between bench iterations to reset the solution vector does not seem to work >> as the residual keeps shrinking. Is this a bug? Any advice for working >> around this? >> > > > Looking at the doc > https://petsc.org/release/docs/manualpages/KSP/KSPSetComputeInitialGuess/ > you use this with KSPSetComputeRHS. > > In src/snes/tests/ex13.c I just zero out the solution vector. > > >> The profile for BoomerAMG also doesn't really show the V-cycle behavior >> of the other implementations. 
Most of the runtime seems to go into calls to >> cusparseDcsrsv which might happen at the different levels, but the runtime >> of these kernels doesn't show the V-cycle pattern. According to the output >> with -pc_hypre_boomeramg_print_statistics it is doing the right thing >> though, so I guess it is alright (and if not, this is probably the wrong >> place to discuss it). > > >> When using PCAMGX, I see two PCApply (each showing a nice V-cycle >> behavior) calls in KSPSolve (three for the very first KSPSolve) while >> expecting just one. Each KSPSolve should do a single preconditioned >> Richardson iteration. Why is the preconditioner applied multiple times here? >> > > > Again, not sure what "see" is, but PCAMGX is pretty new and has not been > used much. > Note some KSP methods apply to the PC before the iteration. > > Mark > > >> Thank you, >> Paul Gro?e-Bley >> >> >> On Monday, February 06, 2023 20:05 CET, Barry Smith >> wrote: >> >> >> >> >> >> It should not crash, take a look at the test cases at the bottom of the >> file. You are likely correct if the code, unfortunately, does use >> DMCreateMatrix() it will not work out of the box for geometric multigrid. >> So it might be the wrong example for you. >> >> I don't know what you mean about clever. If you simply set the solution >> to zero at the beginning of the loop then it will just do the same solve >> multiple times. The setup should not do much of anything after the first >> solver. Thought usually solves are big enough that one need not run solves >> multiple times to get a good understanding of their performance. >> >> >> >> >> >> >> >> On Feb 6, 2023, at 12:44 PM, Paul Grosse-Bley < >> paul.grosse-bley at ziti.uni-heidelberg.de> wrote: >> >> Hi Barry, >> >> src/ksp/ksp/tutorials/bench_kspsolve.c is certainly the better starting >> point, thank you! Sadly I get a segfault when executing that example with >> PCMG and more than one level, i.e. with the minimal args: >> >> $ mpiexec -c 1 ./bench_kspsolve -split_ksp -pc_type mg -pc_mg_levels 2 >> =========================================== >> Test: KSP performance - Poisson >> Input matrix: 27-pt finite difference stencil >> -n 100 >> DoFs = 1000000 >> Number of nonzeros = 26463592 >> >> Step1 - creating Vecs and Mat... >> Step2a - running PCSetUp()... >> [0]PETSC ERROR: >> ------------------------------------------------------------------------ >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, >> probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and >> https://petsc.org/release/faq/ >> [0]PETSC ERROR: or try >> https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA >> systems to find memory corruption errors >> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, >> and run >> [0]PETSC ERROR: to get more information on the crash. >> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is >> causing the crash. >> -------------------------------------------------------------------------- >> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD >> with errorcode 59. >> >> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. >> You may or may not see output from other processes, depending on >> exactly when Open MPI kills them. 
>> -------------------------------------------------------------------------- >> >> As the matrix is not created using DMDACreate3d I expected it to fail due >> to the missing geometric information, but I expected it to fail more >> gracefully than with a segfault. >> I will try to combine bench_kspsolve.c with ex45.c to get easy MG >> preconditioning, especially since I am interested in the 7pt stencil for >> now. >> >> Concerning my benchmarking loop from before: Is it generally discouraged >> to do this for KSPSolve due to PETSc cleverly/lazily skipping some of the >> work when doing the same solve multiple times or are the solves not >> iterated in bench_kspsolve.c (while the MatMuls are with -matmult) just to >> keep the runtime short? >> >> Thanks, >> Paul >> >> On Monday, February 06, 2023 17:04 CET, Barry Smith >> wrote: >> >> >> >> >> >> Paul, >> >> I think src/ksp/ksp/tutorials/benchmark_ksp.c is the code intended to >> be used for simple benchmarking. >> >> You can use VecCudaGetArray() to access the GPU memory of the vector >> and then call a CUDA kernel to compute the right hand side vector directly >> on the GPU. >> >> Barry >> >> >> >> On Feb 6, 2023, at 10:57 AM, Paul Grosse-Bley < >> paul.grosse-bley at ziti.uni-heidelberg.de> wrote: >> >> Hi, >> >> I want to compare different implementations of multigrid solvers for >> Nvidia GPUs using the poisson problem (starting from ksp tutorial example >> ex45.c). >> Therefore I am trying to get runtime results comparable to hpgmg-cuda >> >> (finite-volume), i.e. using multiple warmup and measurement solves and >> avoiding measuring setup time. >> For now I am using -log_view with added stages: >> >> PetscLogStageRegister("Solve Bench", &solve_bench_stage); >> for (int i = 0; i < BENCH_SOLVES; i++) { >> PetscCall(KSPSetComputeInitialGuess(ksp, ComputeInitialGuess, NULL)); >> // reset x >> PetscCall(KSPSetUp(ksp)); // try to avoid setup overhead during solve >> PetscCall(PetscDeviceContextSynchronize(dctx)); // make sure that >> everything is done >> >> PetscLogStagePush(solve_bench_stage); >> PetscCall(KSPSolve(ksp, NULL, NULL)); >> PetscLogStagePop(); >> } >> >> This snippet is preceded by a similar loop for warmup. >> >> When profiling this using Nsight Systems, I see that the very first solve >> is much slower which mostly correspods to H2D (host to device) copies and >> e.g. cuBLAS setup (maybe also paging overheads as mentioned in the docs >> , >> but probably insignificant in this case). The following solves have some >> overhead at the start from a H2D copy of a vector (the RHS I guess, as the >> copy is preceeded by a matrix-vector product) in the first MatResidual call >> (callchain: >> KSPSolve->MatResidual->VecAYPX->VecCUDACopyTo->cudaMemcpyAsync). My >> interpretation of the profiling results (i.e. cuBLAS calls) is that that >> vector is overwritten with the residual in Daxpy and therefore has to be >> copied again for the next iteration. >> >> Is there an elegant way of avoiding that H2D copy? I have seen some >> examples on constructing matrices directly on the GPU, but nothing about >> vectors. Any further tips for benchmarking (vs profiling) PETSc solvers? At >> the moment I am using jacobi as smoother, but I would like to have a CUDA >> implementation of SOR instead. Is there a good way of achieving that, e.g. >> using PCHYPREs boomeramg with a single level and "SOR/Jacobi"-smoother as >> smoother in PCMG? Or is the overhead from constantly switching between >> PETSc and hypre too big? 
>> >> Thanks, >> Paul >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sasyed at fnal.gov Wed Feb 22 17:11:00 2023 From: sasyed at fnal.gov (Sajid Ali Syed) Date: Wed, 22 Feb 2023 23:11:00 +0000 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: <901A3EF7-42A7-475E-BFC3-8DF5C9B8285E@petsc.dev> References: <95C98D90-F093-4C21-9CC2-AA23F729B5F0@petsc.dev> <901A3EF7-42A7-475E-BFC3-8DF5C9B8285E@petsc.dev> Message-ID: Hi Barry, Thanks a lot for fixing this issue. I ran the same problem on a linux machine and have the following trace for the same crash (with ASAN turned on for both PETSc (on the latest commit of the branch) and the application) : https://gist.github.com/s-sajid-ali/85bdf689eb8452ef8702c214c4df6940 The trace seems to indicate a couple of buffer overflows, one of which causes the crash. I'm not sure as to what causes them. Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Barry Smith Sent: Wednesday, February 15, 2023 2:01 PM To: Sajid Ali Syed Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] KSP_Solve crashes in debug mode https://gitlab.com/petsc/petsc/-/merge_requests/6075 should fix the possible recursive error condition Matt pointed out On Feb 9, 2023, at 6:24 PM, Matthew Knepley wrote: On Thu, Feb 9, 2023 at 6:05 PM Sajid Ali Syed via petsc-users > wrote: I added ?-malloc_debug? in a .petscrc file and ran it again. The backtrace from lldb is in the attached file. The crash now seems to be at: Process 32660 stopped* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8) frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 598 ????? `PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()` 599 ?????@*/ 600 ?????PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...) -> 601 ?????{ 602 ????? PetscMPIInt rank; 603 ????? 604 ????? PetscFunctionBegin; (lldb) frame info frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 (lldb) The trace seems to indicate some sort of infinite loop causing an overflow. Yes, I have also seen this. What happens is that we have a memory error. The error is reported inside PetscMallocValidate() using PetscErrorPrintf, which eventually calls PetscCallMPI, which calls PetscMallocValidate again, which fails. We need to remove all error checking from the prints inside Validate. Thanks, Matt PS: I'm using a arm64 mac, so I don't have access to valgrind. Thank You, Sajid Ali (he/him) | Research Associate Scientific Computing Division Fermi National Accelerator Laboratory s-sajid-ali.github.io -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sasyed at fnal.gov Wed Feb 22 17:18:28 2023 From: sasyed at fnal.gov (Sajid Ali Syed) Date: Wed, 22 Feb 2023 23:18:28 +0000 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: References: <95C98D90-F093-4C21-9CC2-AA23F729B5F0@petsc.dev> <901A3EF7-42A7-475E-BFC3-8DF5C9B8285E@petsc.dev> Message-ID: One thing to note in relation to the trace attached in the previous email is that there are no warnings until the 36th call to KSP_Solve. The first error (as indicated by ASAN) occurs somewhere before the 40th call to KSP_Solve (part of what the application marks as turn 10 of the propagator). The crash finally occurs on the 43rd call to KSP_solve. Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Sajid Ali Syed Sent: Wednesday, February 22, 2023 5:11 PM To: Barry Smith Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] KSP_Solve crashes in debug mode Hi Barry, Thanks a lot for fixing this issue. I ran the same problem on a linux machine and have the following trace for the same crash (with ASAN turned on for both PETSc (on the latest commit of the branch) and the application) : https://gist.github.com/s-sajid-ali/85bdf689eb8452ef8702c214c4df6940 The trace seems to indicate a couple of buffer overflows, one of which causes the crash. I'm not sure as to what causes them. Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Barry Smith Sent: Wednesday, February 15, 2023 2:01 PM To: Sajid Ali Syed Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] KSP_Solve crashes in debug mode https://gitlab.com/petsc/petsc/-/merge_requests/6075 should fix the possible recursive error condition Matt pointed out On Feb 9, 2023, at 6:24 PM, Matthew Knepley wrote: On Thu, Feb 9, 2023 at 6:05 PM Sajid Ali Syed via petsc-users > wrote: I added ?-malloc_debug? in a .petscrc file and ran it again. The backtrace from lldb is in the attached file. The crash now seems to be at: Process 32660 stopped* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8) frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 598 ????? `PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()` 599 ?????@*/ 600 ?????PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...) -> 601 ?????{ 602 ????? PetscMPIInt rank; 603 ????? 604 ????? PetscFunctionBegin; (lldb) frame info frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 (lldb) The trace seems to indicate some sort of infinite loop causing an overflow. Yes, I have also seen this. What happens is that we have a memory error. The error is reported inside PetscMallocValidate() using PetscErrorPrintf, which eventually calls PetscCallMPI, which calls PetscMallocValidate again, which fails. We need to remove all error checking from the prints inside Validate. Thanks, Matt PS: I'm using a arm64 mac, so I don't have access to valgrind. 
Thank You, Sajid Ali (he/him) | Research Associate Scientific Computing Division Fermi National Accelerator Laboratory s-sajid-ali.github.io -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 22 17:23:23 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 22 Feb 2023 18:23:23 -0500 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: References: <95C98D90-F093-4C21-9CC2-AA23F729B5F0@petsc.dev> <901A3EF7-42A7-475E-BFC3-8DF5C9B8285E@petsc.dev> Message-ID: On Wed, Feb 22, 2023 at 6:18 PM Sajid Ali Syed via petsc-users < petsc-users at mcs.anl.gov> wrote: > One thing to note in relation to the trace attached in the previous email > is that there are no warnings until the 36th call to KSP_Solve. The first > error (as indicated by ASAN) occurs somewhere before the 40th call to > KSP_Solve (part of what the application marks as turn 10 of the > propagator). The crash finally occurs on the 43rd call to KSP_solve. > Looking at the trace, it appears that stack handling is messed up and eventually it causes the crash. This can happen when PetscFunctionBegin is not matched up with PetscFunctionReturn. Can you try running this with -checkstack Thanks, Matt > Thank You, > Sajid Ali (he/him) | Research Associate > Data Science, Simulation, and Learning Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > > ------------------------------ > *From:* Sajid Ali Syed > *Sent:* Wednesday, February 22, 2023 5:11 PM > *To:* Barry Smith > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] KSP_Solve crashes in debug mode > > Hi Barry, > > Thanks a lot for fixing this issue. I ran the same problem on a linux > machine and have the following trace for the same crash (with ASAN turned > on for both PETSc (on the latest commit of the branch) and the application) > : https://gist.github.com/s-sajid-ali/85bdf689eb8452ef8702c214c4df6940 > > The trace seems to indicate a couple of buffer overflows, one of which > causes the crash. I'm not sure as to what causes them. > > Thank You, > Sajid Ali (he/him) | Research Associate > Data Science, Simulation, and Learning Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > > ------------------------------ > *From:* Barry Smith > *Sent:* Wednesday, February 15, 2023 2:01 PM > *To:* Sajid Ali Syed > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] KSP_Solve crashes in debug mode > > > https://gitlab.com/petsc/petsc/-/merge_requests/6075 > should > fix the possible recursive error condition Matt pointed out > > > On Feb 9, 2023, at 6:24 PM, Matthew Knepley wrote: > > On Thu, Feb 9, 2023 at 6:05 PM Sajid Ali Syed via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > I added ?-malloc_debug? in a .petscrc file and ran it again. The backtrace > from lldb is in the attached file. The crash now seems to be at: > > Process 32660 stopped* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8) > frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 > 598 ????? 
`PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()` > 599 ?????@*/ > 600 ?????PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...) > -> 601 ?????{ > 602 ????? PetscMPIInt rank; > 603 > 604 ????? PetscFunctionBegin; > (lldb) frame info > frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 > (lldb) > > The trace seems to indicate some sort of infinite loop causing an overflow. > > > Yes, I have also seen this. What happens is that we have a memory error. > The error is reported inside PetscMallocValidate() > using PetscErrorPrintf, which eventually calls PetscCallMPI, which calls > PetscMallocValidate again, which fails. We need to > remove all error checking from the prints inside Validate. > > Thanks, > > Matt > > > PS: I'm using a arm64 mac, so I don't have access to valgrind. > > Thank You, > Sajid Ali (he/him) | Research Associate > Scientific Computing Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sasyed at fnal.gov Wed Feb 22 17:32:06 2023 From: sasyed at fnal.gov (Sajid Ali Syed) Date: Wed, 22 Feb 2023 23:32:06 +0000 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: References: <95C98D90-F093-4C21-9CC2-AA23F729B5F0@petsc.dev> <901A3EF7-42A7-475E-BFC3-8DF5C9B8285E@petsc.dev> Message-ID: Hi Matt, Adding `-checkstack` does not prevent the crash, both on my laptop and on the cluster. What does prevent the crash (on my laptop at least) is changing `PETSCSTACKSIZE` from 64 to 256 here : https://github.com/petsc/petsc/blob/main/include/petscerror.h#L1153 Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Matthew Knepley Sent: Wednesday, February 22, 2023 5:23 PM To: Sajid Ali Syed Cc: Barry Smith ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] KSP_Solve crashes in debug mode On Wed, Feb 22, 2023 at 6:18 PM Sajid Ali Syed via petsc-users > wrote: One thing to note in relation to the trace attached in the previous email is that there are no warnings until the 36th call to KSP_Solve. The first error (as indicated by ASAN) occurs somewhere before the 40th call to KSP_Solve (part of what the application marks as turn 10 of the propagator). The crash finally occurs on the 43rd call to KSP_solve. Looking at the trace, it appears that stack handling is messed up and eventually it causes the crash. This can happen when PetscFunctionBegin is not matched up with PetscFunctionReturn. 
Can you try running this with -checkstack Thanks, Matt Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Sajid Ali Syed > Sent: Wednesday, February 22, 2023 5:11 PM To: Barry Smith > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] KSP_Solve crashes in debug mode Hi Barry, Thanks a lot for fixing this issue. I ran the same problem on a linux machine and have the following trace for the same crash (with ASAN turned on for both PETSc (on the latest commit of the branch) and the application) : https://gist.github.com/s-sajid-ali/85bdf689eb8452ef8702c214c4df6940 The trace seems to indicate a couple of buffer overflows, one of which causes the crash. I'm not sure as to what causes them. Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Barry Smith > Sent: Wednesday, February 15, 2023 2:01 PM To: Sajid Ali Syed > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] KSP_Solve crashes in debug mode https://gitlab.com/petsc/petsc/-/merge_requests/6075 should fix the possible recursive error condition Matt pointed out On Feb 9, 2023, at 6:24 PM, Matthew Knepley > wrote: On Thu, Feb 9, 2023 at 6:05 PM Sajid Ali Syed via petsc-users > wrote: I added ?-malloc_debug? in a .petscrc file and ran it again. The backtrace from lldb is in the attached file. The crash now seems to be at: Process 32660 stopped* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8) frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 598 ????? `PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()` 599 ?????@*/ 600 ?????PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...) -> 601 ?????{ 602 ????? PetscMPIInt rank; 603 ????? 604 ????? PetscFunctionBegin; (lldb) frame info frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 (lldb) The trace seems to indicate some sort of infinite loop causing an overflow. Yes, I have also seen this. What happens is that we have a memory error. The error is reported inside PetscMallocValidate() using PetscErrorPrintf, which eventually calls PetscCallMPI, which calls PetscMallocValidate again, which fails. We need to remove all error checking from the prints inside Validate. Thanks, Matt PS: I'm using a arm64 mac, so I don't have access to valgrind. Thank You, Sajid Ali (he/him) | Research Associate Scientific Computing Division Fermi National Accelerator Laboratory s-sajid-ali.github.io -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Wed Feb 22 18:28:58 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 22 Feb 2023 19:28:58 -0500 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: References: <95C98D90-F093-4C21-9CC2-AA23F729B5F0@petsc.dev> <901A3EF7-42A7-475E-BFC3-8DF5C9B8285E@petsc.dev> Message-ID: On Wed, Feb 22, 2023 at 6:32 PM Sajid Ali Syed wrote: > Hi Matt, > > Adding `-checkstack` does not prevent the crash, both on my laptop and on > the cluster. > It will not prevent a crash. The output is intended to show us where the stack problem originates. Can you send the output? Thanks, Matt > What does prevent the crash (on my laptop at least) is changing > `PETSCSTACKSIZE` from 64 to 256 here : > https://github.com/petsc/petsc/blob/main/include/petscerror.h#L1153 > > > Thank You, > Sajid Ali (he/him) | Research Associate > Data Science, Simulation, and Learning Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > > ------------------------------ > *From:* Matthew Knepley > *Sent:* Wednesday, February 22, 2023 5:23 PM > *To:* Sajid Ali Syed > *Cc:* Barry Smith ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] KSP_Solve crashes in debug mode > > On Wed, Feb 22, 2023 at 6:18 PM Sajid Ali Syed via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > One thing to note in relation to the trace attached in the previous email > is that there are no warnings until the 36th call to KSP_Solve. The first > error (as indicated by ASAN) occurs somewhere before the 40th call to > KSP_Solve (part of what the application marks as turn 10 of the > propagator). The crash finally occurs on the 43rd call to KSP_solve. > > > Looking at the trace, it appears that stack handling is messed up and > eventually it causes the crash. This can happen when > PetscFunctionBegin is not matched up with PetscFunctionReturn. Can you try > running this with > > -checkstack > > Thanks, > > Matt > > > Thank You, > Sajid Ali (he/him) | Research Associate > Data Science, Simulation, and Learning Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > > > ------------------------------ > *From:* Sajid Ali Syed > *Sent:* Wednesday, February 22, 2023 5:11 PM > *To:* Barry Smith > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] KSP_Solve crashes in debug mode > > Hi Barry, > > Thanks a lot for fixing this issue. I ran the same problem on a linux > machine and have the following trace for the same crash (with ASAN turned > on for both PETSc (on the latest commit of the branch) and the application) > : https://gist.github.com/s-sajid-ali/85bdf689eb8452ef8702c214c4df6940 > > > The trace seems to indicate a couple of buffer overflows, one of which > causes the crash. I'm not sure as to what causes them. > > Thank You, > Sajid Ali (he/him) | Research Associate > Data Science, Simulation, and Learning Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > > > ------------------------------ > *From:* Barry Smith > *Sent:* Wednesday, February 15, 2023 2:01 PM > *To:* Sajid Ali Syed > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] KSP_Solve crashes in debug mode > > > https://gitlab.com/petsc/petsc/-/merge_requests/6075 > should > fix the possible recursive error condition Matt pointed out > > > On Feb 9, 2023, at 6:24 PM, Matthew Knepley wrote: > > On Thu, Feb 9, 2023 at 6:05 PM Sajid Ali Syed via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > I added ?-malloc_debug? 
in a .petscrc file and ran it again. The backtrace > from lldb is in the attached file. The crash now seems to be at: > > Process 32660 stopped* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8) > frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 > 598 ????? `PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()` > 599 ?????@*/ > 600 ?????PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...) > -> 601 ?????{ > 602 ????? PetscMPIInt rank; > 603 > 604 ????? PetscFunctionBegin; > (lldb) frame info > frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 > (lldb) > > The trace seems to indicate some sort of infinite loop causing an overflow. > > > Yes, I have also seen this. What happens is that we have a memory error. > The error is reported inside PetscMallocValidate() > using PetscErrorPrintf, which eventually calls PetscCallMPI, which calls > PetscMallocValidate again, which fails. We need to > remove all error checking from the prints inside Validate. > > Thanks, > > Matt > > > PS: I'm using a arm64 mac, so I don't have access to valgrind. > > Thank You, > Sajid Ali (he/him) | Research Associate > Scientific Computing Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sasyed at fnal.gov Wed Feb 22 18:34:00 2023 From: sasyed at fnal.gov (Sajid Ali Syed) Date: Thu, 23 Feb 2023 00:34:00 +0000 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: References: <95C98D90-F093-4C21-9CC2-AA23F729B5F0@petsc.dev> <901A3EF7-42A7-475E-BFC3-8DF5C9B8285E@petsc.dev> Message-ID: Hi Matt, This is a trace from the same crash, but with `-checkstack` included in .petscrc? : https://gist.github.com/s-sajid-ali/455b3982d47a31bff9e7ee211dd43991 I don't see any additional information regarding the possible cause. Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Matthew Knepley Sent: Wednesday, February 22, 2023 6:28 PM To: Sajid Ali Syed Cc: Barry Smith ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] KSP_Solve crashes in debug mode On Wed, Feb 22, 2023 at 6:32 PM Sajid Ali Syed > wrote: Hi Matt, Adding `-checkstack` does not prevent the crash, both on my laptop and on the cluster. It will not prevent a crash. The output is intended to show us where the stack problem originates. Can you send the output? 
Thanks, Matt What does prevent the crash (on my laptop at least) is changing `PETSCSTACKSIZE` from 64 to 256 here : https://github.com/petsc/petsc/blob/main/include/petscerror.h#L1153 Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Matthew Knepley > Sent: Wednesday, February 22, 2023 5:23 PM To: Sajid Ali Syed > Cc: Barry Smith >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] KSP_Solve crashes in debug mode On Wed, Feb 22, 2023 at 6:18 PM Sajid Ali Syed via petsc-users > wrote: One thing to note in relation to the trace attached in the previous email is that there are no warnings until the 36th call to KSP_Solve. The first error (as indicated by ASAN) occurs somewhere before the 40th call to KSP_Solve (part of what the application marks as turn 10 of the propagator). The crash finally occurs on the 43rd call to KSP_solve. Looking at the trace, it appears that stack handling is messed up and eventually it causes the crash. This can happen when PetscFunctionBegin is not matched up with PetscFunctionReturn. Can you try running this with -checkstack Thanks, Matt Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Sajid Ali Syed > Sent: Wednesday, February 22, 2023 5:11 PM To: Barry Smith > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] KSP_Solve crashes in debug mode Hi Barry, Thanks a lot for fixing this issue. I ran the same problem on a linux machine and have the following trace for the same crash (with ASAN turned on for both PETSc (on the latest commit of the branch) and the application) : https://gist.github.com/s-sajid-ali/85bdf689eb8452ef8702c214c4df6940 The trace seems to indicate a couple of buffer overflows, one of which causes the crash. I'm not sure as to what causes them. Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Barry Smith > Sent: Wednesday, February 15, 2023 2:01 PM To: Sajid Ali Syed > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] KSP_Solve crashes in debug mode https://gitlab.com/petsc/petsc/-/merge_requests/6075 should fix the possible recursive error condition Matt pointed out On Feb 9, 2023, at 6:24 PM, Matthew Knepley > wrote: On Thu, Feb 9, 2023 at 6:05 PM Sajid Ali Syed via petsc-users > wrote: I added ?-malloc_debug? in a .petscrc file and ran it again. The backtrace from lldb is in the attached file. The crash now seems to be at: Process 32660 stopped* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8) frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 598 ????? `PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()` 599 ?????@*/ 600 ?????PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...) -> 601 ?????{ 602 ????? PetscMPIInt rank; 603 ????? 604 ????? PetscFunctionBegin; (lldb) frame info frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 (lldb) The trace seems to indicate some sort of infinite loop causing an overflow. 
Yes, I have also seen this. What happens is that we have a memory error. The error is reported inside PetscMallocValidate() using PetscErrorPrintf, which eventually calls PetscCallMPI, which calls PetscMallocValidate again, which fails. We need to remove all error checking from the prints inside Validate. Thanks, Matt PS: I'm using a arm64 mac, so I don't have access to valgrind. Thank You, Sajid Ali (he/him) | Research Associate Scientific Computing Division Fermi National Accelerator Laboratory s-sajid-ali.github.io -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sasyed at fnal.gov Wed Feb 22 18:46:37 2023 From: sasyed at fnal.gov (Sajid Ali Syed) Date: Thu, 23 Feb 2023 00:46:37 +0000 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: References: <95C98D90-F093-4C21-9CC2-AA23F729B5F0@petsc.dev> <901A3EF7-42A7-475E-BFC3-8DF5C9B8285E@petsc.dev> Message-ID: Via a checkpoint in `PetscOptionsCheckInitial_Private`, I can confirm that `checkstack` is set to `PETSC_TRUE` and this leads to no (additional) information about erroneous stack handling. Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Sajid Ali Syed Sent: Wednesday, February 22, 2023 6:34 PM To: Matthew Knepley Cc: Barry Smith ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] KSP_Solve crashes in debug mode Hi Matt, This is a trace from the same crash, but with `-checkstack` included in .petscrc? : https://gist.github.com/s-sajid-ali/455b3982d47a31bff9e7ee211dd43991 I don't see any additional information regarding the possible cause. Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Matthew Knepley Sent: Wednesday, February 22, 2023 6:28 PM To: Sajid Ali Syed Cc: Barry Smith ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] KSP_Solve crashes in debug mode On Wed, Feb 22, 2023 at 6:32 PM Sajid Ali Syed > wrote: Hi Matt, Adding `-checkstack` does not prevent the crash, both on my laptop and on the cluster. It will not prevent a crash. The output is intended to show us where the stack problem originates. Can you send the output? 
Thanks, Matt What does prevent the crash (on my laptop at least) is changing `PETSCSTACKSIZE` from 64 to 256 here : https://github.com/petsc/petsc/blob/main/include/petscerror.h#L1153 Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Matthew Knepley > Sent: Wednesday, February 22, 2023 5:23 PM To: Sajid Ali Syed > Cc: Barry Smith >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] KSP_Solve crashes in debug mode On Wed, Feb 22, 2023 at 6:18 PM Sajid Ali Syed via petsc-users > wrote: One thing to note in relation to the trace attached in the previous email is that there are no warnings until the 36th call to KSP_Solve. The first error (as indicated by ASAN) occurs somewhere before the 40th call to KSP_Solve (part of what the application marks as turn 10 of the propagator). The crash finally occurs on the 43rd call to KSP_solve. Looking at the trace, it appears that stack handling is messed up and eventually it causes the crash. This can happen when PetscFunctionBegin is not matched up with PetscFunctionReturn. Can you try running this with -checkstack Thanks, Matt Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Sajid Ali Syed > Sent: Wednesday, February 22, 2023 5:11 PM To: Barry Smith > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] KSP_Solve crashes in debug mode Hi Barry, Thanks a lot for fixing this issue. I ran the same problem on a linux machine and have the following trace for the same crash (with ASAN turned on for both PETSc (on the latest commit of the branch) and the application) : https://gist.github.com/s-sajid-ali/85bdf689eb8452ef8702c214c4df6940 The trace seems to indicate a couple of buffer overflows, one of which causes the crash. I'm not sure as to what causes them. Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Barry Smith > Sent: Wednesday, February 15, 2023 2:01 PM To: Sajid Ali Syed > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] KSP_Solve crashes in debug mode https://gitlab.com/petsc/petsc/-/merge_requests/6075 should fix the possible recursive error condition Matt pointed out On Feb 9, 2023, at 6:24 PM, Matthew Knepley > wrote: On Thu, Feb 9, 2023 at 6:05 PM Sajid Ali Syed via petsc-users > wrote: I added ?-malloc_debug? in a .petscrc file and ran it again. The backtrace from lldb is in the attached file. The crash now seems to be at: Process 32660 stopped* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8) frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 598 ????? `PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()` 599 ?????@*/ 600 ?????PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...) -> 601 ?????{ 602 ????? PetscMPIInt rank; 603 ????? 604 ????? PetscFunctionBegin; (lldb) frame info frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 (lldb) The trace seems to indicate some sort of infinite loop causing an overflow. 
Yes, I have also seen this. What happens is that we have a memory error. The error is reported inside PetscMallocValidate() using PetscErrorPrintf, which eventually calls PetscCallMPI, which calls PetscMallocValidate again, which fails. We need to remove all error checking from the prints inside Validate. Thanks, Matt PS: I'm using a arm64 mac, so I don't have access to valgrind. Thank You, Sajid Ali (he/him) | Research Associate Scientific Computing Division Fermi National Accelerator Laboratory s-sajid-ali.github.io -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Feb 22 18:49:20 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 22 Feb 2023 19:49:20 -0500 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: References: <95C98D90-F093-4C21-9CC2-AA23F729B5F0@petsc.dev> <901A3EF7-42A7-475E-BFC3-8DF5C9B8285E@petsc.dev> Message-ID: <44829EB5-DA49-4615-931A-FBA04A1EC163@petsc.dev> Hmm, there could be a bug in our handling of the stack when reaches the maximum. It is suppose to just stop collecting additional levels at that point but likely it has not been tested since a lot of refactorizations. What are you doing to have so many stack frames? > On Feb 22, 2023, at 6:32 PM, Sajid Ali Syed wrote: > > Hi Matt, > > Adding `-checkstack` does not prevent the crash, both on my laptop and on the cluster. > > What does prevent the crash (on my laptop at least) is changing `PETSCSTACKSIZE` from 64 to 256 here : https://github.com/petsc/petsc/blob/main/include/petscerror.h#L1153 > > > Thank You, > Sajid Ali (he/him) | Research Associate > Data Science, Simulation, and Learning Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > From: Matthew Knepley > > Sent: Wednesday, February 22, 2023 5:23 PM > To: Sajid Ali Syed > > Cc: Barry Smith >; petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] KSP_Solve crashes in debug mode > > On Wed, Feb 22, 2023 at 6:18 PM Sajid Ali Syed via petsc-users > wrote: > One thing to note in relation to the trace attached in the previous email is that there are no warnings until the 36th call to KSP_Solve. The first error (as indicated by ASAN) occurs somewhere before the 40th call to KSP_Solve (part of what the application marks as turn 10 of the propagator). The crash finally occurs on the 43rd call to KSP_solve. > > Looking at the trace, it appears that stack handling is messed up and eventually it causes the crash. This can happen when > PetscFunctionBegin is not matched up with PetscFunctionReturn. 
Can you try running this with > > -checkstack > > Thanks, > > Matt > > Thank You, > Sajid Ali (he/him) | Research Associate > Data Science, Simulation, and Learning Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > From: Sajid Ali Syed > > Sent: Wednesday, February 22, 2023 5:11 PM > To: Barry Smith > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] KSP_Solve crashes in debug mode > > Hi Barry, > > Thanks a lot for fixing this issue. I ran the same problem on a linux machine and have the following trace for the same crash (with ASAN turned on for both PETSc (on the latest commit of the branch) and the application) : https://gist.github.com/s-sajid-ali/85bdf689eb8452ef8702c214c4df6940 > > The trace seems to indicate a couple of buffer overflows, one of which causes the crash. I'm not sure as to what causes them. > > Thank You, > Sajid Ali (he/him) | Research Associate > Data Science, Simulation, and Learning Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > From: Barry Smith > > Sent: Wednesday, February 15, 2023 2:01 PM > To: Sajid Ali Syed > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] KSP_Solve crashes in debug mode > > > https://gitlab.com/petsc/petsc/-/merge_requests/6075 should fix the possible recursive error condition Matt pointed out > > >> On Feb 9, 2023, at 6:24 PM, Matthew Knepley > wrote: >> >> On Thu, Feb 9, 2023 at 6:05 PM Sajid Ali Syed via petsc-users > wrote: >> I added ?-malloc_debug? in a .petscrc file and ran it again. The backtrace from lldb is in the attached file. The crash now seems to be at: >> >> Process 32660 stopped* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8) >> frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 >> 598 ????? `PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()` >> 599 ?????@*/ >> 600 ?????PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...) >> -> 601 ?????{ >> 602 ????? PetscMPIInt rank; >> 603 ????? >> 604 ????? PetscFunctionBegin; >> (lldb) frame info >> frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 >> (lldb) >> The trace seems to indicate some sort of infinite loop causing an overflow. >> >> >> Yes, I have also seen this. What happens is that we have a memory error. The error is reported inside PetscMallocValidate() >> using PetscErrorPrintf, which eventually calls PetscCallMPI, which calls PetscMallocValidate again, which fails. We need to >> remove all error checking from the prints inside Validate. >> >> Thanks, >> >> Matt >> >> PS: I'm using a arm64 mac, so I don't have access to valgrind. >> >> Thank You, >> Sajid Ali (he/him) | Research Associate >> Scientific Computing Division >> Fermi National Accelerator Laboratory >> s-sajid-ali.github.io >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
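For anyone reproducing the runs in this thread: the debugging options mentioned above can be collected in a .petscrc file (as done earlier in the thread) or passed on the command line; a minimal sketch of such a file:

-malloc_debug
-checkstack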
URL: From salmanuom206 at gmail.com Thu Feb 23 01:54:23 2023 From: salmanuom206 at gmail.com (Salman Ahmad) Date: Thu, 23 Feb 2023 15:54:23 +0800 Subject: [petsc-users] How to link SuperLU_MT library in GCC_default.mk file? Message-ID: Dear All, I compiled the library "SuperLU_MT" and got the "libsuperlu_mt_PTHREAD.a" and "libsuperlu_mt_PTHREAD.a" to /usr/lib to /usr/lib. I am using the following file to link but not working: CC = gcc CXX = g++ F77 = gfortran LINKER = ${CXX} WARNINGS = -Wall -pedantic -Wextra -Weffc++ -Woverloaded-virtual -Wfloat-equal -Wshadow \ -Wredundant-decls -Winline -fmax-errors=1 CXXFLAGS += -ffast-math -O3 -march=native -std=c++17 ${WARNINGS} LINKFLAGS += -O2 #architecture #CPU = -march=znver2 CXXFLAGS += ${CPU} LINKFLAGS += ${CPU} ifeq ($(UBUNTU),1) LINKFLAGS += -llapack -lblas CXXFLAGS += -DUBUNTU else # on archlinux LINKFLAGS += -llapack -lopenblas -lcblas endif SANITARY = -fsanitize=address -fsanitize=undefined -fsanitize=null -fsanitize=return \ -fsanitize=bounds -fsanitize=alignment -fsanitize=float-divide-by-zero -fsanitize=float-cast-overflow \ -fsanitize=bool -fsanitize=enum -fsanitize=vptr # SuperLU_MT CXXFLAGS += -L/usr/lib/lsuperlu_mt_PTHREAD LINKFLAGS += -L/usr/lib/lsuperlu_mt_PTHREAD # SUPERLU_INC= -I/usr/include/superlu -I/usr/include/superlu-dist CXXFLAGS += ${SUPERLU_INC} LINKFLAGS +=-lsuperlu # OpenMP CXXFLAGS += -fopenmp LINKFLAGS += -fopenmp Any suggestions? Best Regards, Salman Ahmad -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Feb 23 12:39:38 2023 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 23 Feb 2023 13:39:38 -0500 Subject: [petsc-users] How to link SuperLU_MT library in GCC_default.mk file? In-Reply-To: References: Message-ID: You want to configure PETSc with threads and not add them yourself: '--with-openmp=1', '--with-log=0', '--with-threadsafety', And you want a threadsafe lapack probably. Mark On Thu, Feb 23, 2023 at 2:54 AM Salman Ahmad wrote: > Dear All, > > I compiled the library "SuperLU_MT" and got the "libsuperlu_mt_PTHREAD.a" > and "libsuperlu_mt_PTHREAD.a" to /usr/lib to /usr/lib. I am using the > following file to link but not working: > > CC = gcc > CXX = g++ > F77 = gfortran > LINKER = ${CXX} > > WARNINGS = -Wall -pedantic -Wextra -Weffc++ -Woverloaded-virtual > -Wfloat-equal -Wshadow \ > -Wredundant-decls -Winline -fmax-errors=1 > > CXXFLAGS += -ffast-math -O3 -march=native -std=c++17 ${WARNINGS} > > LINKFLAGS += -O2 > > #architecture > #CPU = -march=znver2 > CXXFLAGS += ${CPU} > LINKFLAGS += ${CPU} > > > ifeq ($(UBUNTU),1) > LINKFLAGS += -llapack -lblas > CXXFLAGS += -DUBUNTU > else > # on archlinux > LINKFLAGS += -llapack -lopenblas -lcblas > endif > > SANITARY = -fsanitize=address -fsanitize=undefined -fsanitize=null > -fsanitize=return \ > -fsanitize=bounds -fsanitize=alignment -fsanitize=float-divide-by-zero > -fsanitize=float-cast-overflow \ > -fsanitize=bool -fsanitize=enum -fsanitize=vptr > > # SuperLU_MT > CXXFLAGS += -L/usr/lib/lsuperlu_mt_PTHREAD > LINKFLAGS += -L/usr/lib/lsuperlu_mt_PTHREAD > # > SUPERLU_INC= -I/usr/include/superlu -I/usr/include/superlu-dist > CXXFLAGS += ${SUPERLU_INC} > LINKFLAGS +=-lsuperlu > # OpenMP > CXXFLAGS += -fopenmp > LINKFLAGS += -fopenmp > > Any suggestions? > Best Regards, > Salman Ahmad > -------------- next part -------------- An HTML attachment was scrubbed... 
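As a side note on the make fragment quoted above: with GCC, -L expects a directory and -l a bare library name, so a sketch (untested) of the usual way to link libsuperlu_mt_PTHREAD.a installed under /usr/lib would be the following; this only addresses the gcc/make linking mechanics and is independent of whether PETSc itself can call SuperLU_MT, which is taken up later in the thread.

# sketch only: -L adds a search directory, -l names the library
LINKFLAGS += -L/usr/lib -lsuperlu_mt_PTHREAD
# rather than "-L/usr/lib/lsuperlu_mt_PTHREAD" plus "-lsuperlu"; the options
# Mark lists ('--with-openmp=1', '--with-threadsafety', '--with-log=0') are
# arguments to PETSc's ./configure, not compiler flags.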
URL: From mfadams at lbl.gov Thu Feb 23 20:42:36 2023 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 23 Feb 2023 21:42:36 -0500 Subject: [petsc-users] How to link SuperLU_MT library in GCC_default.mk file? In-Reply-To: References: Message-ID: Keep this on the list. We don't support SuperLU_MT. There is no interface to it in PETSc. Here is a table with our available solvers: https://petsc.org/main/overview/linear_solve_table I thought SuperLU supported using threads in the solver, but I could be wrong. You could ask them. They would know what is supported in PETSc and how to use it. Good luck, Mark On Thu, Feb 23, 2023 at 8:44 PM Salman Ahmad wrote: > Dear Mark, > > Thank you so much for your reply! > > I am a Mathematics Student and a beginner in Makefile. Please help me how > I add the (libsuperlu_mt_PTHREAD.a) having address /usr/lib in the > attached make file? > > Best Regards, > Salman Ahmad > > On Fri, Feb 24, 2023 at 2:39 AM Mark Adams wrote: > >> You want to configure PETSc with threads and not add them yourself: >> >> '--with-openmp=1', >> '--with-log=0', >> '--with-threadsafety', >> >> And you want a threadsafe lapack probably. >> >> Mark >> >> On Thu, Feb 23, 2023 at 2:54 AM Salman Ahmad >> wrote: >> >>> Dear All, >>> >>> I compiled the library "SuperLU_MT" and got the "libsuperlu_mt_PTHREAD.a" >>> and "libsuperlu_mt_PTHREAD.a" to /usr/lib to /usr/lib. I am using >>> the following file to link but not working: >>> >>> CC = gcc >>> CXX = g++ >>> F77 = gfortran >>> LINKER = ${CXX} >>> >>> WARNINGS = -Wall -pedantic -Wextra -Weffc++ -Woverloaded-virtual >>> -Wfloat-equal -Wshadow \ >>> -Wredundant-decls -Winline -fmax-errors=1 >>> >>> CXXFLAGS += -ffast-math -O3 -march=native -std=c++17 ${WARNINGS} >>> >>> LINKFLAGS += -O2 >>> >>> #architecture >>> #CPU = -march=znver2 >>> CXXFLAGS += ${CPU} >>> LINKFLAGS += ${CPU} >>> >>> >>> ifeq ($(UBUNTU),1) >>> LINKFLAGS += -llapack -lblas >>> CXXFLAGS += -DUBUNTU >>> else >>> # on archlinux >>> LINKFLAGS += -llapack -lopenblas -lcblas >>> endif >>> >>> SANITARY = -fsanitize=address -fsanitize=undefined -fsanitize=null >>> -fsanitize=return \ >>> -fsanitize=bounds -fsanitize=alignment -fsanitize=float-divide-by-zero >>> -fsanitize=float-cast-overflow \ >>> -fsanitize=bool -fsanitize=enum -fsanitize=vptr >>> >>> # SuperLU_MT >>> CXXFLAGS += -L/usr/lib/lsuperlu_mt_PTHREAD >>> LINKFLAGS += -L/usr/lib/lsuperlu_mt_PTHREAD >>> # >>> SUPERLU_INC= -I/usr/include/superlu -I/usr/include/superlu-dist >>> CXXFLAGS += ${SUPERLU_INC} >>> LINKFLAGS +=-lsuperlu >>> # OpenMP >>> CXXFLAGS += -fopenmp >>> LINKFLAGS += -fopenmp >>> >>> Any suggestions? >>> Best Regards, >>> Salman Ahmad >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sasyed at fnal.gov Fri Feb 24 11:39:01 2023 From: sasyed at fnal.gov (Sajid Ali Syed) Date: Fri, 24 Feb 2023 17:39:01 +0000 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: <44829EB5-DA49-4615-931A-FBA04A1EC163@petsc.dev> References: <95C98D90-F093-4C21-9CC2-AA23F729B5F0@petsc.dev> <901A3EF7-42A7-475E-BFC3-8DF5C9B8285E@petsc.dev> <44829EB5-DA49-4615-931A-FBA04A1EC163@petsc.dev> Message-ID: Hi Barry, The application calls PetscCallAbort in a loop, i.e. for i in range: void routine(PetscCallAbort(function_returning_petsc_error_code)) From the prior logs it looks like the stack grows every time PetscCallAbort is called (in other words, the stack does not shrink upon successful exit from PetscCallAbort). Is this usage pattern not recommended? 
Should I be manually checking for success of the `function_returning_petsc_error_code` and throw instead of relying on PetscCallAbort? Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Barry Smith Sent: Wednesday, February 22, 2023 6:49 PM To: Sajid Ali Syed Cc: Matthew Knepley ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] KSP_Solve crashes in debug mode Hmm, there could be a bug in our handling of the stack when reaches the maximum. It is suppose to just stop collecting additional levels at that point but likely it has not been tested since a lot of refactorizations. What are you doing to have so many stack frames? On Feb 22, 2023, at 6:32 PM, Sajid Ali Syed wrote: Hi Matt, Adding `-checkstack` does not prevent the crash, both on my laptop and on the cluster. What does prevent the crash (on my laptop at least) is changing `PETSCSTACKSIZE` from 64 to 256 here : https://github.com/petsc/petsc/blob/main/include/petscerror.h#L1153 Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Matthew Knepley > Sent: Wednesday, February 22, 2023 5:23 PM To: Sajid Ali Syed > Cc: Barry Smith >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] KSP_Solve crashes in debug mode On Wed, Feb 22, 2023 at 6:18 PM Sajid Ali Syed via petsc-users > wrote: One thing to note in relation to the trace attached in the previous email is that there are no warnings until the 36th call to KSP_Solve. The first error (as indicated by ASAN) occurs somewhere before the 40th call to KSP_Solve (part of what the application marks as turn 10 of the propagator). The crash finally occurs on the 43rd call to KSP_solve. Looking at the trace, it appears that stack handling is messed up and eventually it causes the crash. This can happen when PetscFunctionBegin is not matched up with PetscFunctionReturn. Can you try running this with -checkstack Thanks, Matt Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Sajid Ali Syed > Sent: Wednesday, February 22, 2023 5:11 PM To: Barry Smith > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] KSP_Solve crashes in debug mode Hi Barry, Thanks a lot for fixing this issue. I ran the same problem on a linux machine and have the following trace for the same crash (with ASAN turned on for both PETSc (on the latest commit of the branch) and the application) : https://gist.github.com/s-sajid-ali/85bdf689eb8452ef8702c214c4df6940 The trace seems to indicate a couple of buffer overflows, one of which causes the crash. I'm not sure as to what causes them. 
Thank You, Sajid Ali (he/him) | Research Associate Data Science, Simulation, and Learning Division Fermi National Accelerator Laboratory s-sajid-ali.github.io ________________________________ From: Barry Smith > Sent: Wednesday, February 15, 2023 2:01 PM To: Sajid Ali Syed > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] KSP_Solve crashes in debug mode https://gitlab.com/petsc/petsc/-/merge_requests/6075 should fix the possible recursive error condition Matt pointed out On Feb 9, 2023, at 6:24 PM, Matthew Knepley > wrote: On Thu, Feb 9, 2023 at 6:05 PM Sajid Ali Syed via petsc-users > wrote: I added ?-malloc_debug? in a .petscrc file and ran it again. The backtrace from lldb is in the attached file. The crash now seems to be at: Process 32660 stopped* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8) frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 598 ????? `PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()` 599 ?????@*/ 600 ?????PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...) -> 601 ?????{ 602 ????? PetscMPIInt rank; 603 ????? 604 ????? PetscFunctionBegin; (lldb) frame info frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 (lldb) The trace seems to indicate some sort of infinite loop causing an overflow. Yes, I have also seen this. What happens is that we have a memory error. The error is reported inside PetscMallocValidate() using PetscErrorPrintf, which eventually calls PetscCallMPI, which calls PetscMallocValidate again, which fails. We need to remove all error checking from the prints inside Validate. Thanks, Matt PS: I'm using a arm64 mac, so I don't have access to valgrind. Thank You, Sajid Ali (he/him) | Research Associate Scientific Computing Division Fermi National Accelerator Laboratory s-sajid-ali.github.io -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Feb 24 11:47:19 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 24 Feb 2023 12:47:19 -0500 Subject: [petsc-users] KSP_Solve crashes in debug mode In-Reply-To: References: <95C98D90-F093-4C21-9CC2-AA23F729B5F0@petsc.dev> <901A3EF7-42A7-475E-BFC3-8DF5C9B8285E@petsc.dev> <44829EB5-DA49-4615-931A-FBA04A1EC163@petsc.dev> Message-ID: Hmm, here is the macro #define PetscCallAbort(comm, ...) \ do { \ PetscErrorCode ierr_petsc_call_abort_; \ PetscStackUpdateLine; \ ierr_petsc_call_abort_ = __VA_ARGS__; \ if (PetscUnlikely(ierr_petsc_call_abort_ != PETSC_SUCCESS)) { \ ierr_petsc_call_abort_ = PetscError(PETSC_COMM_SELF, __LINE__, PETSC_FUNCTION_NAME, __FILE__, ierr_petsc_call_abort_, PETSC_ERROR_REPEAT, " "); \ (void)MPI_Abort(comm, (PetscMPIInt)ierr_petsc_call_abort_); \ } \ } while (0) it does not seem to increment anything in the stack. 
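For concreteness, a minimal sketch of the looped usage being described, with placeholder names (the actual application code is not shown in this thread):

#include <petscsys.h>

/* placeholder for an application routine that returns a PetscErrorCode */
extern PetscErrorCode UserSolveStep(void *ctx);

void propagate(void *ctx, PetscInt nturns)
{
  for (PetscInt i = 0; i < nturns; ++i) {
    /* per the macro above, PetscCallAbort only inspects the returned error
       code and calls MPI_Abort on failure; it does not itself push or pop
       the debug stack, so any growth must come from inside UserSolveStep() */
    PetscCallAbort(PETSC_COMM_SELF, UserSolveStep(ctx));
  }
}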
So I think call should be ok Perhaps your function has a PetscFunctionBegin, but no PetscFunctionReturn() or in some other way increase the stack size without decreasing it? > On Feb 24, 2023, at 12:39 PM, Sajid Ali Syed wrote: > > Hi Barry, > > The application calls PetscCallAbort in a loop, i.e. > > for i in range: > void routine(PetscCallAbort(function_returning_petsc_error_code)) > > From the prior logs it looks like the stack grows every time PetscCallAbort is called (in other words, the stack does not shrink upon successful exit from PetscCallAbort). > > Is this usage pattern not recommended? Should I be manually checking for success of the `function_returning_petsc_error_code` and throw instead of relying on PetscCallAbort? > > > > Thank You, > Sajid Ali (he/him) | Research Associate > Data Science, Simulation, and Learning Division > Fermi National Accelerator Laboratory > s-sajid-ali.github.io > From: Barry Smith > > Sent: Wednesday, February 22, 2023 6:49 PM > To: Sajid Ali Syed > > Cc: Matthew Knepley >; petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] KSP_Solve crashes in debug mode > > > Hmm, there could be a bug in our handling of the stack when reaches the maximum. It is suppose to just stop collecting additional levels at that point but likely it has not been tested since a lot of refactorizations. > > What are you doing to have so many stack frames? > >> On Feb 22, 2023, at 6:32 PM, Sajid Ali Syed > wrote: >> >> Hi Matt, >> >> Adding `-checkstack` does not prevent the crash, both on my laptop and on the cluster. >> >> What does prevent the crash (on my laptop at least) is changing `PETSCSTACKSIZE` from 64 to 256 here : https://github.com/petsc/petsc/blob/main/include/petscerror.h#L1153 >> >> >> Thank You, >> Sajid Ali (he/him) | Research Associate >> Data Science, Simulation, and Learning Division >> Fermi National Accelerator Laboratory >> s-sajid-ali.github.io >> From: Matthew Knepley > >> Sent: Wednesday, February 22, 2023 5:23 PM >> To: Sajid Ali Syed > >> Cc: Barry Smith >; petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] KSP_Solve crashes in debug mode >> >> On Wed, Feb 22, 2023 at 6:18 PM Sajid Ali Syed via petsc-users > wrote: >> One thing to note in relation to the trace attached in the previous email is that there are no warnings until the 36th call to KSP_Solve. The first error (as indicated by ASAN) occurs somewhere before the 40th call to KSP_Solve (part of what the application marks as turn 10 of the propagator). The crash finally occurs on the 43rd call to KSP_solve. >> >> Looking at the trace, it appears that stack handling is messed up and eventually it causes the crash. This can happen when >> PetscFunctionBegin is not matched up with PetscFunctionReturn. Can you try running this with >> >> -checkstack >> >> Thanks, >> >> Matt >> >> Thank You, >> Sajid Ali (he/him) | Research Associate >> Data Science, Simulation, and Learning Division >> Fermi National Accelerator Laboratory >> s-sajid-ali.github.io >> From: Sajid Ali Syed > >> Sent: Wednesday, February 22, 2023 5:11 PM >> To: Barry Smith > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] KSP_Solve crashes in debug mode >> >> Hi Barry, >> >> Thanks a lot for fixing this issue. 
I ran the same problem on a linux machine and have the following trace for the same crash (with ASAN turned on for both PETSc (on the latest commit of the branch) and the application) : https://gist.github.com/s-sajid-ali/85bdf689eb8452ef8702c214c4df6940 >> >> The trace seems to indicate a couple of buffer overflows, one of which causes the crash. I'm not sure as to what causes them. >> >> Thank You, >> Sajid Ali (he/him) | Research Associate >> Data Science, Simulation, and Learning Division >> Fermi National Accelerator Laboratory >> s-sajid-ali.github.io >> From: Barry Smith > >> Sent: Wednesday, February 15, 2023 2:01 PM >> To: Sajid Ali Syed > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] KSP_Solve crashes in debug mode >> >> >> https://gitlab.com/petsc/petsc/-/merge_requests/6075 should fix the possible recursive error condition Matt pointed out >> >> >>> On Feb 9, 2023, at 6:24 PM, Matthew Knepley > wrote: >>> >>> On Thu, Feb 9, 2023 at 6:05 PM Sajid Ali Syed via petsc-users > wrote: >>> I added ?-malloc_debug? in a .petscrc file and ran it again. The backtrace from lldb is in the attached file. The crash now seems to be at: >>> >>> Process 32660 stopped* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8) >>> frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 >>> 598 ????? `PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()` >>> 599 ?????@*/ >>> 600 ?????PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...) >>> -> 601 ?????{ >>> 602 ????? PetscMPIInt rank; >>> 603 ????? >>> 604 ????? PetscFunctionBegin; >>> (lldb) frame info >>> frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601 >>> (lldb) >>> The trace seems to indicate some sort of infinite loop causing an overflow. >>> >>> >>> Yes, I have also seen this. What happens is that we have a memory error. The error is reported inside PetscMallocValidate() >>> using PetscErrorPrintf, which eventually calls PetscCallMPI, which calls PetscMallocValidate again, which fails. We need to >>> remove all error checking from the prints inside Validate. >>> >>> Thanks, >>> >>> Matt >>> >>> PS: I'm using a arm64 mac, so I don't have access to valgrind. >>> >>> Thank You, >>> Sajid Ali (he/him) | Research Associate >>> Scientific Computing Division >>> Fermi National Accelerator Laboratory >>> s-sajid-ali.github.io >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mi.mike1021 at gmail.com Sat Feb 25 14:11:10 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Sat, 25 Feb 2023 14:11:10 -0600 Subject: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue In-Reply-To: References: Message-ID: My apologies for the late follow-up. There was a time conflict. A simple example code related to the issue I mentioned is attached here. 
The sample code does: (1) load grid on dm, (2) compute vertex-wise control volume for each node in a median-dual way, (3) halo exchange among procs to have complete control volume values, and (4) print out its field as a .vtu file. To make sure, the computed control volume is also compared with PETSc-computed control volume via DMPlexComputeCellGeometryFVM() (see lines 771-793). Back to the original problem, I can get a proper control volume field with PETSc 3.18.4, which is the latest stable release. However, if I use PETSc from the main repo, it gives a strange control volume field. Something is certainly strange around the parallel boundaries, thus I think something went wrong with halo communication. To help understand, a comparing snapshot is also attached. I guess a certain part of my code is no longer compatible with PETSc unless there is a bug in the library. Could I get comments on it? Thanks, Mike > On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley > wrote: > >> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell >> wrote: >> >>> As a follow-up, I tested: >>> >>> (1) Download tar for v3.18.4 from petsc gitlab ( >>> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue on DMPlex >>> halo exchange. This version works as I expect. >>> (2) Clone main branch (git clone https://gitlab.com/petsc/petsc.git) >>> has issues with DMPlex halo exchange. Something is suspicious about this >>> main branch, related to DMPlex halo. The solution field I got is not >>> correct. But it works okay with 1-proc. >>> >>> Does anyone have any comments on this issue? I am curious if other >>> DMPlex users have no problem regarding halo exchange. FYI, I do not >>> declare ghost layers for halo exchange. >>> >> >> There should not have been any changes there and there are definitely >> tests for this. >> >> It would be great if you could send something that failed. I could fix it >> and add it as a test. >> > > Just to follow up, we have tests of the low-level communication (Plex > tests ex1, ex12, ex18, ex29, ex31), and then we have > tests that use halo exchange for PDE calculations, for example SNES > tutorial ex12, ex13, ex62. THe convergence rates > should be off if the halo exchange were wrong. Is there any example > similar to your code that is failing on your installation? > Or is there a way to run your code? > > Thanks, > > Matt > > >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Mike >>> >>> >>>> Dear PETSc team, >>>> >>>> I am using PETSc for Fortran with DMPlex. I have been using this >>>> version of PETSc: >>>> >>git rev-parse origin >>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1 >>>> >>git rev-parse FETCH_HEAD >>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b >>>> >>>> There has been no issue, before the one with VTK viewer, which Jed >>>> fixed today ( >>>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 >>>> ). >>>> >>>> Since that MR has been merged into the main repo, I pulled the latest >>>> version of PETSc (basically I cloned it from scratch). But if I use the >>>> same configure option with before, and run my code, then there is an issue >>>> with halo exchange. The code runs without error message, but it gives wrong >>>> solution field. I guess the issue I have is related to graph partitioner or >>>> halo exchange part. This is because if I run the code with 1-proc, the >>>> solution is correct. I only updated the version of PETSc and there was no >>>> change in my own code. Could I get any comments on the issue? 
I was >>>> wondering if there have been many changes in halo exchange or graph >>>> partitioning & distributing part related to DMPlex. >>>> >>>> Thanks, >>>> Mike >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Test_Version.tar Type: application/x-tar Size: 665600 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: output_example_snapshot.png Type: image/png Size: 249452 bytes Desc: not available URL: From jl7037 at mun.ca Sat Feb 25 16:44:40 2023 From: jl7037 at mun.ca (Long, Jianbo) Date: Sat, 25 Feb 2023 23:44:40 +0100 Subject: [petsc-users] petsc compiled without MPI Message-ID: Hello, For some of my applications, I need to use petsc without mpi, or use it sequentially. I wonder where I can find examples/tutorials for this ? Thanks very much, Jianbo Long -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Sun Feb 26 00:28:31 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Sun, 26 Feb 2023 07:28:31 +0100 Subject: [petsc-users] petsc compiled without MPI In-Reply-To: References: Message-ID: <89666060-9D66-4B4D-AE11-B95CCE97FEA9@joliv.et> > On 25 Feb 2023, at 11:44 PM, Long, Jianbo wrote: > > Hello, > > For some of my applications, I need to use petsc without mpi, or use it sequentially. I wonder where I can find examples/tutorials for this ? You can run sequentially with just a single MPI process (-n 1). If you need to run without MPI whatsoever, you?ll need to have a separate PETSc installation which was configured --with-mpi=0 In both cases, the same user-code will run, i.e., all PETSc examples available with the sources will work (though some are designed purely for parallel experiments and may error out early on purpose). Thanks, Pierre > Thanks very much, > Jianbo Long -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Feb 26 08:01:41 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 26 Feb 2023 09:01:41 -0500 Subject: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue In-Reply-To: References: Message-ID: On Sat, Feb 25, 2023 at 3:11 PM Mike Michell wrote: > My apologies for the late follow-up. There was a time conflict. > > A simple example code related to the issue I mentioned is attached here. > The sample code does: (1) load grid on dm, (2) compute vertex-wise control > volume for each node in a median-dual way, (3) halo exchange among procs to > have complete control volume values, and (4) print out its field as a .vtu > file. To make sure, the computed control volume is also compared with > PETSc-computed control volume via DMPlexComputeCellGeometryFVM() (see lines > 771-793). > > Back to the original problem, I can get a proper control volume field with > PETSc 3.18.4, which is the latest stable release. 
However, if I use PETSc > from the main repo, it gives a strange control volume field. Something is > certainly strange around the parallel boundaries, thus I think something > went wrong with halo communication. To help understand, a comparing > snapshot is also attached. I guess a certain part of my code is no longer > compatible with PETSc unless there is a bug in the library. Could I get > comments on it? > I can run your example. The numbers I get for "median-dual volume" do not match the "PETSc volume", and the PETSc volume is correct. Moreover, the median-dual numbers change, which suggests a memory fault. I compiled it using address sanitizer, and it found an error: Number of physical boundary edge ... 4 0 Number of physical and parallel boundary edge ... 4 0 Number of parallel boundary edge ... 0 0 Number of physical boundary edge ... 4 1 Number of physical and parallel boundary edge ... 4 1 Number of parallel boundary edge ... 0 1 ================================================================= ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8 READ of size 8 at 0x603000022d40 thread T0 ================================================================= ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8 READ of size 8 at 0x60300000f0f0 thread T0 #0 0x10cf702a7 in MAIN__ test.F90:657 #1 0x10cf768ee in main test.F90:43 #0 0x1068e12a7 in MAIN__ test.F90:657 #1 0x1068e78ee in main test.F90:43 #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) 0x60300000f0f0 is located 0 bytes to the right of 32-byte region [0x60300000f0d0,0x60300000f0f0) allocated by thread T0 here: #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) 0x603000022d40 is located 0 bytes to the right of 32-byte region [0x603000022d20,0x603000022d40) allocated by thread T0 here: #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f) #1 0x1068dba71 in MAIN__ test.F90:499 #2 0x1068e78ee in main test.F90:43 #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in MAIN__ Shadow bytes around the buggy address: which corresponds to ! midpoint of median-dual face for inner face axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell and these were allocated here allocate(xc(ncell)) allocate(yc(ncell)) Hopefully the error is straightforward to see now. Thanks, Matt > Thanks, > Mike > > >> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley >> wrote: >> >>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell >>> wrote: >>> >>>> As a follow-up, I tested: >>>> >>>> (1) Download tar for v3.18.4 from petsc gitlab ( >>>> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue on DMPlex >>>> halo exchange. This version works as I expect. >>>> (2) Clone main branch (git clone https://gitlab.com/petsc/petsc.git) >>>> has issues with DMPlex halo exchange. Something is suspicious about this >>>> main branch, related to DMPlex halo. The solution field I got is not >>>> correct. But it works okay with 1-proc. >>>> >>>> Does anyone have any comments on this issue? I am curious if other >>>> DMPlex users have no problem regarding halo exchange. FYI, I do not >>>> declare ghost layers for halo exchange. >>>> >>> >>> There should not have been any changes there and there are definitely >>> tests for this. 
>>> >>> It would be great if you could send something that failed. I could fix >>> it and add it as a test. >>> >> >> Just to follow up, we have tests of the low-level communication (Plex >> tests ex1, ex12, ex18, ex29, ex31), and then we have >> tests that use halo exchange for PDE calculations, for example SNES >> tutorial ex12, ex13, ex62. THe convergence rates >> should be off if the halo exchange were wrong. Is there any example >> similar to your code that is failing on your installation? >> Or is there a way to run your code? >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> Mike >>>> >>>> >>>>> Dear PETSc team, >>>>> >>>>> I am using PETSc for Fortran with DMPlex. I have been using this >>>>> version of PETSc: >>>>> >>git rev-parse origin >>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1 >>>>> >>git rev-parse FETCH_HEAD >>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b >>>>> >>>>> There has been no issue, before the one with VTK viewer, which Jed >>>>> fixed today ( >>>>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 >>>>> ). >>>>> >>>>> Since that MR has been merged into the main repo, I pulled the latest >>>>> version of PETSc (basically I cloned it from scratch). But if I use the >>>>> same configure option with before, and run my code, then there is an issue >>>>> with halo exchange. The code runs without error message, but it gives wrong >>>>> solution field. I guess the issue I have is related to graph partitioner or >>>>> halo exchange part. This is because if I run the code with 1-proc, the >>>>> solution is correct. I only updated the version of PETSc and there was no >>>>> change in my own code. Could I get any comments on the issue? I was >>>>> wondering if there have been many changes in halo exchange or graph >>>>> partitioning & distributing part related to DMPlex. >>>>> >>>>> Thanks, >>>>> Mike >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sun Feb 26 09:38:59 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Sun, 26 Feb 2023 09:38:59 -0600 (CST) Subject: [petsc-users] petsc compiled without MPI In-Reply-To: <89666060-9D66-4B4D-AE11-B95CCE97FEA9@joliv.et> References: <89666060-9D66-4B4D-AE11-B95CCE97FEA9@joliv.et> Message-ID: On Sun, 26 Feb 2023, Pierre Jolivet wrote: > > > > On 25 Feb 2023, at 11:44 PM, Long, Jianbo wrote: > > > > Hello, > > > > For some of my applications, I need to use petsc without mpi, or use it sequentially. I wonder where I can find examples/tutorials for this ? > > You can run sequentially with just a single MPI process (-n 1). 
even if you build with mpich/openmpi - you can run sequentially without mpiexec - i.e: ./binary One reason to do this [instead of building PETSc with --with-mpi=0] - is if you are mixing in multiple pkgs that have MPI dependencies [in which case - its best to build all these pkgs with the same mpich or openmpi - but still run sequentially]. Satish > If you need to run without MPI whatsoever, you?ll need to have a separate PETSc installation which was configured --with-mpi=0 > In both cases, the same user-code will run, i.e., all PETSc examples available with the sources will work (though some are designed purely for parallel experiments and may error out early on purpose). > > Thanks, > Pierre > > > Thanks very much, > > Jianbo Long > > From mi.mike1021 at gmail.com Sun Feb 26 10:19:00 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Sun, 26 Feb 2023 10:19:00 -0600 Subject: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue In-Reply-To: References: Message-ID: Which version of petsc you tested? With petsc 3.18.4, median duan volume gives the same value with petsc from DMPlexComputeCellGeometryFVM(). > On Sat, Feb 25, 2023 at 3:11 PM Mike Michell > wrote: > >> My apologies for the late follow-up. There was a time conflict. >> >> A simple example code related to the issue I mentioned is attached here. >> The sample code does: (1) load grid on dm, (2) compute vertex-wise control >> volume for each node in a median-dual way, (3) halo exchange among procs to >> have complete control volume values, and (4) print out its field as a .vtu >> file. To make sure, the computed control volume is also compared with >> PETSc-computed control volume via DMPlexComputeCellGeometryFVM() (see lines >> 771-793). >> >> Back to the original problem, I can get a proper control volume field >> with PETSc 3.18.4, which is the latest stable release. However, if I use >> PETSc from the main repo, it gives a strange control volume field. >> Something is certainly strange around the parallel boundaries, thus I think >> something went wrong with halo communication. To help understand, a >> comparing snapshot is also attached. I guess a certain part of my code is >> no longer compatible with PETSc unless there is a bug in the library. Could >> I get comments on it? >> > > I can run your example. The numbers I get for "median-dual volume" do not > match the "PETSc volume", and the PETSc volume is correct. Moreover, the > median-dual numbers change, which suggests a memory fault. I compiled it > using address sanitizer, and it found an error: > > Number of physical boundary edge ... 4 0 > Number of physical and parallel boundary edge ... 4 > 0 > Number of parallel boundary edge ... 0 0 > Number of physical boundary edge ... 4 1 > Number of physical and parallel boundary edge ... 4 > 1 > Number of parallel boundary edge ... 
0 1 > ================================================================= > ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address > 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8 > READ of size 8 at 0x603000022d40 thread T0 > ================================================================= > ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address > 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8 > READ of size 8 at 0x60300000f0f0 thread T0 > #0 0x10cf702a7 in MAIN__ test.F90:657 > #1 0x10cf768ee in main test.F90:43 > #0 0x1068e12a7 in MAIN__ test.F90:657 > #1 0x1068e78ee in main test.F90:43 > #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) > > 0x60300000f0f0 is located 0 bytes to the right of 32-byte region > [0x60300000f0d0,0x60300000f0f0) > allocated by thread T0 here: > #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) > > 0x603000022d40 is located 0 bytes to the right of 32-byte region > [0x603000022d20,0x603000022d40) > allocated by thread T0 here: > #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f) > #1 0x1068dba71 in MAIN__ test.F90:499 > #2 0x1068e78ee in main test.F90:43 > #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) > > SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in MAIN__ > Shadow bytes around the buggy address: > > which corresponds to > > ! midpoint of median-dual face for inner face > axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell > axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell > > and these were allocated here > > allocate(xc(ncell)) > allocate(yc(ncell)) > > Hopefully the error is straightforward to see now. > > Thanks, > > Matt > > >> Thanks, >> Mike >> >> >>> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley >>> wrote: >>> >>>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell >>>> wrote: >>>> >>>>> As a follow-up, I tested: >>>>> >>>>> (1) Download tar for v3.18.4 from petsc gitlab ( >>>>> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue on DMPlex >>>>> halo exchange. This version works as I expect. >>>>> (2) Clone main branch (git clone https://gitlab.com/petsc/petsc.git) >>>>> has issues with DMPlex halo exchange. Something is suspicious about this >>>>> main branch, related to DMPlex halo. The solution field I got is not >>>>> correct. But it works okay with 1-proc. >>>>> >>>>> Does anyone have any comments on this issue? I am curious if other >>>>> DMPlex users have no problem regarding halo exchange. FYI, I do not >>>>> declare ghost layers for halo exchange. >>>>> >>>> >>>> There should not have been any changes there and there are definitely >>>> tests for this. >>>> >>>> It would be great if you could send something that failed. I could fix >>>> it and add it as a test. >>>> >>> >>> Just to follow up, we have tests of the low-level communication (Plex >>> tests ex1, ex12, ex18, ex29, ex31), and then we have >>> tests that use halo exchange for PDE calculations, for example SNES >>> tutorial ex12, ex13, ex62. THe convergence rates >>> should be off if the halo exchange were wrong. Is there any example >>> similar to your code that is failing on your installation? >>> Or is there a way to run your code? >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks, >>>>> Mike >>>>> >>>>> >>>>>> Dear PETSc team, >>>>>> >>>>>> I am using PETSc for Fortran with DMPlex. 
I have been using this >>>>>> version of PETSc: >>>>>> >>git rev-parse origin >>>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1 >>>>>> >>git rev-parse FETCH_HEAD >>>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b >>>>>> >>>>>> There has been no issue, before the one with VTK viewer, which Jed >>>>>> fixed today ( >>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 >>>>>> ). >>>>>> >>>>>> Since that MR has been merged into the main repo, I pulled the latest >>>>>> version of PETSc (basically I cloned it from scratch). But if I use the >>>>>> same configure option with before, and run my code, then there is an issue >>>>>> with halo exchange. The code runs without error message, but it gives wrong >>>>>> solution field. I guess the issue I have is related to graph partitioner or >>>>>> halo exchange part. This is because if I run the code with 1-proc, the >>>>>> solution is correct. I only updated the version of PETSc and there was no >>>>>> change in my own code. Could I get any comments on the issue? I was >>>>>> wondering if there have been many changes in halo exchange or graph >>>>>> partitioning & distributing part related to DMPlex. >>>>>> >>>>>> Thanks, >>>>>> Mike >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Feb 26 10:24:26 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 26 Feb 2023 11:24:26 -0500 Subject: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue In-Reply-To: References: Message-ID: On Sun, Feb 26, 2023 at 11:19?AM Mike Michell wrote: > Which version of petsc you tested? With petsc 3.18.4, median duan volume > gives the same value with petsc from DMPlexComputeCellGeometryFVM(). > This is only an accident of the data layout. The code you sent writes over memory in the local Fortran arrays. Thanks, Matt > >> On Sat, Feb 25, 2023 at 3:11 PM Mike Michell >> wrote: >> >>> My apologies for the late follow-up. There was a time conflict. >>> >>> A simple example code related to the issue I mentioned is attached here. >>> The sample code does: (1) load grid on dm, (2) compute vertex-wise control >>> volume for each node in a median-dual way, (3) halo exchange among procs to >>> have complete control volume values, and (4) print out its field as a .vtu >>> file. To make sure, the computed control volume is also compared with >>> PETSc-computed control volume via DMPlexComputeCellGeometryFVM() (see lines >>> 771-793). >>> >>> Back to the original problem, I can get a proper control volume field >>> with PETSc 3.18.4, which is the latest stable release. However, if I use >>> PETSc from the main repo, it gives a strange control volume field. 
>>> Something is certainly strange around the parallel boundaries, thus I think >>> something went wrong with halo communication. To help understand, a >>> comparing snapshot is also attached. I guess a certain part of my code is >>> no longer compatible with PETSc unless there is a bug in the library. Could >>> I get comments on it? >>> >> >> I can run your example. The numbers I get for "median-dual volume" do not >> match the "PETSc volume", and the PETSc volume is correct. Moreover, the >> median-dual numbers change, which suggests a memory fault. I compiled it >> using address sanitizer, and it found an error: >> >> Number of physical boundary edge ... 4 0 >> Number of physical and parallel boundary edge ... 4 >> 0 >> Number of parallel boundary edge ... 0 0 >> Number of physical boundary edge ... 4 1 >> Number of physical and parallel boundary edge ... 4 >> 1 >> Number of parallel boundary edge ... 0 1 >> ================================================================= >> ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address >> 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8 >> READ of size 8 at 0x603000022d40 thread T0 >> ================================================================= >> ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address >> 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8 >> READ of size 8 at 0x60300000f0f0 thread T0 >> #0 0x10cf702a7 in MAIN__ test.F90:657 >> #1 0x10cf768ee in main test.F90:43 >> #0 0x1068e12a7 in MAIN__ test.F90:657 >> #1 0x1068e78ee in main test.F90:43 >> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >> >> 0x60300000f0f0 is located 0 bytes to the right of 32-byte region >> [0x60300000f0d0,0x60300000f0f0) >> allocated by thread T0 here: >> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >> >> 0x603000022d40 is located 0 bytes to the right of 32-byte region >> [0x603000022d20,0x603000022d40) >> allocated by thread T0 here: >> #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f) >> #1 0x1068dba71 in MAIN__ test.F90:499 >> #2 0x1068e78ee in main test.F90:43 >> #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >> >> SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in MAIN__ >> Shadow bytes around the buggy address: >> >> which corresponds to >> >> ! midpoint of median-dual face for inner face >> axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell >> axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell >> >> and these were allocated here >> >> allocate(xc(ncell)) >> allocate(yc(ncell)) >> >> Hopefully the error is straightforward to see now. >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Mike >>> >>> >>>> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley >>>> wrote: >>>> >>>>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell >>>>> wrote: >>>>> >>>>>> As a follow-up, I tested: >>>>>> >>>>>> (1) Download tar for v3.18.4 from petsc gitlab ( >>>>>> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue on >>>>>> DMPlex halo exchange. This version works as I expect. >>>>>> (2) Clone main branch (git clone https://gitlab.com/petsc/petsc.git) >>>>>> has issues with DMPlex halo exchange. Something is suspicious about this >>>>>> main branch, related to DMPlex halo. The solution field I got is not >>>>>> correct. But it works okay with 1-proc. >>>>>> >>>>>> Does anyone have any comments on this issue? I am curious if other >>>>>> DMPlex users have no problem regarding halo exchange. 
FYI, I do not >>>>>> declare ghost layers for halo exchange. >>>>>> >>>>> >>>>> There should not have been any changes there and there are definitely >>>>> tests for this. >>>>> >>>>> It would be great if you could send something that failed. I could fix >>>>> it and add it as a test. >>>>> >>>> >>>> Just to follow up, we have tests of the low-level communication (Plex >>>> tests ex1, ex12, ex18, ex29, ex31), and then we have >>>> tests that use halo exchange for PDE calculations, for example SNES >>>> tutorial ex12, ex13, ex62. THe convergence rates >>>> should be off if the halo exchange were wrong. Is there any example >>>> similar to your code that is failing on your installation? >>>> Or is there a way to run your code? >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Thanks, >>>>>> Mike >>>>>> >>>>>> >>>>>>> Dear PETSc team, >>>>>>> >>>>>>> I am using PETSc for Fortran with DMPlex. I have been using this >>>>>>> version of PETSc: >>>>>>> >>git rev-parse origin >>>>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1 >>>>>>> >>git rev-parse FETCH_HEAD >>>>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b >>>>>>> >>>>>>> There has been no issue, before the one with VTK viewer, which Jed >>>>>>> fixed today ( >>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 >>>>>>> ). >>>>>>> >>>>>>> Since that MR has been merged into the main repo, I pulled the >>>>>>> latest version of PETSc (basically I cloned it from scratch). But if I use >>>>>>> the same configure option with before, and run my code, then there is an >>>>>>> issue with halo exchange. The code runs without error message, but it gives >>>>>>> wrong solution field. I guess the issue I have is related to graph >>>>>>> partitioner or halo exchange part. This is because if I run the code with >>>>>>> 1-proc, the solution is correct. I only updated the version of PETSc and >>>>>>> there was no change in my own code. Could I get any comments on the issue? >>>>>>> I was wondering if there have been many changes in halo exchange or graph >>>>>>> partitioning & distributing part related to DMPlex. >>>>>>> >>>>>>> Thanks, >>>>>>> Mike >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
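Relating this to the AddressSanitizer trace above (the reads of yc(nc1)/yc(nc2) past arrays allocated with size ncell), one common safeguard, whether or not it is the actual cause in this code, is to size per-cell work arrays by the full local cell stratum of the Plex so that every cell index reachable from an edge's support is covered. A sketch in C (the Fortran interface is analogous):

#include <petscdmplex.h>

static PetscErrorCode CellCentroids(DM dm, PetscReal **xc, PetscReal **yc)
{
  PetscInt cStart, cEnd, c;

  PetscFunctionBeginUser;
  PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd)); /* all local cells, including any overlap */
  PetscCall(PetscMalloc2(cEnd - cStart, xc, cEnd - cStart, yc));
  for (c = cStart; c < cEnd; ++c) {
    PetscReal vol, centroid[3], normal[3];

    PetscCall(DMPlexComputeCellGeometryFVM(dm, c, &vol, centroid, normal));
    (*xc)[c - cStart] = centroid[0];
    (*yc)[c - cStart] = centroid[1];
  }
  PetscFunctionReturn(0);
}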
URL: From mi.mike1021 at gmail.com Sun Feb 26 10:32:15 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Sun, 26 Feb 2023 10:32:15 -0600 Subject: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue In-Reply-To: References: Message-ID: This is what I get from petsc main which is not correct: Overall volume computed from median-dual ... 6.37050098781844 Overall volume computed from PETSc ... 3.15470053800000 This is what I get from petsc 3.18.4 which is correct: Overall volume computed from median-dual ... 3.15470053800000 Overall volume computed from PETSc ... 3.15470053800000 If there is a problem in the code, it is also strange for me that petsc 3.18.4 gives the correct answer. Thanks, Mike > On Sun, Feb 26, 2023 at 11:19?AM Mike Michell > wrote: > >> Which version of petsc you tested? With petsc 3.18.4, median duan volume >> gives the same value with petsc from DMPlexComputeCellGeometryFVM(). >> > > This is only an accident of the data layout. The code you sent writes over > memory in the local Fortran arrays. > > Thanks, > > Matt > > >> >>> On Sat, Feb 25, 2023 at 3:11 PM Mike Michell >>> wrote: >>> >>>> My apologies for the late follow-up. There was a time conflict. >>>> >>>> A simple example code related to the issue I mentioned is attached >>>> here. The sample code does: (1) load grid on dm, (2) compute vertex-wise >>>> control volume for each node in a median-dual way, (3) halo exchange among >>>> procs to have complete control volume values, and (4) print out its field >>>> as a .vtu file. To make sure, the computed control volume is also compared >>>> with PETSc-computed control volume via DMPlexComputeCellGeometryFVM() (see >>>> lines 771-793). >>>> >>>> Back to the original problem, I can get a proper control volume field >>>> with PETSc 3.18.4, which is the latest stable release. However, if I use >>>> PETSc from the main repo, it gives a strange control volume field. >>>> Something is certainly strange around the parallel boundaries, thus I think >>>> something went wrong with halo communication. To help understand, a >>>> comparing snapshot is also attached. I guess a certain part of my code is >>>> no longer compatible with PETSc unless there is a bug in the library. Could >>>> I get comments on it? >>>> >>> >>> I can run your example. The numbers I get for "median-dual volume" do >>> not match the "PETSc volume", and the PETSc volume is correct. Moreover, >>> the median-dual numbers change, which suggests a memory fault. I compiled >>> it using address sanitizer, and it found an error: >>> >>> Number of physical boundary edge ... 4 0 >>> Number of physical and parallel boundary edge ... 4 >>> 0 >>> Number of parallel boundary edge ... 0 0 >>> Number of physical boundary edge ... 4 1 >>> Number of physical and parallel boundary edge ... 4 >>> 1 >>> Number of parallel boundary edge ... 
0 1 >>> ================================================================= >>> ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address >>> 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8 >>> READ of size 8 at 0x603000022d40 thread T0 >>> ================================================================= >>> ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address >>> 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8 >>> READ of size 8 at 0x60300000f0f0 thread T0 >>> #0 0x10cf702a7 in MAIN__ test.F90:657 >>> #1 0x10cf768ee in main test.F90:43 >>> #0 0x1068e12a7 in MAIN__ test.F90:657 >>> #1 0x1068e78ee in main test.F90:43 >>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>> >>> 0x60300000f0f0 is located 0 bytes to the right of 32-byte region >>> [0x60300000f0d0,0x60300000f0f0) >>> allocated by thread T0 here: >>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>> >>> 0x603000022d40 is located 0 bytes to the right of 32-byte region >>> [0x603000022d20,0x603000022d40) >>> allocated by thread T0 here: >>> #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f) >>> #1 0x1068dba71 in MAIN__ test.F90:499 >>> #2 0x1068e78ee in main test.F90:43 >>> #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>> >>> SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in MAIN__ >>> Shadow bytes around the buggy address: >>> >>> which corresponds to >>> >>> ! midpoint of median-dual face for inner face >>> axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell >>> axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell >>> >>> and these were allocated here >>> >>> allocate(xc(ncell)) >>> allocate(yc(ncell)) >>> >>> Hopefully the error is straightforward to see now. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> Mike >>>> >>>> >>>>> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell >>>>>> wrote: >>>>>> >>>>>>> As a follow-up, I tested: >>>>>>> >>>>>>> (1) Download tar for v3.18.4 from petsc gitlab ( >>>>>>> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue on >>>>>>> DMPlex halo exchange. This version works as I expect. >>>>>>> (2) Clone main branch (git clone https://gitlab.com/petsc/petsc.git) >>>>>>> has issues with DMPlex halo exchange. Something is suspicious about this >>>>>>> main branch, related to DMPlex halo. The solution field I got is not >>>>>>> correct. But it works okay with 1-proc. >>>>>>> >>>>>>> Does anyone have any comments on this issue? I am curious if other >>>>>>> DMPlex users have no problem regarding halo exchange. FYI, I do not >>>>>>> declare ghost layers for halo exchange. >>>>>>> >>>>>> >>>>>> There should not have been any changes there and there are definitely >>>>>> tests for this. >>>>>> >>>>>> It would be great if you could send something that failed. I could >>>>>> fix it and add it as a test. >>>>>> >>>>> >>>>> Just to follow up, we have tests of the low-level communication (Plex >>>>> tests ex1, ex12, ex18, ex29, ex31), and then we have >>>>> tests that use halo exchange for PDE calculations, for example SNES >>>>> tutorial ex12, ex13, ex62. THe convergence rates >>>>> should be off if the halo exchange were wrong. Is there any example >>>>> similar to your code that is failing on your installation? >>>>> Or is there a way to run your code? 
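For reference, the "Overall volume computed from PETSc" figure reported above comes from summing per-cell volumes obtained with DMPlexComputeCellGeometryFVM(). A minimal C sketch of that cross-check is below (the test code discussed in this thread is Fortran; the helper name CheckTotalVolume and the assumption of a zero-overlap distribution are illustrative, not taken from the attached code):

#include <petscdmplex.h>

/* Sum per-cell volumes from DMPlexComputeCellGeometryFVM() and reduce over
   ranks; with a zero-overlap distribution each cell is owned by exactly one
   rank, so the reduced sum is the total mesh volume. */
static PetscErrorCode CheckTotalVolume(DM dm)
{
  PetscInt  cStart, cEnd, c;
  PetscReal vol, centroid[3], normal[3], localVol = 0.0, totalVol;
  MPI_Comm  comm;

  PetscFunctionBeginUser;
  PetscCall(PetscObjectGetComm((PetscObject)dm, &comm));
  PetscCall(DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd)); /* height 0 = cells */
  for (c = cStart; c < cEnd; ++c) {
    PetscCall(DMPlexComputeCellGeometryFVM(dm, c, &vol, centroid, normal));
    localVol += vol;
  }
  PetscCallMPI(MPI_Allreduce(&localVol, &totalVol, 1, MPIU_REAL, MPI_SUM, comm));
  PetscCall(PetscPrintf(comm, "Overall volume computed from PETSc ... %g\n", (double)totalVol));
  PetscFunctionReturn(0);
}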
>>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> Mike >>>>>>> >>>>>>> >>>>>>>> Dear PETSc team, >>>>>>>> >>>>>>>> I am using PETSc for Fortran with DMPlex. I have been using this >>>>>>>> version of PETSc: >>>>>>>> >>git rev-parse origin >>>>>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1 >>>>>>>> >>git rev-parse FETCH_HEAD >>>>>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b >>>>>>>> >>>>>>>> There has been no issue, before the one with VTK viewer, which Jed >>>>>>>> fixed today ( >>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 >>>>>>>> ). >>>>>>>> >>>>>>>> Since that MR has been merged into the main repo, I pulled the >>>>>>>> latest version of PETSc (basically I cloned it from scratch). But if I use >>>>>>>> the same configure option with before, and run my code, then there is an >>>>>>>> issue with halo exchange. The code runs without error message, but it gives >>>>>>>> wrong solution field. I guess the issue I have is related to graph >>>>>>>> partitioner or halo exchange part. This is because if I run the code with >>>>>>>> 1-proc, the solution is correct. I only updated the version of PETSc and >>>>>>>> there was no change in my own code. Could I get any comments on the issue? >>>>>>>> I was wondering if there have been many changes in halo exchange or graph >>>>>>>> partitioning & distributing part related to DMPlex. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Mike >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Feb 26 10:37:41 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 26 Feb 2023 11:37:41 -0500 Subject: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue In-Reply-To: References: Message-ID: On Sun, Feb 26, 2023 at 11:32?AM Mike Michell wrote: > This is what I get from petsc main which is not correct: > Overall volume computed from median-dual ... > 6.37050098781844 > Overall volume computed from PETSc ... > 3.15470053800000 > > > This is what I get from petsc 3.18.4 which is correct: > Overall volume computed from median-dual ... > 3.15470053800000 > Overall volume computed from PETSc ... > 3.15470053800000 > > > If there is a problem in the code, it is also strange for me that petsc > 3.18.4 gives the correct answer > As I said, this can happen due to different layouts in memory. 
If you run it under valgrind, or address sanitizer, you will see that there is a problem. Thanks, Matt > Thanks, > Mike > > >> On Sun, Feb 26, 2023 at 11:19?AM Mike Michell >> wrote: >> >>> Which version of petsc you tested? With petsc 3.18.4, median duan volume >>> gives the same value with petsc from DMPlexComputeCellGeometryFVM(). >>> >> >> This is only an accident of the data layout. The code you sent writes >> over memory in the local Fortran arrays. >> >> Thanks, >> >> Matt >> >> >>> >>>> On Sat, Feb 25, 2023 at 3:11 PM Mike Michell >>>> wrote: >>>> >>>>> My apologies for the late follow-up. There was a time conflict. >>>>> >>>>> A simple example code related to the issue I mentioned is attached >>>>> here. The sample code does: (1) load grid on dm, (2) compute vertex-wise >>>>> control volume for each node in a median-dual way, (3) halo exchange among >>>>> procs to have complete control volume values, and (4) print out its field >>>>> as a .vtu file. To make sure, the computed control volume is also compared >>>>> with PETSc-computed control volume via DMPlexComputeCellGeometryFVM() (see >>>>> lines 771-793). >>>>> >>>>> Back to the original problem, I can get a proper control volume field >>>>> with PETSc 3.18.4, which is the latest stable release. However, if I use >>>>> PETSc from the main repo, it gives a strange control volume field. >>>>> Something is certainly strange around the parallel boundaries, thus I think >>>>> something went wrong with halo communication. To help understand, a >>>>> comparing snapshot is also attached. I guess a certain part of my code is >>>>> no longer compatible with PETSc unless there is a bug in the library. Could >>>>> I get comments on it? >>>>> >>>> >>>> I can run your example. The numbers I get for "median-dual volume" do >>>> not match the "PETSc volume", and the PETSc volume is correct. Moreover, >>>> the median-dual numbers change, which suggests a memory fault. I compiled >>>> it using address sanitizer, and it found an error: >>>> >>>> Number of physical boundary edge ... 4 0 >>>> Number of physical and parallel boundary edge ... 4 >>>> 0 >>>> Number of parallel boundary edge ... 0 0 >>>> Number of physical boundary edge ... 4 1 >>>> Number of physical and parallel boundary edge ... 4 >>>> 1 >>>> Number of parallel boundary edge ... 
0 1 >>>> ================================================================= >>>> ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address >>>> 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8 >>>> READ of size 8 at 0x603000022d40 thread T0 >>>> ================================================================= >>>> ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address >>>> 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8 >>>> READ of size 8 at 0x60300000f0f0 thread T0 >>>> #0 0x10cf702a7 in MAIN__ test.F90:657 >>>> #1 0x10cf768ee in main test.F90:43 >>>> #0 0x1068e12a7 in MAIN__ test.F90:657 >>>> #1 0x1068e78ee in main test.F90:43 >>>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>> >>>> 0x60300000f0f0 is located 0 bytes to the right of 32-byte region >>>> [0x60300000f0d0,0x60300000f0f0) >>>> allocated by thread T0 here: >>>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>> >>>> 0x603000022d40 is located 0 bytes to the right of 32-byte region >>>> [0x603000022d20,0x603000022d40) >>>> allocated by thread T0 here: >>>> #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f) >>>> #1 0x1068dba71 in MAIN__ test.F90:499 >>>> #2 0x1068e78ee in main test.F90:43 >>>> #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>> >>>> SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in MAIN__ >>>> Shadow bytes around the buggy address: >>>> >>>> which corresponds to >>>> >>>> ! midpoint of median-dual face for inner face >>>> axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell >>>> axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell >>>> >>>> and these were allocated here >>>> >>>> allocate(xc(ncell)) >>>> allocate(yc(ncell)) >>>> >>>> Hopefully the error is straightforward to see now. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks, >>>>> Mike >>>>> >>>>> >>>>>> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley >>>>>> wrote: >>>>>> >>>>>>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell >>>>>>> wrote: >>>>>>> >>>>>>>> As a follow-up, I tested: >>>>>>>> >>>>>>>> (1) Download tar for v3.18.4 from petsc gitlab ( >>>>>>>> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue on >>>>>>>> DMPlex halo exchange. This version works as I expect. >>>>>>>> (2) Clone main branch (git clone https://gitlab.com/petsc/petsc.git) >>>>>>>> has issues with DMPlex halo exchange. Something is suspicious about this >>>>>>>> main branch, related to DMPlex halo. The solution field I got is not >>>>>>>> correct. But it works okay with 1-proc. >>>>>>>> >>>>>>>> Does anyone have any comments on this issue? I am curious if other >>>>>>>> DMPlex users have no problem regarding halo exchange. FYI, I do not >>>>>>>> declare ghost layers for halo exchange. >>>>>>>> >>>>>>> >>>>>>> There should not have been any changes there and there are >>>>>>> definitely tests for this. >>>>>>> >>>>>>> It would be great if you could send something that failed. I could >>>>>>> fix it and add it as a test. >>>>>>> >>>>>> >>>>>> Just to follow up, we have tests of the low-level communication (Plex >>>>>> tests ex1, ex12, ex18, ex29, ex31), and then we have >>>>>> tests that use halo exchange for PDE calculations, for example SNES >>>>>> tutorial ex12, ex13, ex62. THe convergence rates >>>>>> should be off if the halo exchange were wrong. Is there any example >>>>>> similar to your code that is failing on your installation? >>>>>> Or is there a way to run your code? 
>>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> Thanks, >>>>>>>> Mike >>>>>>>> >>>>>>>> >>>>>>>>> Dear PETSc team, >>>>>>>>> >>>>>>>>> I am using PETSc for Fortran with DMPlex. I have been using this >>>>>>>>> version of PETSc: >>>>>>>>> >>git rev-parse origin >>>>>>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1 >>>>>>>>> >>git rev-parse FETCH_HEAD >>>>>>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b >>>>>>>>> >>>>>>>>> There has been no issue, before the one with VTK viewer, which Jed >>>>>>>>> fixed today ( >>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 >>>>>>>>> ). >>>>>>>>> >>>>>>>>> Since that MR has been merged into the main repo, I pulled the >>>>>>>>> latest version of PETSc (basically I cloned it from scratch). But if I use >>>>>>>>> the same configure option with before, and run my code, then there is an >>>>>>>>> issue with halo exchange. The code runs without error message, but it gives >>>>>>>>> wrong solution field. I guess the issue I have is related to graph >>>>>>>>> partitioner or halo exchange part. This is because if I run the code with >>>>>>>>> 1-proc, the solution is correct. I only updated the version of PETSc and >>>>>>>>> there was no change in my own code. Could I get any comments on the issue? >>>>>>>>> I was wondering if there have been many changes in halo exchange or graph >>>>>>>>> partitioning & distributing part related to DMPlex. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Mike >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mi.mike1021 at gmail.com Sun Feb 26 13:07:38 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Sun, 26 Feb 2023 13:07:38 -0600 Subject: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue In-Reply-To: References: Message-ID: I cannot agree with this argument, unless you also tested with petsc 3.18.4 tarball from https://petsc.org/release/install/download/. If library has issue, it is trivial that I will see an error from my code. I ran my code with valgrind and see no error if it is with petsc 3.18.4. 
You can test with my code with valgrind or address sanitizer with this version of petsc-3.18.4.tar.gz from ( https://petsc.org/release/install/download/). I expect you see no error. Let me ask my question differently: Has any change been made on DMPlexMarkBoundaryFaces() recently? I found that the latest petsc does not recognize parallel (but not physical) boundary as boundary for distributed dm (line 235 of my example code). Because of this, you saw the error from the arrays: ! midpoint of median-dual face for inner face axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell and these were allocated here allocate(xc(ncell)) allocate(yc(ncell)) as you pointed out. Or any change made on distribution of dm over procs? Thanks, Mike > On Sun, Feb 26, 2023 at 11:32?AM Mike Michell > wrote: > >> This is what I get from petsc main which is not correct: >> Overall volume computed from median-dual ... >> 6.37050098781844 >> Overall volume computed from PETSc ... >> 3.15470053800000 >> >> >> This is what I get from petsc 3.18.4 which is correct: >> Overall volume computed from median-dual ... >> 3.15470053800000 >> Overall volume computed from PETSc ... >> 3.15470053800000 >> >> >> If there is a problem in the code, it is also strange for me that petsc >> 3.18.4 gives the correct answer >> > > As I said, this can happen due to different layouts in memory. If you run > it under valgrind, or address sanitizer, you will see > that there is a problem. > > Thanks, > > Matt > > >> Thanks, >> Mike >> >> >>> On Sun, Feb 26, 2023 at 11:19?AM Mike Michell >>> wrote: >>> >>>> Which version of petsc you tested? With petsc 3.18.4, median duan >>>> volume gives the same value with petsc from DMPlexComputeCellGeometryFVM(). >>>> >>> >>> This is only an accident of the data layout. The code you sent writes >>> over memory in the local Fortran arrays. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> >>>>> On Sat, Feb 25, 2023 at 3:11 PM Mike Michell >>>>> wrote: >>>>> >>>>>> My apologies for the late follow-up. There was a time conflict. >>>>>> >>>>>> A simple example code related to the issue I mentioned is attached >>>>>> here. The sample code does: (1) load grid on dm, (2) compute vertex-wise >>>>>> control volume for each node in a median-dual way, (3) halo exchange among >>>>>> procs to have complete control volume values, and (4) print out its field >>>>>> as a .vtu file. To make sure, the computed control volume is also compared >>>>>> with PETSc-computed control volume via DMPlexComputeCellGeometryFVM() (see >>>>>> lines 771-793). >>>>>> >>>>>> Back to the original problem, I can get a proper control volume field >>>>>> with PETSc 3.18.4, which is the latest stable release. However, if I use >>>>>> PETSc from the main repo, it gives a strange control volume field. >>>>>> Something is certainly strange around the parallel boundaries, thus I think >>>>>> something went wrong with halo communication. To help understand, a >>>>>> comparing snapshot is also attached. I guess a certain part of my code is >>>>>> no longer compatible with PETSc unless there is a bug in the library. Could >>>>>> I get comments on it? >>>>>> >>>>> >>>>> I can run your example. The numbers I get for "median-dual volume" do >>>>> not match the "PETSc volume", and the PETSc volume is correct. Moreover, >>>>> the median-dual numbers change, which suggests a memory fault. I compiled >>>>> it using address sanitizer, and it found an error: >>>>> >>>>> Number of physical boundary edge ... 
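A quick way to see whether the marking behavior itself changed between the two PETSc versions is to count how many faces DMPlexMarkBoundaryFaces() puts into the label on each rank and compare. A minimal C sketch follows, assuming a label named "boundary" and a marker value of 1 (both are placeholders, not names from the attached code):

  DMLabel     label;
  PetscInt    nMarked;
  PetscMPIInt rank;

  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
  PetscCall(DMCreateLabel(dm, "boundary"));
  PetscCall(DMGetLabel(dm, "boundary", &label));
  PetscCall(DMPlexMarkBoundaryFaces(dm, 1, label));
  /* Number of points carrying the value 1; in the newer PETSc, faces that lie
     only on the partition (parallel) boundary are no longer included by default. */
  PetscCall(DMLabelGetStratumSize(label, 1, &nMarked));
  PetscCall(PetscSynchronizedPrintf(PETSC_COMM_WORLD, "[%d] marked boundary faces: %" PetscInt_FMT "\n", rank, nMarked));
  PetscCall(PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT));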
4 0 >>>>> Number of physical and parallel boundary edge ... 4 >>>>> 0 >>>>> Number of parallel boundary edge ... 0 0 >>>>> Number of physical boundary edge ... 4 1 >>>>> Number of physical and parallel boundary edge ... 4 >>>>> 1 >>>>> Number of parallel boundary edge ... 0 1 >>>>> ================================================================= >>>>> ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address >>>>> 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8 >>>>> READ of size 8 at 0x603000022d40 thread T0 >>>>> ================================================================= >>>>> ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address >>>>> 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8 >>>>> READ of size 8 at 0x60300000f0f0 thread T0 >>>>> #0 0x10cf702a7 in MAIN__ test.F90:657 >>>>> #1 0x10cf768ee in main test.F90:43 >>>>> #0 0x1068e12a7 in MAIN__ test.F90:657 >>>>> #1 0x1068e78ee in main test.F90:43 >>>>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>> >>>>> 0x60300000f0f0 is located 0 bytes to the right of 32-byte region >>>>> [0x60300000f0d0,0x60300000f0f0) >>>>> allocated by thread T0 here: >>>>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>> >>>>> 0x603000022d40 is located 0 bytes to the right of 32-byte region >>>>> [0x603000022d20,0x603000022d40) >>>>> allocated by thread T0 here: >>>>> #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f) >>>>> #1 0x1068dba71 in MAIN__ test.F90:499 >>>>> #2 0x1068e78ee in main test.F90:43 >>>>> #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>> >>>>> SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in MAIN__ >>>>> Shadow bytes around the buggy address: >>>>> >>>>> which corresponds to >>>>> >>>>> ! midpoint of median-dual face for inner face >>>>> axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell >>>>> axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell >>>>> >>>>> and these were allocated here >>>>> >>>>> allocate(xc(ncell)) >>>>> allocate(yc(ncell)) >>>>> >>>>> Hopefully the error is straightforward to see now. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Thanks, >>>>>> Mike >>>>>> >>>>>> >>>>>>> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley >>>>>>> wrote: >>>>>>> >>>>>>>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell < >>>>>>>> mi.mike1021 at gmail.com> wrote: >>>>>>>> >>>>>>>>> As a follow-up, I tested: >>>>>>>>> >>>>>>>>> (1) Download tar for v3.18.4 from petsc gitlab ( >>>>>>>>> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue on >>>>>>>>> DMPlex halo exchange. This version works as I expect. >>>>>>>>> (2) Clone main branch (git clone >>>>>>>>> https://gitlab.com/petsc/petsc.git) has issues with DMPlex halo >>>>>>>>> exchange. Something is suspicious about this main branch, related to DMPlex >>>>>>>>> halo. The solution field I got is not correct. But it works okay with >>>>>>>>> 1-proc. >>>>>>>>> >>>>>>>>> Does anyone have any comments on this issue? I am curious if other >>>>>>>>> DMPlex users have no problem regarding halo exchange. FYI, I do not >>>>>>>>> declare ghost layers for halo exchange. >>>>>>>>> >>>>>>>> >>>>>>>> There should not have been any changes there and there are >>>>>>>> definitely tests for this. >>>>>>>> >>>>>>>> It would be great if you could send something that failed. I could >>>>>>>> fix it and add it as a test. 
>>>>>>>> >>>>>>> >>>>>>> Just to follow up, we have tests of the low-level communication >>>>>>> (Plex tests ex1, ex12, ex18, ex29, ex31), and then we have >>>>>>> tests that use halo exchange for PDE calculations, for example SNES >>>>>>> tutorial ex12, ex13, ex62. THe convergence rates >>>>>>> should be off if the halo exchange were wrong. Is there any example >>>>>>> similar to your code that is failing on your installation? >>>>>>> Or is there a way to run your code? >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Mike >>>>>>>>> >>>>>>>>> >>>>>>>>>> Dear PETSc team, >>>>>>>>>> >>>>>>>>>> I am using PETSc for Fortran with DMPlex. I have been using this >>>>>>>>>> version of PETSc: >>>>>>>>>> >>git rev-parse origin >>>>>>>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1 >>>>>>>>>> >>git rev-parse FETCH_HEAD >>>>>>>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b >>>>>>>>>> >>>>>>>>>> There has been no issue, before the one with VTK viewer, which >>>>>>>>>> Jed fixed today ( >>>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 >>>>>>>>>> ). >>>>>>>>>> >>>>>>>>>> Since that MR has been merged into the main repo, I pulled the >>>>>>>>>> latest version of PETSc (basically I cloned it from scratch). But if I use >>>>>>>>>> the same configure option with before, and run my code, then there is an >>>>>>>>>> issue with halo exchange. The code runs without error message, but it gives >>>>>>>>>> wrong solution field. I guess the issue I have is related to graph >>>>>>>>>> partitioner or halo exchange part. This is because if I run the code with >>>>>>>>>> 1-proc, the solution is correct. I only updated the version of PETSc and >>>>>>>>>> there was no change in my own code. Could I get any comments on the issue? >>>>>>>>>> I was wondering if there have been many changes in halo exchange or graph >>>>>>>>>> partitioning & distributing part related to DMPlex. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Mike >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Sun Feb 26 13:42:03 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Sun, 26 Feb 2023 20:42:03 +0100 Subject: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue In-Reply-To: References: Message-ID: <3CA599CB-2317-431C-98FB-3D5DB4E6C094@joliv.et> > On 26 Feb 2023, at 8:07 PM, Mike Michell wrote: > > I cannot agree with this argument, unless you also tested with petsc 3.18.4 tarball from https://petsc.org/release/install/download/. > If library has issue, it is trivial that I will see an error from my code. > > I ran my code with valgrind and see no error if it is with petsc 3.18.4. You can test with my code with valgrind or address sanitizer with this version of petsc-3.18.4.tar.gz from (https://petsc.org/release/install/download/). I expect you see no error. > > > Let me ask my question differently: > Has any change been made on DMPlexMarkBoundaryFaces() recently? Yes, and it will may break your application if you are not careful about it: https://gitlab.com/petsc/petsc/-/commit/a29bf4df3e5335fbd3b27b552a624c7f2a5a2f0a Thanks, Pierre > I found that the latest petsc does not recognize parallel (but not physical) boundary as boundary for distributed dm (line 235 of my example code). Because of this, you saw the error from the arrays: > > ! midpoint of median-dual face for inner face > axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell > axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell > > and these were allocated here > > allocate(xc(ncell)) > allocate(yc(ncell)) > > as you pointed out. Or any change made on distribution of dm over procs? > > Thanks, > Mike > >> >> On Sun, Feb 26, 2023 at 11:32?AM Mike Michell > wrote: >>> This is what I get from petsc main which is not correct: >>> Overall volume computed from median-dual ... >>> 6.37050098781844 >>> Overall volume computed from PETSc ... >>> 3.15470053800000 >>> >>> >>> This is what I get from petsc 3.18.4 which is correct: >>> Overall volume computed from median-dual ... >>> 3.15470053800000 >>> Overall volume computed from PETSc ... >>> 3.15470053800000 >>> >>> >>> If there is a problem in the code, it is also strange for me that petsc 3.18.4 gives the correct answer >> >> As I said, this can happen due to different layouts in memory. If you run it under valgrind, or address sanitizer, you will see >> that there is a problem. >> >> Thanks, >> >> Matt >> >>> Thanks, >>> Mike >>> >>>> >>>> On Sun, Feb 26, 2023 at 11:19?AM Mike Michell > wrote: >>>>> Which version of petsc you tested? With petsc 3.18.4, median duan volume gives the same value with petsc from DMPlexComputeCellGeometryFVM(). >>>> >>>> This is only an accident of the data layout. The code you sent writes over memory in the local Fortran arrays. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>>>> >>>>>> On Sat, Feb 25, 2023 at 3:11 PM Mike Michell > wrote: >>>>>>> My apologies for the late follow-up. There was a time conflict. >>>>>>> >>>>>>> A simple example code related to the issue I mentioned is attached here. The sample code does: (1) load grid on dm, (2) compute vertex-wise control volume for each node in a median-dual way, (3) halo exchange among procs to have complete control volume values, and (4) print out its field as a .vtu file. To make sure, the computed control volume is also compared with PETSc-computed control volume via DMPlexComputeCellGeometryFVM() (see lines 771-793). 
>>>>>>> >>>>>>> Back to the original problem, I can get a proper control volume field with PETSc 3.18.4, which is the latest stable release. However, if I use PETSc from the main repo, it gives a strange control volume field. Something is certainly strange around the parallel boundaries, thus I think something went wrong with halo communication. To help understand, a comparing snapshot is also attached. I guess a certain part of my code is no longer compatible with PETSc unless there is a bug in the library. Could I get comments on it? >>>>>> >>>>>> I can run your example. The numbers I get for "median-dual volume" do not match the "PETSc volume", and the PETSc volume is correct. Moreover, the median-dual numbers change, which suggests a memory fault. I compiled it using address sanitizer, and it found an error: >>>>>> >>>>>> Number of physical boundary edge ... 4 0 >>>>>> Number of physical and parallel boundary edge ... 4 0 >>>>>> Number of parallel boundary edge ... 0 0 >>>>>> Number of physical boundary edge ... 4 1 >>>>>> Number of physical and parallel boundary edge ... 4 1 >>>>>> Number of parallel boundary edge ... 0 1 >>>>>> ================================================================= >>>>>> ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8 >>>>>> READ of size 8 at 0x603000022d40 thread T0 >>>>>> ================================================================= >>>>>> ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8 >>>>>> READ of size 8 at 0x60300000f0f0 thread T0 >>>>>> #0 0x10cf702a7 in MAIN__ test.F90:657 >>>>>> #1 0x10cf768ee in main test.F90:43 >>>>>> #0 0x1068e12a7 in MAIN__ test.F90:657 >>>>>> #1 0x1068e78ee in main test.F90:43 >>>>>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>>> >>>>>> 0x60300000f0f0 is located 0 bytes to the right of 32-byte region [0x60300000f0d0,0x60300000f0f0) >>>>>> allocated by thread T0 here: >>>>>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>>> >>>>>> 0x603000022d40 is located 0 bytes to the right of 32-byte region [0x603000022d20,0x603000022d40) >>>>>> allocated by thread T0 here: >>>>>> #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f) >>>>>> #1 0x1068dba71 in MAIN__ test.F90:499 >>>>>> #2 0x1068e78ee in main test.F90:43 >>>>>> #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>>> >>>>>> SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in MAIN__ >>>>>> Shadow bytes around the buggy address: >>>>>> >>>>>> which corresponds to >>>>>> >>>>>> ! midpoint of median-dual face for inner face >>>>>> axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell >>>>>> axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell >>>>>> >>>>>> and these were allocated here >>>>>> >>>>>> allocate(xc(ncell)) >>>>>> allocate(yc(ncell)) >>>>>> >>>>>> Hopefully the error is straightforward to see now. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>>> Thanks, >>>>>>> Mike >>>>>>> >>>>>>>> >>>>>>>> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley > wrote: >>>>>>>>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell > wrote: >>>>>>>>>> As a follow-up, I tested: >>>>>>>>>> >>>>>>>>>> (1) Download tar for v3.18.4 from petsc gitlab (https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue on DMPlex halo exchange. This version works as I expect. 
>>>>>>>>>> (2) Clone main branch (git clone https://gitlab.com/petsc/petsc.git) has issues with DMPlex halo exchange. Something is suspicious about this main branch, related to DMPlex halo. The solution field I got is not correct. But it works okay with 1-proc. >>>>>>>>>> >>>>>>>>>> Does anyone have any comments on this issue? I am curious if other DMPlex users have no problem regarding halo exchange. FYI, I do not declare ghost layers for halo exchange. >>>>>>>>> >>>>>>>>> There should not have been any changes there and there are definitely tests for this. >>>>>>>>> >>>>>>>>> It would be great if you could send something that failed. I could fix it and add it as a test. >>>>>>>> >>>>>>>> Just to follow up, we have tests of the low-level communication (Plex tests ex1, ex12, ex18, ex29, ex31), and then we have >>>>>>>> tests that use halo exchange for PDE calculations, for example SNES tutorial ex12, ex13, ex62. THe convergence rates >>>>>>>> should be off if the halo exchange were wrong. Is there any example similar to your code that is failing on your installation? >>>>>>>> Or is there a way to run your code? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Matt >>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Mike >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Dear PETSc team, >>>>>>>>>>> >>>>>>>>>>> I am using PETSc for Fortran with DMPlex. I have been using this version of PETSc: >>>>>>>>>>> >>git rev-parse origin >>>>>>>>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1 >>>>>>>>>>> >>git rev-parse FETCH_HEAD >>>>>>>>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b >>>>>>>>>>> >>>>>>>>>>> There has been no issue, before the one with VTK viewer, which Jed fixed today (https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735). >>>>>>>>>>> >>>>>>>>>>> Since that MR has been merged into the main repo, I pulled the latest version of PETSc (basically I cloned it from scratch). But if I use the same configure option with before, and run my code, then there is an issue with halo exchange. The code runs without error message, but it gives wrong solution field. I guess the issue I have is related to graph partitioner or halo exchange part. This is because if I run the code with 1-proc, the solution is correct. I only updated the version of PETSc and there was no change in my own code. Could I get any comments on the issue? I was wondering if there have been many changes in halo exchange or graph partitioning & distributing part related to DMPlex. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Mike >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>>> -- Norbert Wiener >>>>>>>>> >>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Feb 26 14:04:55 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 26 Feb 2023 15:04:55 -0500 Subject: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue In-Reply-To: References: Message-ID: On Sun, Feb 26, 2023 at 2:07?PM Mike Michell wrote: > I cannot agree with this argument, unless you also tested with petsc > 3.18.4 tarball from https://petsc.org/release/install/download/. > If library has issue, it is trivial that I will see an error from my code. > > I ran my code with valgrind and see no error if it is with petsc 3.18.4. > You can test with my code with valgrind or address sanitizer with this > version of petsc-3.18.4.tar.gz from ( > https://petsc.org/release/install/download/). I expect you see no error. > > > Let me ask my question differently: > Has any change been made on DMPlexMarkBoundaryFaces() recently? I found > that the latest petsc does not recognize parallel (but not physical) > boundary as boundary for distributed dm (line 235 of my example code). > Because of this, you saw the error from the arrays: > The behavior of DMPlexMarkBoundaryFaces() was changed 3 months ago: https://gitlab.com/petsc/petsc/-/commit/429fa399fc3cd6fd42f3ca9697415d505b9dce5d I did update the documentation for that function Note: This function will use the point `PetscSF` from the input `DM` to exclude points on the partition boundary from being marked, unless the partition overlap is greater than zero. If you also wish to mark the partition boundary, you can use `DMSetPointSF()` to temporarily set it to NULL, and then reset it to the original object after the call. The reason is that if you call it in parallel, it is no longer suitable for applying boundary conditions. If you want to restore the prior behavior, you can use: { PetscSF sf; PetscCall(DMGetPointSF(dm, &sf)); PetscCall(DMSetPointSF(dm, NULL)); PetscCall(DMPlexMarkBoundaryFaces(dm, val, label)); PetscCall(DMSetPointSF(dm, sf)); } Thanks, Matt ! midpoint of median-dual face for inner face > axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell > axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell > > and these were allocated here > > allocate(xc(ncell)) > allocate(yc(ncell)) > > as you pointed out. Or any change made on distribution of dm over procs? > > Thanks, > Mike > > >> On Sun, Feb 26, 2023 at 11:32?AM Mike Michell >> wrote: >> >>> This is what I get from petsc main which is not correct: >>> Overall volume computed from median-dual ... >>> 6.37050098781844 >>> Overall volume computed from PETSc ... >>> 3.15470053800000 >>> >>> >>> This is what I get from petsc 3.18.4 which is correct: >>> Overall volume computed from median-dual ... >>> 3.15470053800000 >>> Overall volume computed from PETSc ... 
>>> 3.15470053800000 >>> >>> >>> If there is a problem in the code, it is also strange for me that petsc >>> 3.18.4 gives the correct answer >>> >> >> As I said, this can happen due to different layouts in memory. If you run >> it under valgrind, or address sanitizer, you will see >> that there is a problem. >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Mike >>> >>> >>>> On Sun, Feb 26, 2023 at 11:19?AM Mike Michell >>>> wrote: >>>> >>>>> Which version of petsc you tested? With petsc 3.18.4, median duan >>>>> volume gives the same value with petsc from DMPlexComputeCellGeometryFVM(). >>>>> >>>> >>>> This is only an accident of the data layout. The code you sent writes >>>> over memory in the local Fortran arrays. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> >>>>>> On Sat, Feb 25, 2023 at 3:11 PM Mike Michell >>>>>> wrote: >>>>>> >>>>>>> My apologies for the late follow-up. There was a time conflict. >>>>>>> >>>>>>> A simple example code related to the issue I mentioned is attached >>>>>>> here. The sample code does: (1) load grid on dm, (2) compute vertex-wise >>>>>>> control volume for each node in a median-dual way, (3) halo exchange among >>>>>>> procs to have complete control volume values, and (4) print out its field >>>>>>> as a .vtu file. To make sure, the computed control volume is also compared >>>>>>> with PETSc-computed control volume via DMPlexComputeCellGeometryFVM() (see >>>>>>> lines 771-793). >>>>>>> >>>>>>> Back to the original problem, I can get a proper control volume >>>>>>> field with PETSc 3.18.4, which is the latest stable release. However, if I >>>>>>> use PETSc from the main repo, it gives a strange control volume field. >>>>>>> Something is certainly strange around the parallel boundaries, thus I think >>>>>>> something went wrong with halo communication. To help understand, a >>>>>>> comparing snapshot is also attached. I guess a certain part of my code is >>>>>>> no longer compatible with PETSc unless there is a bug in the library. Could >>>>>>> I get comments on it? >>>>>>> >>>>>> >>>>>> I can run your example. The numbers I get for "median-dual volume" do >>>>>> not match the "PETSc volume", and the PETSc volume is correct. Moreover, >>>>>> the median-dual numbers change, which suggests a memory fault. I compiled >>>>>> it using address sanitizer, and it found an error: >>>>>> >>>>>> Number of physical boundary edge ... 4 0 >>>>>> Number of physical and parallel boundary edge ... 4 >>>>>> 0 >>>>>> Number of parallel boundary edge ... 0 0 >>>>>> Number of physical boundary edge ... 4 1 >>>>>> Number of physical and parallel boundary edge ... 4 >>>>>> 1 >>>>>> Number of parallel boundary edge ... 
0 1 >>>>>> ================================================================= >>>>>> ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address >>>>>> 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8 >>>>>> READ of size 8 at 0x603000022d40 thread T0 >>>>>> ================================================================= >>>>>> ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address >>>>>> 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8 >>>>>> READ of size 8 at 0x60300000f0f0 thread T0 >>>>>> #0 0x10cf702a7 in MAIN__ test.F90:657 >>>>>> #1 0x10cf768ee in main test.F90:43 >>>>>> #0 0x1068e12a7 in MAIN__ test.F90:657 >>>>>> #1 0x1068e78ee in main test.F90:43 >>>>>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>>> >>>>>> 0x60300000f0f0 is located 0 bytes to the right of 32-byte region >>>>>> [0x60300000f0d0,0x60300000f0f0) >>>>>> allocated by thread T0 here: >>>>>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>>> >>>>>> 0x603000022d40 is located 0 bytes to the right of 32-byte region >>>>>> [0x603000022d20,0x603000022d40) >>>>>> allocated by thread T0 here: >>>>>> #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f) >>>>>> #1 0x1068dba71 in MAIN__ test.F90:499 >>>>>> #2 0x1068e78ee in main test.F90:43 >>>>>> #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>>> >>>>>> SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in MAIN__ >>>>>> Shadow bytes around the buggy address: >>>>>> >>>>>> which corresponds to >>>>>> >>>>>> ! midpoint of median-dual face for inner face >>>>>> axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell >>>>>> axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell >>>>>> >>>>>> and these were allocated here >>>>>> >>>>>> allocate(xc(ncell)) >>>>>> allocate(yc(ncell)) >>>>>> >>>>>> Hopefully the error is straightforward to see now. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> Mike >>>>>>> >>>>>>> >>>>>>>> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley >>>>>>>> wrote: >>>>>>>> >>>>>>>>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell < >>>>>>>>> mi.mike1021 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> As a follow-up, I tested: >>>>>>>>>> >>>>>>>>>> (1) Download tar for v3.18.4 from petsc gitlab ( >>>>>>>>>> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue on >>>>>>>>>> DMPlex halo exchange. This version works as I expect. >>>>>>>>>> (2) Clone main branch (git clone >>>>>>>>>> https://gitlab.com/petsc/petsc.git) has issues with DMPlex halo >>>>>>>>>> exchange. Something is suspicious about this main branch, related to DMPlex >>>>>>>>>> halo. The solution field I got is not correct. But it works okay with >>>>>>>>>> 1-proc. >>>>>>>>>> >>>>>>>>>> Does anyone have any comments on this issue? I am curious if >>>>>>>>>> other DMPlex users have no problem regarding halo exchange. FYI, I do not >>>>>>>>>> declare ghost layers for halo exchange. >>>>>>>>>> >>>>>>>>> >>>>>>>>> There should not have been any changes there and there are >>>>>>>>> definitely tests for this. >>>>>>>>> >>>>>>>>> It would be great if you could send something that failed. I could >>>>>>>>> fix it and add it as a test. >>>>>>>>> >>>>>>>> >>>>>>>> Just to follow up, we have tests of the low-level communication >>>>>>>> (Plex tests ex1, ex12, ex18, ex29, ex31), and then we have >>>>>>>> tests that use halo exchange for PDE calculations, for example SNES >>>>>>>> tutorial ex12, ex13, ex62. 
THe convergence rates >>>>>>>> should be off if the halo exchange were wrong. Is there any example >>>>>>>> similar to your code that is failing on your installation? >>>>>>>> Or is there a way to run your code? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Matt >>>>>>>>> >>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Mike >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Dear PETSc team, >>>>>>>>>>> >>>>>>>>>>> I am using PETSc for Fortran with DMPlex. I have been using this >>>>>>>>>>> version of PETSc: >>>>>>>>>>> >>git rev-parse origin >>>>>>>>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1 >>>>>>>>>>> >>git rev-parse FETCH_HEAD >>>>>>>>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b >>>>>>>>>>> >>>>>>>>>>> There has been no issue, before the one with VTK viewer, which >>>>>>>>>>> Jed fixed today ( >>>>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 >>>>>>>>>>> ). >>>>>>>>>>> >>>>>>>>>>> Since that MR has been merged into the main repo, I pulled the >>>>>>>>>>> latest version of PETSc (basically I cloned it from scratch). But if I use >>>>>>>>>>> the same configure option with before, and run my code, then there is an >>>>>>>>>>> issue with halo exchange. The code runs without error message, but it gives >>>>>>>>>>> wrong solution field. I guess the issue I have is related to graph >>>>>>>>>>> partitioner or halo exchange part. This is because if I run the code with >>>>>>>>>>> 1-proc, the solution is correct. I only updated the version of PETSc and >>>>>>>>>>> there was no change in my own code. Could I get any comments on the issue? >>>>>>>>>>> I was wondering if there have been many changes in halo exchange or graph >>>>>>>>>>> partitioning & distributing part related to DMPlex. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Mike >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>> experiments lead. >>>>>>>>> -- Norbert Wiener >>>>>>>>> >>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mi.mike1021 at gmail.com Sun Feb 26 17:56:09 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Sun, 26 Feb 2023 17:56:09 -0600 Subject: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue In-Reply-To: References: Message-ID: Okay that was the part of the code incompatible with the latest petsc. Thank you for support. I also need to call DMPlexLabelComplete() for the vertices on the parallel and physical boundaries, but I get an error if DMPlexLabelComplete() is called before DMSetPointSF(dm, sf). So I do: { PetscSF sf; PetscCall(DMGetPointSF(dm, &sf)); PetscCall(DMSetPointSF(dm, NULL)); PetscCall(DMPlexMarkBoundaryFaces(dm, val, label)); PetscCall(DMSetPointSF(dm, sf)); PetscCall(DMPlexLabelComplete(dm, label)); } I believe that this flow is okay since label is already marked on parallel boundary. But could you please confirm that calling DMPlexLabelComplete() after DMSetPointSF(dm, sf) does not create problems to mark vertices on parallel boundary? The solution I get from the code with the above flow is now correct and has no errors with latest petsc, but I want to double check. Thanks, Mike > On Sun, Feb 26, 2023 at 2:07?PM Mike Michell > wrote: > >> I cannot agree with this argument, unless you also tested with petsc >> 3.18.4 tarball from https://petsc.org/release/install/download/. >> If library has issue, it is trivial that I will see an error from my >> code. >> >> I ran my code with valgrind and see no error if it is with petsc 3.18.4. >> You can test with my code with valgrind or address sanitizer with this >> version of petsc-3.18.4.tar.gz from ( >> https://petsc.org/release/install/download/). I expect you see no error. >> >> >> Let me ask my question differently: >> Has any change been made on DMPlexMarkBoundaryFaces() recently? I found >> that the latest petsc does not recognize parallel (but not physical) >> boundary as boundary for distributed dm (line 235 of my example code). >> Because of this, you saw the error from the arrays: >> > > The behavior of DMPlexMarkBoundaryFaces() was changed 3 months ago: > > > https://gitlab.com/petsc/petsc/-/commit/429fa399fc3cd6fd42f3ca9697415d505b9dce5d > > I did update the documentation for that function > > Note: > This function will use the point `PetscSF` from the input `DM` to > exclude points on the partition boundary from being marked, unless the > partition overlap is greater than zero. If you also wish to mark the > partition boundary, you can use `DMSetPointSF()` to temporarily set it to > NULL, and then reset it to the original object after the call. > > The reason is that if you call it in parallel, it is no longer suitable > for applying boundary conditions. If you want to restore the prior behavior, > you can use: > > { > PetscSF sf; > > PetscCall(DMGetPointSF(dm, &sf)); > PetscCall(DMSetPointSF(dm, NULL)); > PetscCall(DMPlexMarkBoundaryFaces(dm, val, label)); > PetscCall(DMSetPointSF(dm, sf)); > } > > Thanks, > > Matt > > ! midpoint of median-dual face for inner face >> axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell >> axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell >> >> and these were allocated here >> >> allocate(xc(ncell)) >> allocate(yc(ncell)) >> >> as you pointed out. Or any change made on distribution of dm over procs? 
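One way to confirm that the flow above really marks the vertices on both the physical and the parallel boundaries is to count the labeled vertices on each rank after DMPlexLabelComplete(). A minimal C sketch, again assuming the label is named "boundary" and uses the marker value 1 (placeholders for illustration only):

  DMLabel  label;
  PetscInt vStart, vEnd, v, val, nMarkedVerts = 0;

  PetscCall(DMGetLabel(dm, "boundary", &label));
  PetscCall(DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd)); /* depth 0 = vertices */
  for (v = vStart; v < vEnd; ++v) {
    PetscCall(DMLabelGetValue(label, v, &val)); /* returns -1 if the point is unlabeled */
    if (val == 1) nMarkedVerts++;
  }
  PetscCall(PetscSynchronizedPrintf(PETSC_COMM_WORLD, "labeled boundary vertices: %" PetscInt_FMT "\n", nMarkedVerts));
  PetscCall(PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT));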
>> >> Thanks, >> Mike >> >> >>> On Sun, Feb 26, 2023 at 11:32?AM Mike Michell >>> wrote: >>> >>>> This is what I get from petsc main which is not correct: >>>> Overall volume computed from median-dual ... >>>> 6.37050098781844 >>>> Overall volume computed from PETSc ... >>>> 3.15470053800000 >>>> >>>> >>>> This is what I get from petsc 3.18.4 which is correct: >>>> Overall volume computed from median-dual ... >>>> 3.15470053800000 >>>> Overall volume computed from PETSc ... >>>> 3.15470053800000 >>>> >>>> >>>> If there is a problem in the code, it is also strange for me that petsc >>>> 3.18.4 gives the correct answer >>>> >>> >>> As I said, this can happen due to different layouts in memory. If you >>> run it under valgrind, or address sanitizer, you will see >>> that there is a problem. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> Mike >>>> >>>> >>>>> On Sun, Feb 26, 2023 at 11:19?AM Mike Michell >>>>> wrote: >>>>> >>>>>> Which version of petsc you tested? With petsc 3.18.4, median duan >>>>>> volume gives the same value with petsc from DMPlexComputeCellGeometryFVM(). >>>>>> >>>>> >>>>> This is only an accident of the data layout. The code you sent writes >>>>> over memory in the local Fortran arrays. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> >>>>>>> On Sat, Feb 25, 2023 at 3:11 PM Mike Michell >>>>>>> wrote: >>>>>>> >>>>>>>> My apologies for the late follow-up. There was a time conflict. >>>>>>>> >>>>>>>> A simple example code related to the issue I mentioned is attached >>>>>>>> here. The sample code does: (1) load grid on dm, (2) compute vertex-wise >>>>>>>> control volume for each node in a median-dual way, (3) halo exchange among >>>>>>>> procs to have complete control volume values, and (4) print out its field >>>>>>>> as a .vtu file. To make sure, the computed control volume is also compared >>>>>>>> with PETSc-computed control volume via DMPlexComputeCellGeometryFVM() (see >>>>>>>> lines 771-793). >>>>>>>> >>>>>>>> Back to the original problem, I can get a proper control volume >>>>>>>> field with PETSc 3.18.4, which is the latest stable release. However, if I >>>>>>>> use PETSc from the main repo, it gives a strange control volume field. >>>>>>>> Something is certainly strange around the parallel boundaries, thus I think >>>>>>>> something went wrong with halo communication. To help understand, a >>>>>>>> comparing snapshot is also attached. I guess a certain part of my code is >>>>>>>> no longer compatible with PETSc unless there is a bug in the library. Could >>>>>>>> I get comments on it? >>>>>>>> >>>>>>> >>>>>>> I can run your example. The numbers I get for "median-dual volume" >>>>>>> do not match the "PETSc volume", and the PETSc volume is correct. Moreover, >>>>>>> the median-dual numbers change, which suggests a memory fault. I compiled >>>>>>> it using address sanitizer, and it found an error: >>>>>>> >>>>>>> Number of physical boundary edge ... 4 0 >>>>>>> Number of physical and parallel boundary edge ... 4 >>>>>>> 0 >>>>>>> Number of parallel boundary edge ... 0 0 >>>>>>> Number of physical boundary edge ... 4 1 >>>>>>> Number of physical and parallel boundary edge ... 4 >>>>>>> 1 >>>>>>> Number of parallel boundary edge ... 
0 1 >>>>>>> ================================================================= >>>>>>> ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address >>>>>>> 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8 >>>>>>> READ of size 8 at 0x603000022d40 thread T0 >>>>>>> ================================================================= >>>>>>> ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address >>>>>>> 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8 >>>>>>> READ of size 8 at 0x60300000f0f0 thread T0 >>>>>>> #0 0x10cf702a7 in MAIN__ test.F90:657 >>>>>>> #1 0x10cf768ee in main test.F90:43 >>>>>>> #0 0x1068e12a7 in MAIN__ test.F90:657 >>>>>>> #1 0x1068e78ee in main test.F90:43 >>>>>>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>>>> >>>>>>> 0x60300000f0f0 is located 0 bytes to the right of 32-byte region >>>>>>> [0x60300000f0d0,0x60300000f0f0) >>>>>>> allocated by thread T0 here: >>>>>>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>>>> >>>>>>> 0x603000022d40 is located 0 bytes to the right of 32-byte region >>>>>>> [0x603000022d20,0x603000022d40) >>>>>>> allocated by thread T0 here: >>>>>>> #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f) >>>>>>> #1 0x1068dba71 in MAIN__ test.F90:499 >>>>>>> #2 0x1068e78ee in main test.F90:43 >>>>>>> #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>>>> >>>>>>> SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in >>>>>>> MAIN__ >>>>>>> Shadow bytes around the buggy address: >>>>>>> >>>>>>> which corresponds to >>>>>>> >>>>>>> ! midpoint of median-dual face for inner face >>>>>>> axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell >>>>>>> axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell >>>>>>> >>>>>>> and these were allocated here >>>>>>> >>>>>>> allocate(xc(ncell)) >>>>>>> allocate(yc(ncell)) >>>>>>> >>>>>>> Hopefully the error is straightforward to see now. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> Thanks, >>>>>>>> Mike >>>>>>>> >>>>>>>> >>>>>>>>> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley < >>>>>>>>> knepley at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell < >>>>>>>>>> mi.mike1021 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> As a follow-up, I tested: >>>>>>>>>>> >>>>>>>>>>> (1) Download tar for v3.18.4 from petsc gitlab ( >>>>>>>>>>> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue on >>>>>>>>>>> DMPlex halo exchange. This version works as I expect. >>>>>>>>>>> (2) Clone main branch (git clone >>>>>>>>>>> https://gitlab.com/petsc/petsc.git) has issues with DMPlex halo >>>>>>>>>>> exchange. Something is suspicious about this main branch, related to DMPlex >>>>>>>>>>> halo. The solution field I got is not correct. But it works okay with >>>>>>>>>>> 1-proc. >>>>>>>>>>> >>>>>>>>>>> Does anyone have any comments on this issue? I am curious if >>>>>>>>>>> other DMPlex users have no problem regarding halo exchange. FYI, I do not >>>>>>>>>>> declare ghost layers for halo exchange. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> There should not have been any changes there and there are >>>>>>>>>> definitely tests for this. >>>>>>>>>> >>>>>>>>>> It would be great if you could send something that failed. I >>>>>>>>>> could fix it and add it as a test. 
>>>>>>>>>> >>>>>>>>> >>>>>>>>> Just to follow up, we have tests of the low-level communication >>>>>>>>> (Plex tests ex1, ex12, ex18, ex29, ex31), and then we have >>>>>>>>> tests that use halo exchange for PDE calculations, for example >>>>>>>>> SNES tutorial ex12, ex13, ex62. THe convergence rates >>>>>>>>> should be off if the halo exchange were wrong. Is there any >>>>>>>>> example similar to your code that is failing on your installation? >>>>>>>>> Or is there a way to run your code? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Matt >>>>>>>>> >>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Matt >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Mike >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Dear PETSc team, >>>>>>>>>>>> >>>>>>>>>>>> I am using PETSc for Fortran with DMPlex. I have been using >>>>>>>>>>>> this version of PETSc: >>>>>>>>>>>> >>git rev-parse origin >>>>>>>>>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1 >>>>>>>>>>>> >>git rev-parse FETCH_HEAD >>>>>>>>>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b >>>>>>>>>>>> >>>>>>>>>>>> There has been no issue, before the one with VTK viewer, which >>>>>>>>>>>> Jed fixed today ( >>>>>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 >>>>>>>>>>>> ). >>>>>>>>>>>> >>>>>>>>>>>> Since that MR has been merged into the main repo, I pulled the >>>>>>>>>>>> latest version of PETSc (basically I cloned it from scratch). But if I use >>>>>>>>>>>> the same configure option with before, and run my code, then there is an >>>>>>>>>>>> issue with halo exchange. The code runs without error message, but it gives >>>>>>>>>>>> wrong solution field. I guess the issue I have is related to graph >>>>>>>>>>>> partitioner or halo exchange part. This is because if I run the code with >>>>>>>>>>>> 1-proc, the solution is correct. I only updated the version of PETSc and >>>>>>>>>>>> there was no change in my own code. Could I get any comments on the issue? >>>>>>>>>>>> I was wondering if there have been many changes in halo exchange or graph >>>>>>>>>>>> partitioning & distributing part related to DMPlex. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Mike >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>>> experiments lead. >>>>>>>>>> -- Norbert Wiener >>>>>>>>>> >>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>> experiments lead. >>>>>>>>> -- Norbert Wiener >>>>>>>>> >>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. 
>>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Feb 26 18:15:28 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 26 Feb 2023 19:15:28 -0500 Subject: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue In-Reply-To: References: Message-ID: On Sun, Feb 26, 2023 at 6:56?PM Mike Michell wrote: > Okay that was the part of the code incompatible with the latest petsc. > Thank you for support. > I also need to call DMPlexLabelComplete() for the vertices on the parallel > and physical boundaries, but I get an error if DMPlexLabelComplete() is > called before DMSetPointSF(dm, sf). > > So I do: > { > PetscSF sf; > > PetscCall(DMGetPointSF(dm, &sf)); > PetscCall(DMSetPointSF(dm, NULL)); > PetscCall(DMPlexMarkBoundaryFaces(dm, val, label)); > PetscCall(DMSetPointSF(dm, sf)); > PetscCall(DMPlexLabelComplete(dm, label)); > } > > I believe that this flow is okay since label is already marked on parallel > boundary. But could you please confirm that calling DMPlexLabelComplete() > after DMSetPointSF(dm, sf) does not create problems to mark vertices on > parallel boundary? > Yes. The idea is the following. A DMPlex is a collection of serial meshes. When an SF is specified, this identifies some mesh points on one process with those on another. There are no limits to the identification, so you can for instance have overlap. > The solution I get from the code with the above flow is now correct and > has no errors with latest petsc, but I want to double check. > Excellent. I think it might be possible to do what you want in less code. If sometime this is something that you want to pursue, please send me an overview mail about the calculations you are doing. Thanks, Matt > Thanks, > Mike > > >> On Sun, Feb 26, 2023 at 2:07?PM Mike Michell >> wrote: >> >>> I cannot agree with this argument, unless you also tested with petsc >>> 3.18.4 tarball from https://petsc.org/release/install/download/. >>> If library has issue, it is trivial that I will see an error from my >>> code. >>> >>> I ran my code with valgrind and see no error if it is with petsc 3.18.4. >>> You can test with my code with valgrind or address sanitizer with this >>> version of petsc-3.18.4.tar.gz from ( >>> https://petsc.org/release/install/download/). I expect you see no error. >>> >>> >>> Let me ask my question differently: >>> Has any change been made on DMPlexMarkBoundaryFaces() recently? I found >>> that the latest petsc does not recognize parallel (but not physical) >>> boundary as boundary for distributed dm (line 235 of my example code). 
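For reference, a minimal C sketch of the complete flow discussed in this exchange: clear the point SF, mark the boundary faces, restore the SF, complete the label, then read back the marked vertices. It assumes an existing distributed DMPlex dm; the label name "boundary", the marker value 1, and the loop at the end are illustrative assumptions, not taken from the code in this thread.

    DMLabel        label;
    PetscSF        sf;
    IS             markedIS;
    const PetscInt *marked;
    PetscInt       n, i, vStart, vEnd;

    PetscCall(DMCreateLabel(dm, "boundary"));
    PetscCall(DMGetLabel(dm, "boundary", &label));
    PetscCall(DMGetPointSF(dm, &sf));
    PetscCall(DMSetPointSF(dm, NULL));                /* so partition-boundary faces are marked too */
    PetscCall(DMPlexMarkBoundaryFaces(dm, 1, label));
    PetscCall(DMSetPointSF(dm, sf));
    PetscCall(DMPlexLabelComplete(dm, label));        /* propagate the face marks to edges and vertices */
    /* The stratum now holds faces, edges, and vertices with value 1;
       keep only the points in the vertex (depth 0) range */
    PetscCall(DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd));
    PetscCall(DMLabelGetStratumIS(label, 1, &markedIS));
    if (markedIS) {
      PetscCall(ISGetLocalSize(markedIS, &n));
      PetscCall(ISGetIndices(markedIS, &marked));
      for (i = 0; i < n; ++i) {
        if (marked[i] >= vStart && marked[i] < vEnd) {
          /* marked[i] is a vertex on a physical or partition boundary */
        }
      }
      PetscCall(ISRestoreIndices(markedIS, &marked));
      PetscCall(ISDestroy(&markedIS));
    }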
>>> Because of this, you saw the error from the arrays: >>> >> >> The behavior of DMPlexMarkBoundaryFaces() was changed 3 months ago: >> >> >> https://gitlab.com/petsc/petsc/-/commit/429fa399fc3cd6fd42f3ca9697415d505b9dce5d >> >> I did update the documentation for that function >> >> Note: >> This function will use the point `PetscSF` from the input `DM` to >> exclude points on the partition boundary from being marked, unless the >> partition overlap is greater than zero. If you also wish to mark the >> partition boundary, you can use `DMSetPointSF()` to temporarily set it to >> NULL, and then reset it to the original object after the call. >> >> The reason is that if you call it in parallel, it is no longer suitable >> for applying boundary conditions. If you want to restore the prior behavior, >> you can use: >> >> { >> PetscSF sf; >> >> PetscCall(DMGetPointSF(dm, &sf)); >> PetscCall(DMSetPointSF(dm, NULL)); >> PetscCall(DMPlexMarkBoundaryFaces(dm, val, label)); >> PetscCall(DMSetPointSF(dm, sf)); >> } >> >> Thanks, >> >> Matt >> >> ! midpoint of median-dual face for inner face >>> axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell >>> axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell >>> >>> and these were allocated here >>> >>> allocate(xc(ncell)) >>> allocate(yc(ncell)) >>> >>> as you pointed out. Or any change made on distribution of dm over procs? >>> >>> Thanks, >>> Mike >>> >>> >>>> On Sun, Feb 26, 2023 at 11:32?AM Mike Michell >>>> wrote: >>>> >>>>> This is what I get from petsc main which is not correct: >>>>> Overall volume computed from median-dual ... >>>>> 6.37050098781844 >>>>> Overall volume computed from PETSc ... >>>>> 3.15470053800000 >>>>> >>>>> >>>>> This is what I get from petsc 3.18.4 which is correct: >>>>> Overall volume computed from median-dual ... >>>>> 3.15470053800000 >>>>> Overall volume computed from PETSc ... >>>>> 3.15470053800000 >>>>> >>>>> >>>>> If there is a problem in the code, it is also strange for me that >>>>> petsc 3.18.4 gives the correct answer >>>>> >>>> >>>> As I said, this can happen due to different layouts in memory. If you >>>> run it under valgrind, or address sanitizer, you will see >>>> that there is a problem. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks, >>>>> Mike >>>>> >>>>> >>>>>> On Sun, Feb 26, 2023 at 11:19?AM Mike Michell >>>>>> wrote: >>>>>> >>>>>>> Which version of petsc you tested? With petsc 3.18.4, median duan >>>>>>> volume gives the same value with petsc from DMPlexComputeCellGeometryFVM(). >>>>>>> >>>>>> >>>>>> This is only an accident of the data layout. The code you sent writes >>>>>> over memory in the local Fortran arrays. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> >>>>>>>> On Sat, Feb 25, 2023 at 3:11 PM Mike Michell >>>>>>>> wrote: >>>>>>>> >>>>>>>>> My apologies for the late follow-up. There was a time conflict. >>>>>>>>> >>>>>>>>> A simple example code related to the issue I mentioned is attached >>>>>>>>> here. The sample code does: (1) load grid on dm, (2) compute vertex-wise >>>>>>>>> control volume for each node in a median-dual way, (3) halo exchange among >>>>>>>>> procs to have complete control volume values, and (4) print out its field >>>>>>>>> as a .vtu file. To make sure, the computed control volume is also compared >>>>>>>>> with PETSc-computed control volume via DMPlexComputeCellGeometryFVM() (see >>>>>>>>> lines 771-793). 
>>>>>>>>> >>>>>>>>> Back to the original problem, I can get a proper control volume >>>>>>>>> field with PETSc 3.18.4, which is the latest stable release. However, if I >>>>>>>>> use PETSc from the main repo, it gives a strange control volume field. >>>>>>>>> Something is certainly strange around the parallel boundaries, thus I think >>>>>>>>> something went wrong with halo communication. To help understand, a >>>>>>>>> comparing snapshot is also attached. I guess a certain part of my code is >>>>>>>>> no longer compatible with PETSc unless there is a bug in the library. Could >>>>>>>>> I get comments on it? >>>>>>>>> >>>>>>>> >>>>>>>> I can run your example. The numbers I get for "median-dual volume" >>>>>>>> do not match the "PETSc volume", and the PETSc volume is correct. Moreover, >>>>>>>> the median-dual numbers change, which suggests a memory fault. I compiled >>>>>>>> it using address sanitizer, and it found an error: >>>>>>>> >>>>>>>> Number of physical boundary edge ... 4 0 >>>>>>>> Number of physical and parallel boundary edge ... 4 >>>>>>>> 0 >>>>>>>> Number of parallel boundary edge ... 0 0 >>>>>>>> Number of physical boundary edge ... 4 1 >>>>>>>> Number of physical and parallel boundary edge ... 4 >>>>>>>> 1 >>>>>>>> Number of parallel boundary edge ... 0 1 >>>>>>>> ================================================================= >>>>>>>> ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address >>>>>>>> 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8 >>>>>>>> READ of size 8 at 0x603000022d40 thread T0 >>>>>>>> ================================================================= >>>>>>>> ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address >>>>>>>> 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8 >>>>>>>> READ of size 8 at 0x60300000f0f0 thread T0 >>>>>>>> #0 0x10cf702a7 in MAIN__ test.F90:657 >>>>>>>> #1 0x10cf768ee in main test.F90:43 >>>>>>>> #0 0x1068e12a7 in MAIN__ test.F90:657 >>>>>>>> #1 0x1068e78ee in main test.F90:43 >>>>>>>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>>>>> >>>>>>>> 0x60300000f0f0 is located 0 bytes to the right of 32-byte region >>>>>>>> [0x60300000f0d0,0x60300000f0f0) >>>>>>>> allocated by thread T0 here: >>>>>>>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>>>>> >>>>>>>> 0x603000022d40 is located 0 bytes to the right of 32-byte region >>>>>>>> [0x603000022d20,0x603000022d40) >>>>>>>> allocated by thread T0 here: >>>>>>>> #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f) >>>>>>>> #1 0x1068dba71 in MAIN__ test.F90:499 >>>>>>>> #2 0x1068e78ee in main test.F90:43 >>>>>>>> #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>>>>> >>>>>>>> SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in >>>>>>>> MAIN__ >>>>>>>> Shadow bytes around the buggy address: >>>>>>>> >>>>>>>> which corresponds to >>>>>>>> >>>>>>>> ! midpoint of median-dual face for inner face >>>>>>>> axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell >>>>>>>> axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell >>>>>>>> >>>>>>>> and these were allocated here >>>>>>>> >>>>>>>> allocate(xc(ncell)) >>>>>>>> allocate(yc(ncell)) >>>>>>>> >>>>>>>> Hopefully the error is straightforward to see now. 
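The overflow above is the symptom of treating a face on the partition boundary as an interior face with two local cells. As a hedged illustration only, in C rather than the Fortran of test.F90, with variable names chosen for the example and assuming an interpolated mesh where the faces sit at height 1, the second-cell access can be guarded by checking the face support:

    const PetscInt *supp;
    PetscInt        suppSize, fStart, fEnd, f;

    PetscCall(DMPlexGetHeightStratum(dm, 1, &fStart, &fEnd)); /* codimension-1 faces */
    for (f = fStart; f < fEnd; ++f) {
      PetscCall(DMPlexGetSupportSize(dm, f, &suppSize));
      PetscCall(DMPlexGetSupport(dm, f, &supp));
      /* supp[0] is always a valid local cell */
      if (suppSize == 2) {
        /* interior face: supp[1] is the second local cell and may be used */
      } else {
        /* physical or partition boundary face: there is no second local cell */
      }
    }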
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Mike >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley < >>>>>>>>>> knepley at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell < >>>>>>>>>>> mi.mike1021 at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> As a follow-up, I tested: >>>>>>>>>>>> >>>>>>>>>>>> (1) Download tar for v3.18.4 from petsc gitlab ( >>>>>>>>>>>> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue on >>>>>>>>>>>> DMPlex halo exchange. This version works as I expect. >>>>>>>>>>>> (2) Clone main branch (git clone >>>>>>>>>>>> https://gitlab.com/petsc/petsc.git) has issues with DMPlex >>>>>>>>>>>> halo exchange. Something is suspicious about this main branch, related to >>>>>>>>>>>> DMPlex halo. The solution field I got is not correct. But it works okay >>>>>>>>>>>> with 1-proc. >>>>>>>>>>>> >>>>>>>>>>>> Does anyone have any comments on this issue? I am curious if >>>>>>>>>>>> other DMPlex users have no problem regarding halo exchange. FYI, I do not >>>>>>>>>>>> declare ghost layers for halo exchange. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> There should not have been any changes there and there are >>>>>>>>>>> definitely tests for this. >>>>>>>>>>> >>>>>>>>>>> It would be great if you could send something that failed. I >>>>>>>>>>> could fix it and add it as a test. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Just to follow up, we have tests of the low-level communication >>>>>>>>>> (Plex tests ex1, ex12, ex18, ex29, ex31), and then we have >>>>>>>>>> tests that use halo exchange for PDE calculations, for example >>>>>>>>>> SNES tutorial ex12, ex13, ex62. THe convergence rates >>>>>>>>>> should be off if the halo exchange were wrong. Is there any >>>>>>>>>> example similar to your code that is failing on your installation? >>>>>>>>>> Or is there a way to run your code? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Matt >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Matt >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Mike >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Dear PETSc team, >>>>>>>>>>>>> >>>>>>>>>>>>> I am using PETSc for Fortran with DMPlex. I have been using >>>>>>>>>>>>> this version of PETSc: >>>>>>>>>>>>> >>git rev-parse origin >>>>>>>>>>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1 >>>>>>>>>>>>> >>git rev-parse FETCH_HEAD >>>>>>>>>>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b >>>>>>>>>>>>> >>>>>>>>>>>>> There has been no issue, before the one with VTK viewer, which >>>>>>>>>>>>> Jed fixed today ( >>>>>>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 >>>>>>>>>>>>> ). >>>>>>>>>>>>> >>>>>>>>>>>>> Since that MR has been merged into the main repo, I pulled the >>>>>>>>>>>>> latest version of PETSc (basically I cloned it from scratch). But if I use >>>>>>>>>>>>> the same configure option with before, and run my code, then there is an >>>>>>>>>>>>> issue with halo exchange. The code runs without error message, but it gives >>>>>>>>>>>>> wrong solution field. I guess the issue I have is related to graph >>>>>>>>>>>>> partitioner or halo exchange part. This is because if I run the code with >>>>>>>>>>>>> 1-proc, the solution is correct. I only updated the version of PETSc and >>>>>>>>>>>>> there was no change in my own code. Could I get any comments on the issue? 
>>>>>>>>>>>>> I was wondering if there have been many changes in halo exchange or graph >>>>>>>>>>>>> partitioning & distributing part related to DMPlex. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Mike >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>>>> experiments lead. >>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>> >>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>>> experiments lead. >>>>>>>>>> -- Norbert Wiener >>>>>>>>>> >>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mi.mike1021 at gmail.com Sun Feb 26 18:30:41 2023 From: mi.mike1021 at gmail.com (Mike Michell) Date: Sun, 26 Feb 2023 18:30:41 -0600 Subject: [petsc-users] DMPlex Halo Communication or Graph Partitioner Issue In-Reply-To: References: Message-ID: I will. Thank you very much! > On Sun, Feb 26, 2023 at 6:56?PM Mike Michell > wrote: > >> Okay that was the part of the code incompatible with the latest petsc. >> Thank you for support. >> I also need to call DMPlexLabelComplete() for the vertices on the >> parallel and physical boundaries, but I get an error >> if DMPlexLabelComplete() is called before DMSetPointSF(dm, sf). >> >> So I do: >> { >> PetscSF sf; >> >> PetscCall(DMGetPointSF(dm, &sf)); >> PetscCall(DMSetPointSF(dm, NULL)); >> PetscCall(DMPlexMarkBoundaryFaces(dm, val, label)); >> PetscCall(DMSetPointSF(dm, sf)); >> PetscCall(DMPlexLabelComplete(dm, label)); >> } >> >> I believe that this flow is okay since label is already marked on >> parallel boundary. But could you please confirm that calling >> DMPlexLabelComplete() after DMSetPointSF(dm, sf) does not create problems >> to mark vertices on parallel boundary? >> > > Yes. The idea is the following. 
A DMPlex is a collection of serial meshes. > When an SF is specified, this identifies some mesh points > on one process with those on another. There are no limits to the > identification, so you can for instance have overlap. > > >> The solution I get from the code with the above flow is now correct and >> has no errors with latest petsc, but I want to double check. >> > > Excellent. I think it might be possible to do what you want in less code. > If sometime this is something that you want to pursue, > please send me an overview mail about the calculations you are doing. > > Thanks, > > Matt > > >> Thanks, >> Mike >> >> >>> On Sun, Feb 26, 2023 at 2:07?PM Mike Michell >>> wrote: >>> >>>> I cannot agree with this argument, unless you also tested with petsc >>>> 3.18.4 tarball from https://petsc.org/release/install/download/. >>>> If library has issue, it is trivial that I will see an error from my >>>> code. >>>> >>>> I ran my code with valgrind and see no error if it is with petsc >>>> 3.18.4. You can test with my code with valgrind or address sanitizer with >>>> this version of petsc-3.18.4.tar.gz from ( >>>> https://petsc.org/release/install/download/). I expect you see no >>>> error. >>>> >>>> >>>> Let me ask my question differently: >>>> Has any change been made on DMPlexMarkBoundaryFaces() recently? I found >>>> that the latest petsc does not recognize parallel (but not physical) >>>> boundary as boundary for distributed dm (line 235 of my example code). >>>> Because of this, you saw the error from the arrays: >>>> >>> >>> The behavior of DMPlexMarkBoundaryFaces() was changed 3 months ago: >>> >>> >>> https://gitlab.com/petsc/petsc/-/commit/429fa399fc3cd6fd42f3ca9697415d505b9dce5d >>> >>> I did update the documentation for that function >>> >>> Note: >>> This function will use the point `PetscSF` from the input `DM` to >>> exclude points on the partition boundary from being marked, unless the >>> partition overlap is greater than zero. If you also wish to mark the >>> partition boundary, you can use `DMSetPointSF()` to temporarily set it to >>> NULL, and then reset it to the original object after the call. >>> >>> The reason is that if you call it in parallel, it is no longer suitable >>> for applying boundary conditions. If you want to restore the prior behavior, >>> you can use: >>> >>> { >>> PetscSF sf; >>> >>> PetscCall(DMGetPointSF(dm, &sf)); >>> PetscCall(DMSetPointSF(dm, NULL)); >>> PetscCall(DMPlexMarkBoundaryFaces(dm, val, label)); >>> PetscCall(DMSetPointSF(dm, sf)); >>> } >>> >>> Thanks, >>> >>> Matt >>> >>> ! midpoint of median-dual face for inner face >>>> axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell >>>> axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell >>>> >>>> and these were allocated here >>>> >>>> allocate(xc(ncell)) >>>> allocate(yc(ncell)) >>>> >>>> as you pointed out. Or any change made on distribution of dm over procs? >>>> >>>> Thanks, >>>> Mike >>>> >>>> >>>>> On Sun, Feb 26, 2023 at 11:32?AM Mike Michell >>>>> wrote: >>>>> >>>>>> This is what I get from petsc main which is not correct: >>>>>> Overall volume computed from median-dual ... >>>>>> 6.37050098781844 >>>>>> Overall volume computed from PETSc ... >>>>>> 3.15470053800000 >>>>>> >>>>>> >>>>>> This is what I get from petsc 3.18.4 which is correct: >>>>>> Overall volume computed from median-dual ... >>>>>> 3.15470053800000 >>>>>> Overall volume computed from PETSc ... 
>>>>>> 3.15470053800000 >>>>>> >>>>>> >>>>>> If there is a problem in the code, it is also strange for me that >>>>>> petsc 3.18.4 gives the correct answer >>>>>> >>>>> >>>>> As I said, this can happen due to different layouts in memory. If you >>>>> run it under valgrind, or address sanitizer, you will see >>>>> that there is a problem. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Thanks, >>>>>> Mike >>>>>> >>>>>> >>>>>>> On Sun, Feb 26, 2023 at 11:19?AM Mike Michell >>>>>>> wrote: >>>>>>> >>>>>>>> Which version of petsc you tested? With petsc 3.18.4, median duan >>>>>>>> volume gives the same value with petsc from DMPlexComputeCellGeometryFVM(). >>>>>>>> >>>>>>> >>>>>>> This is only an accident of the data layout. The code you sent >>>>>>> writes over memory in the local Fortran arrays. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> >>>>>>>>> On Sat, Feb 25, 2023 at 3:11 PM Mike Michell < >>>>>>>>> mi.mike1021 at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> My apologies for the late follow-up. There was a time conflict. >>>>>>>>>> >>>>>>>>>> A simple example code related to the issue I mentioned is >>>>>>>>>> attached here. The sample code does: (1) load grid on dm, (2) compute >>>>>>>>>> vertex-wise control volume for each node in a median-dual way, (3) halo >>>>>>>>>> exchange among procs to have complete control volume values, and (4) print >>>>>>>>>> out its field as a .vtu file. To make sure, the computed control volume is >>>>>>>>>> also compared with PETSc-computed control volume via >>>>>>>>>> DMPlexComputeCellGeometryFVM() (see lines 771-793). >>>>>>>>>> >>>>>>>>>> Back to the original problem, I can get a proper control volume >>>>>>>>>> field with PETSc 3.18.4, which is the latest stable release. However, if I >>>>>>>>>> use PETSc from the main repo, it gives a strange control volume field. >>>>>>>>>> Something is certainly strange around the parallel boundaries, thus I think >>>>>>>>>> something went wrong with halo communication. To help understand, a >>>>>>>>>> comparing snapshot is also attached. I guess a certain part of my code is >>>>>>>>>> no longer compatible with PETSc unless there is a bug in the library. Could >>>>>>>>>> I get comments on it? >>>>>>>>>> >>>>>>>>> >>>>>>>>> I can run your example. The numbers I get for "median-dual volume" >>>>>>>>> do not match the "PETSc volume", and the PETSc volume is correct. Moreover, >>>>>>>>> the median-dual numbers change, which suggests a memory fault. I compiled >>>>>>>>> it using address sanitizer, and it found an error: >>>>>>>>> >>>>>>>>> Number of physical boundary edge ... 4 0 >>>>>>>>> Number of physical and parallel boundary edge ... 4 >>>>>>>>> 0 >>>>>>>>> Number of parallel boundary edge ... 0 0 >>>>>>>>> Number of physical boundary edge ... 4 1 >>>>>>>>> Number of physical and parallel boundary edge ... 4 >>>>>>>>> 1 >>>>>>>>> Number of parallel boundary edge ... 
0 1 >>>>>>>>> ================================================================= >>>>>>>>> ==36587==ERROR: AddressSanitizer: heap-buffer-overflow on address >>>>>>>>> 0x603000022d40 at pc 0x0001068e12a8 bp 0x7ffee932cfd0 sp 0x7ffee932cfc8 >>>>>>>>> READ of size 8 at 0x603000022d40 thread T0 >>>>>>>>> ================================================================= >>>>>>>>> ==36588==ERROR: AddressSanitizer: heap-buffer-overflow on address >>>>>>>>> 0x60300000f0f0 at pc 0x00010cf702a8 bp 0x7ffee2c9dfd0 sp 0x7ffee2c9dfc8 >>>>>>>>> READ of size 8 at 0x60300000f0f0 thread T0 >>>>>>>>> #0 0x10cf702a7 in MAIN__ test.F90:657 >>>>>>>>> #1 0x10cf768ee in main test.F90:43 >>>>>>>>> #0 0x1068e12a7 in MAIN__ test.F90:657 >>>>>>>>> #1 0x1068e78ee in main test.F90:43 >>>>>>>>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>>>>>> >>>>>>>>> 0x60300000f0f0 is located 0 bytes to the right of 32-byte region >>>>>>>>> [0x60300000f0d0,0x60300000f0f0) >>>>>>>>> allocated by thread T0 here: >>>>>>>>> #2 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>>>>>> >>>>>>>>> 0x603000022d40 is located 0 bytes to the right of 32-byte region >>>>>>>>> [0x603000022d20,0x603000022d40) >>>>>>>>> allocated by thread T0 here: >>>>>>>>> #0 0x114a7457f in wrap_malloc (libasan.5.dylib:x86_64+0x7b57f) >>>>>>>>> #1 0x1068dba71 in MAIN__ test.F90:499 >>>>>>>>> #2 0x1068e78ee in main test.F90:43 >>>>>>>>> #3 0x7fff6b80acc8 in start (libdyld.dylib:x86_64+0x1acc8) >>>>>>>>> >>>>>>>>> SUMMARY: AddressSanitizer: heap-buffer-overflow test.F90:657 in >>>>>>>>> MAIN__ >>>>>>>>> Shadow bytes around the buggy address: >>>>>>>>> >>>>>>>>> which corresponds to >>>>>>>>> >>>>>>>>> ! midpoint of median-dual face for inner face >>>>>>>>> axrf(ifa,1) = 0.5d0*(yc(nc1)+yfc(ifa)) ! for nc1 cell >>>>>>>>> axrf(ifa,2) = 0.5d0*(yc(nc2)+yfc(ifa)) ! for nc2 cell >>>>>>>>> >>>>>>>>> and these were allocated here >>>>>>>>> >>>>>>>>> allocate(xc(ncell)) >>>>>>>>> allocate(yc(ncell)) >>>>>>>>> >>>>>>>>> Hopefully the error is straightforward to see now. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Matt >>>>>>>>> >>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Mike >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Mon, Feb 20, 2023 at 12:05 PM Matthew Knepley < >>>>>>>>>>> knepley at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> On Sat, Feb 18, 2023 at 12:00 PM Mike Michell < >>>>>>>>>>>> mi.mike1021 at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> As a follow-up, I tested: >>>>>>>>>>>>> >>>>>>>>>>>>> (1) Download tar for v3.18.4 from petsc gitlab ( >>>>>>>>>>>>> https://gitlab.com/petsc/petsc/-/tree/v3.18.4) has no issue >>>>>>>>>>>>> on DMPlex halo exchange. This version works as I expect. >>>>>>>>>>>>> (2) Clone main branch (git clone >>>>>>>>>>>>> https://gitlab.com/petsc/petsc.git) has issues with DMPlex >>>>>>>>>>>>> halo exchange. Something is suspicious about this main branch, related to >>>>>>>>>>>>> DMPlex halo. The solution field I got is not correct. But it works okay >>>>>>>>>>>>> with 1-proc. >>>>>>>>>>>>> >>>>>>>>>>>>> Does anyone have any comments on this issue? I am curious if >>>>>>>>>>>>> other DMPlex users have no problem regarding halo exchange. FYI, I do not >>>>>>>>>>>>> declare ghost layers for halo exchange. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> There should not have been any changes there and there are >>>>>>>>>>>> definitely tests for this. >>>>>>>>>>>> >>>>>>>>>>>> It would be great if you could send something that failed. I >>>>>>>>>>>> could fix it and add it as a test. 
>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Just to follow up, we have tests of the low-level communication >>>>>>>>>>> (Plex tests ex1, ex12, ex18, ex29, ex31), and then we have >>>>>>>>>>> tests that use halo exchange for PDE calculations, for example >>>>>>>>>>> SNES tutorial ex12, ex13, ex62. THe convergence rates >>>>>>>>>>> should be off if the halo exchange were wrong. Is there any >>>>>>>>>>> example similar to your code that is failing on your installation? >>>>>>>>>>> Or is there a way to run your code? >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Matt >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Matt >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Mike >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Dear PETSc team, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am using PETSc for Fortran with DMPlex. I have been using >>>>>>>>>>>>>> this version of PETSc: >>>>>>>>>>>>>> >>git rev-parse origin >>>>>>>>>>>>>> >>995ec06f924a86c4d28df68d1fdd6572768b0de1 >>>>>>>>>>>>>> >>git rev-parse FETCH_HEAD >>>>>>>>>>>>>> >>9a04a86bf40bf893fb82f466a1bc8943d9bc2a6b >>>>>>>>>>>>>> >>>>>>>>>>>>>> There has been no issue, before the one with VTK viewer, >>>>>>>>>>>>>> which Jed fixed today ( >>>>>>>>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735 >>>>>>>>>>>>>> ). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Since that MR has been merged into the main repo, I pulled >>>>>>>>>>>>>> the latest version of PETSc (basically I cloned it from scratch). But if I >>>>>>>>>>>>>> use the same configure option with before, and run my code, then there is >>>>>>>>>>>>>> an issue with halo exchange. The code runs without error message, but it >>>>>>>>>>>>>> gives wrong solution field. I guess the issue I have is related to graph >>>>>>>>>>>>>> partitioner or halo exchange part. This is because if I run the code with >>>>>>>>>>>>>> 1-proc, the solution is correct. I only updated the version of PETSc and >>>>>>>>>>>>>> there was no change in my own code. Could I get any comments on the issue? >>>>>>>>>>>>>> I was wondering if there have been many changes in halo exchange or graph >>>>>>>>>>>>>> partitioning & distributing part related to DMPlex. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Mike >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> What most experimenters take for granted before they begin >>>>>>>>>>>> their experiments is infinitely more interesting than any results to which >>>>>>>>>>>> their experiments lead. >>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>> >>>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>>>> experiments lead. >>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>> >>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>> experiments lead. 
>>>>>>>>> -- Norbert Wiener >>>>>>>>> >>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon Feb 27 01:46:24 2023 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 27 Feb 2023 08:46:24 +0100 Subject: [petsc-users] Frontier In-Reply-To: References: Message-ID: Treb, putting this on the list. Treb has ECP early access to Frontier and has some problems: ** first he has error from hypre: [0]PETSC ERROR: #1 VecGetArrayForHYPRE() at /gpfs/alpine/geo127/world-shared/petsc_treb/petsc/src/vec/vec/impls/hypre/vhyp.c:95 We had another stack trace that I can not find that came from a Vec routine, copy to the device as I recall. ** The hypre folks could not do much with that so I suggested using aijhipsparse and he got this error message. Looks like just a segv in MatAssemblyEnd_SeqAIJ. Treb, 1) this error might be reproducible on one processor. Could you try to scale this problem down. 2) I assume this was built with debugging=1 3) if you can get it to fail on one process then you might be able to get a good stack trace with a line number with a debugger. GDB is available (on Crusher) but you need to do a few things. 4) You might see if you can get some AMD help. Thanks, Mark On Mon, Feb 27, 2023 at 4:46?AM David Trebotich wrote: > Hey Mark- > This is a new issue that doesn't seem to be hypre. It's not using the > -mat_type aijhipsparse in this run. Can you interpret these petsc errors? > Seems like it's just crashing. Wasn't doing this last night. I was using > less nodes however. > [39872]PETSC ERROR: > ------------------------------------------------------------------------ > [39872]PETSC ERROR: Caught signal number 15 Terminate: Some process (or > the batch system) has told this process to end > [38267] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 > -is_view ascii[:[filename][:[format][:append]]]: Prints object to stdout > or ASCII file (PetscOptionsGetViewer) > [42800]PETSC ERROR: > ------------------------------------------------------------------------ > [38266] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 32768) < 0.6. Do not use CompressedRow routines. 
> [17088]PETSC ERROR: > ------------------------------------------------------------------------ > [17088]PETSC ERROR: Caught signal number 15 Terminate: Some process (or > the batch system) has told this process to end > ---------------------------------------- > Viewer (-is_view) options: > [41728]PETSC ERROR: > ------------------------------------------------------------------------ > [38267] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 32768) < 0.6. Do not use CompressedRow routines. > [10256]PETSC ERROR: > ------------------------------------------------------------------------ > [10256]PETSC ERROR: Caught signal number 15 Terminate: Some process (or > the batch system) has told this process to end > -is_view draw[:[drawtype][:filename|format]] Draws object > (PetscOptionsGetViewer) > [56]PETSC ERROR: > ------------------------------------------------------------------------ > [38270] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 32768) < 0.6. Do not use CompressedRow routines. > [41128]PETSC ERROR: > ------------------------------------------------------------------------ > -is_view binary[:[filename][:[format][:append]]]: Saves object to a > binary file (PetscOptionsGetViewer) > [42496]PETSC ERROR: > ------------------------------------------------------------------------ > [42496]PETSC ERROR: Caught signal number 15 Terminate: Some process (or > the batch system) has told this process to end > [38265] MatAssemblyEnd_SeqAIJ(): Matrix size: 32768 X 32768; storage > space: 0 unneeded,378944 used > [10256]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > -is_view ascii[:[filename][:[format][:append]]]: Prints object to stdout > or ASCII file (PetscOptionsGetViewer) > [24]PETSC ERROR: > ------------------------------------------------------------------------ > [24]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > batch system) has told this process to end > [38268] MatAssemblyEnd_SeqAIJ(): Matrix size: 32768 X 32768; storage > space: 0 unneeded,378944 used > [4128]PETSC ERROR: > ------------------------------------------------------------------------ > -is_view socket[:port]: Pushes object to a Unix socket > (PetscOptionsGetViewer) > [60]PETSC ERROR: > ------------------------------------------------------------------------ > [60]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > batch system) has told this process to end > [38269] MatAssemblyEnd_SeqAIJ(): Matrix size: 32768 X 32768; storage > space: 0 unneeded,378944 used > [10260]PETSC ERROR: > ------------------------------------------------------------------------ > -is_view draw[:[drawtype][:filename|format]] Draws object > (PetscOptionsGetViewer) > [28]PETSC ERROR: > ------------------------------------------------------------------------ > [28]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > batch system) has told this process to end > [38265] MatAssemblyEnd_SeqAIJ(): Number of mallocs during > MatSetValues() is 0 > [4132]PETSC ERROR: > ------------------------------------------------------------------------ > -is_view binary[:[filename][:[format][:append]]]: Saves object to a > binary file (PetscOptionsGetViewer) > [56]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > batch system) has told this process to end > [38268] MatAssemblyEnd_SeqAIJ(): Number of mallocs during > MatSetValues() is 0 > MPICH ERROR [Rank 10260] [job id 1277040.1] [Sun Feb 26 22:32:21 2023] 
> [frontier01491] - Abort(59) (rank 10260 in comm 0): application called > MPI_Abort(MPI_COMM_WORLD, 59) - process\ > 10260 > > aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 10260 > -is_view draw[:[drawtype][:filename|format]] Draws object > (PetscOptionsGetViewer) > [28]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [38268] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 196 > [4132]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > batch system) has told this process to end > -is_view saws[:communicatorname]: Publishes object to SAWs > (PetscOptionsGetViewer) > [60]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [38269] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 196 > [24]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and > https://petsc.org/release/faq/ > -is_view socket[:port]: Pushes object to a Unix socket > (PetscOptionsGetViewer) > [4128]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > [38271] MatSeqAIJCheckInode(): Found 32768 nodes out of 32768 rows. > Not using Inode routines > [56]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and > https://petsc.org/release/faq/ > -is_view saws[:communicatorname]: Publishes object to SAWs > (PetscOptionsGetViewer) > [13736]PETSC ERROR: > ------------------------------------------------------------------------ > [13736]PETSC ERROR: Caught signal number 15 Terminate: Some process (or > the batch system) has told this process to end > [38265] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 32768) < 0.6. Do not use CompressedRow routines. > [28]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and > https://petsc.org/release/faq/ > [6801] PetscCommDuplicate(): Using internal PETSc communicator > 1140850689 -2080374781 > [4132]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > [38268] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 32768) < 0.6. Do not use CompressedRow routines. > [60]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and > https://petsc.org/release/faq/ > [6800] PetscCommDuplicate(): Using internal PETSc communicator > 1140850689 -2080374781 > [13740]PETSC ERROR: > ------------------------------------------------------------------------ > [13740]PETSC ERROR: Caught signal number 15 Terminate: Some process (or > the batch system) has told this process to end > [38264] MatAssemblyEnd_SeqAIJ(): Matrix size: 32768 X 32768; storage > space: 0 unneeded,378944 used > [24]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > and run > [6804] PetscCommDuplicate(): Using internal PETSc communicator > 1140850689 -2080374781 > [4128]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and > https://petsc.org/release/faq/ > [38266] MatSeqAIJCheckInode(): Found 32768 nodes out of 32768 rows. > Not using Inode routines > [56]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > and run > [6805] PetscCommDuplicate(): Using internal PETSc communicator > 1140850689 -2080374781 > [13736]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > [38269] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 32768) < 0.6. Do not use CompressedRow routines. 
> [28]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > and run > [6802] MatAssemblyEnd_SeqAIJ(): Matrix size: 32768 X 0; storage > space: 0 unneeded,0 used > [4132]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and > https://petsc.org/release/faq/ > [38264] MatAssemblyEnd_SeqAIJ(): Number of mallocs during > MatSetValues() is 0 > [60]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > and run > [6806] MatAssemblyEnd_SeqAIJ(): Matrix size: 32768 X 0; storage > space: 0 unneeded,0 used > [13740]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > [38267] MatSeqAIJCheckInode(): Found 32768 nodes out of 32768 rows. > Not using Inode routines > [24]PETSC ERROR: to get more information on the crash. > [6802] MatAssemblyEnd_SeqAIJ(): Number of mallocs during > MatSetValues() is 0 > [4128]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > and run > [38270] MatSeqAIJCheckInode(): Found 32768 nodes out of 32768 rows. > Not using Inode routines > [56]PETSC ERROR: to get more information on the crash. > [6803] MatAssemblyEnd_SeqAIJ(): Matrix size: 32768 X 0; storage > space: 0 unneeded,0 used > [13736]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and > https://petsc.org/release/faq/ > [13736]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > and run > [38264] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 196 > [28]PETSC ERROR: to get more information on the crash. > [6806] MatAssemblyEnd_SeqAIJ(): Number of mallocs during > MatSetValues() is 0 > [4128]PETSC ERROR: to get more information on the crash. > [38264] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 32768) < 0.6. Do not use CompressedRow routines. > [60]PETSC ERROR: to get more information on the crash. > [6802] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0 > [13740]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and > https://petsc.org/release/faq/ > [38265] MatSeqAIJCheckInode(): Found 32768 nodes out of 32768 rows. > Not using Inode routines > MPICH ERROR [Rank 24] [job id 1277040.1] [Sun Feb 26 22:32:21 2023] > [frontier00021] - Abort(59) (rank 24 in comm 0): application called > MPI_Abort(MPI_COMM_WORLD, 59) - process 24 > > [6807] MatAssemblyEnd_SeqAIJ(): Matrix size: 32768 X 0; storage > space: 0 unneeded,0 used > [4132]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > and run > [38268] MatSeqAIJCheckInode(): Found 32768 nodes out of 32768 rows. > Not using Inode routines > MPICH ERROR [Rank 56] [job id 1277040.1] [Sun Feb 26 22:32:21 2023] > [frontier00025] - Abort(59) (rank 56 in comm 0): application called > MPI_Abort(MPI_COMM_WORLD, 59) - process 56 > > [6803] MatAssemblyEnd_SeqAIJ(): Number of mallocs during > MatSetValues() is 0 > [13736]PETSC ERROR: to get more information on the crash. > [38269] MatSeqAIJCheckInode(): Found 32768 nodes out of 32768 rows. > Not using Inode routines > MPICH ERROR [Rank 28] [job id 1277040.1] [Sun Feb 26 22:32:21 2023] > [frontier00021] - Abort(59) (rank 28 in comm 0): application called > MPI_Abort(MPI_COMM_WORLD, 59) - process 28 > > [6806] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0 > [4132]PETSC ERROR: to get more information on the crash. > [38264] MatSeqAIJCheckInode(): Found 32768 nodes out of 32768 rows. 
> Not using Inode routines > MPICH ERROR [Rank 60] [job id 1277040.1] [Sun Feb 26 22:32:21 2023] > [frontier00025] - Abort(59) (rank 60 in comm 0): application called > MPI_Abort(MPI_COMM_WORLD, 59) - process 60 > > aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 60 > [6807] MatAssemblyEnd_SeqAIJ(): Number of mallocs during > MatSetValues() is 0 > [13740]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > and run > [38266] PetscCommDuplicate(): Using internal PETSc communicator > 1140850689 -2080374782 > aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 24 > [6802] MatCheckCompressedRow(): Found the ratio (num_zerorows > 32768)/(num_localrows 32768) > 0.6. Use CompressedRow routines. > MPICH ERROR [Rank 4128] [job id 1277040.1] [Sun Feb 26 22:32:21 2023] > [frontier00609] - Abort(59) (rank 4128 in comm 0): application called > MPI_Abort(MPI_COMM_WORLD, 59) - process 4\ > 128 > [38271] PetscCommDuplicate(): Using internal PETSc communicator > 1140850689 -2080374781 > aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 56 > [6803] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0 > MPICH ERROR [Rank 13736] [job id 1277040.1] [Sun Feb 26 22:32:21 2023] > [frontier01996] - Abort(59) (rank 13736 in comm 0): application called > MPI_Abort(MPI_COMM_WORLD, 59) - process\ > 13736 > > [38267] PetscCommDuplicate(): Using internal PETSc communicator > 1140850689 -2080374782 > aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 28 > [6806] MatCheckCompressedRow(): Found the ratio (num_zerorows > 32768)/(num_localrows 32768) > 0.6. Use CompressedRow routines. > > [38270] PetscCommDuplicate(): Using internal PETSc communicator > 1140850689 -2080374781 > [13740]PETSC ERROR: to get more information on the crash. > [6807] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0 > [4160]PETSC ERROR: > ------------------------------------------------------------------------ > [4160]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > batch system) has told this process to end > > On Sat, Feb 25, 2023 at 11:01 AM Mark Adams wrote: > >> There is something here. It looks like an error from hypre, but you do >> have some sort of stack trace. >> PETSc is catching an error here: >> >> [0]PETSC ERROR: #1 VecGetArrayForHYPRE() at >> /gpfs/alpine/geo127/world-shared/petsc_treb/petsc/src/vec/vec/impls/hypre/vhyp.c:95 >> >> You might send this whole output to PETSc and see if someone can help. >> >> Mark >> >> >> On Sat, Feb 25, 2023 at 12:37 PM David Trebotich >> wrote: >> >>> from the 8192 node run: >>> [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [32016] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374780 >>> [30655] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [0]PETSC ERROR: Invalid argument >>> [32017] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374778 >>> [30653] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [0]PETSC ERROR: HYPRE_MEMORY_DEVICE expects a device vector. 
You need to >>> enable PETSc device support, for example, in some cases, -vec_type cuda >>> [32018] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [30649] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374778 >>> [0]PETSC ERROR: WARNING! There are option(s) set that were not used! >>> Could be the program crashed before they were used or a spelling mistake, >>> etc! >>> [32021] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [30654] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [0]PETSC ERROR: Option left: name:-diff_ksp_converged_reason (no value) >>> [32019] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [30652] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [0]PETSC ERROR: Option left: name:-diff_ksp_max_it value: 50 >>> [32023] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [30650] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [0]PETSC ERROR: Option left: name:-diff_ksp_norm_type value: >>> unpreconditioned >>> [32020] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [30648] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374780 >>> [0]PETSC ERROR: Option left: name:-diff_ksp_rtol value: 1.e-6 >>> [32022] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [32021] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [0]PETSC ERROR: Option left: name:-diff_ksp_type value: gmres >>> [609] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374778 >>> [32017] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374778 >>> [0]PETSC ERROR: Option left: name:-diff_pc_type value: jacobi >>> [611] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [32018] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [0]PETSC ERROR: Option left: name:-options_left (no value) >>> [615] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [32016] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374780 >>> [0]PETSC ERROR: Option left: name:-proj-mac_mat_type value: aijhipsparse >>> [608] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374780 >>> [610] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> then further down >>> [10191] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [1978] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [0]PETSC ERROR: #1 VecGetArrayForHYPRE() at >>> /gpfs/alpine/geo127/world-shared/petsc_treb/petsc/src/vec/vec/impls/hypre/vhyp.c:95 >>> [10184] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374780 >>> [1977] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374778 >>> [0]PETSC ERROR: #2 VecHYPRE_IJVectorPushVecRead() at >>> /gpfs/alpine/geo127/world-shared/petsc_treb/petsc/src/vec/vec/impls/hypre/vhyp.c:138 >>> [10185] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374778 >>> [10186] PetscCommDuplicate(): Using internal PETSc communicator >>> 
1140850689 -2080374783 >>> [0]PETSC ERROR: #3 PCApply_HYPRE() at >>> /gpfs/alpine/geo127/world-shared/petsc_treb/petsc/src/ksp/pc/impls/hypre/hypre.c:433 >>> [6081] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374778 >>> [10188] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [0]PETSC ERROR: #4 PCApply() at >>> /gpfs/alpine/geo127/world-shared/petsc_treb/petsc/src/ksp/pc/interface/precon.c:441 >>> [6083] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [10189] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [0]PETSC ERROR: #5 PCApplyBAorAB() at >>> /gpfs/alpine/geo127/world-shared/petsc_treb/petsc/src/ksp/pc/interface/precon.c:711 >>> [6085] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [10190] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [0]PETSC ERROR: #6 KSP_PCApplyBAorAB() at >>> /gpfs/alpine/world-shared/geo127/petsc_treb/petsc/include/petsc/private/kspimpl.h:416 >>> [6086] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [10191] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [0]PETSC ERROR: #7 KSPGMRESCycle() at >>> /gpfs/alpine/geo127/world-shared/petsc_treb/petsc/src/ksp/ksp/impls/gmres/gmres.c:147 >>> [6087] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [10187] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [0]PETSC ERROR: #8 KSPSolve_GMRES() at >>> /gpfs/alpine/geo127/world-shared/petsc_treb/petsc/src/ksp/ksp/impls/gmres/gmres.c:228 >>> [6080] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374780 >>> [10184] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374780 >>> [0]PETSC ERROR: #9 KSPSolve_Private() at >>> /gpfs/alpine/geo127/world-shared/petsc_treb/petsc/src/ksp/ksp/interface/itfunc.c:899 >>> [6082] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [10185] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374778 >>> [0]PETSC ERROR: #10 KSPSolve() at >>> /gpfs/alpine/geo127/world-shared/petsc_treb/petsc/src/ksp/ksp/interface/itfunc.c:1071 >>> [6084] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> [10189] PetscCommDuplicate(): Using internal PETSc communicator >>> 1140850689 -2080374783 >>> >>> >>> On Fri, Feb 24, 2023 at 3:57 PM David Trebotich >>> wrote: >>> >>>> good idea. the global one is unused for small problems. waiting for >>>> large job to run to see if this fixes that problem. >>>> >>>> On Fri, Feb 24, 2023 at 11:04 AM Mark Adams wrote: >>>> >>>>> I think you added the prefixes like a year ago, so the prefixes should >>>>> work. >>>>> Try both and see which one is used with -options_left >>>>> >>>>> On Fri, Feb 24, 2023 at 1:14 PM David Trebotich >>>>> wrote: >>>>> >>>>>> I am using 3.18.4. 
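On the options-prefix question in this exchange: an option such as -proj_mac_mat_type is only read by objects that carry the matching options prefix, a bare -mat_type only reaches objects with no prefix, and -options_left reports whichever spelling went unused. A hedged sketch of the mechanism follows; the ksp and A variables are illustrative, not the actual solver setup in this code.

    KSP ksp;
    Mat A;
    /* ... create A and ksp for the projection solve ... */
    PetscCall(MatSetOptionsPrefix(A, "proj_mac_"));
    PetscCall(MatSetFromOptions(A));    /* now -proj_mac_mat_type aijhipsparse applies to A */
    PetscCall(KSPSetOptionsPrefix(ksp, "proj_mac_"));
    PetscCall(KSPSetOperators(ksp, A, A));
    PetscCall(KSPSetFromOptions(ksp));  /* picks up -proj_mac_ksp_type, -proj_mac_pc_type, ... */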
>>>>>> >>>>>> Is aijhipsparse the global -mat_type or should this be the prefixed >>>>>> one for the solve where I was getting the problem, i.e., -proj_mac_mat_type >>>>>> aijhipsparse >>>>>> >>>>>> On Fri, Feb 24, 2023 at 9:28 AM Mark Adams wrote: >>>>>> >>>>>>> Oh, its 'aijhipsparse' >>>>>>> And you def want v3.18.4 >>>>>>> >>>>>>> On Fri, Feb 24, 2023 at 11:29 AM David Trebotich < >>>>>>> dptrebotich at lbl.gov> wrote: >>>>>>> >>>>>>>> I rana small problem with >>>>>>>> -proj_mac_mat_type hipsparse >>>>>>>> and get >>>>>>>> [10]PETSC ERROR: --------------------- Error Message >>>>>>>> -------------------------------------------------------------- >>>>>>>> [10]PETSC ERROR: Unknown type. Check for miss-spelling or missing >>>>>>>> package: >>>>>>>> https://petsc.org/release/install/install/#external-packages >>>>>>>> [10]PETSC ERROR: Unknown Mat type given: hipsparse >>>>>>>> >>>>>>>> On Fri, Feb 24, 2023 at 4:09 AM Mark Adams wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Feb 24, 2023 at 3:35 AM David Trebotich < >>>>>>>>> dptrebotich at lbl.gov> wrote: >>>>>>>>> >>>>>>>>>> More info from the stack. This is a full machine run on Frontier >>>>>>>>>> and I get this before I get into the first solve. It may or may not be same >>>>>>>>>> error as before but hopefully there's more here for you to debug. >>>>>>>>>> [1540]PETSC ERROR: >>>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>>> [1540]PETSC ERROR: Caught signal number 11 SEGV: Segmentation >>>>>>>>>> Violation, probably memory access out of range >>>>>>>>>> [1540]PETSC ERROR: Try option -start_in_debugger or >>>>>>>>>> -on_error_attach_debugger >>>>>>>>>> [1540]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind >>>>>>>>>> and https://petsc.org/release/faq/ >>>>>>>>>> [1540]PETSC ERROR: --------------------- Stack Frames >>>>>>>>>> ------------------------------------ >>>>>>>>>> [1540]PETSC ERROR: The line numbers in the error traceback are >>>>>>>>>> not always exact. >>>>>>>>>> [1540]PETSC ERROR: #1 hypre_ParCSRMatrixMigrate() >>>>>>>>>> [1540]PETSC ERROR: #2 MatBindToCPU_HYPRE() at >>>>>>>>>> /gpfs/alpine/geo127/world-shared/petsc_treb/petsc/src/mat/impls/hypre/mhypre.c:1255 >>>>>>>>>> >>>>>>>>> >>>>>>>>> This looks like the copy to device call: >>>>>>>>> >>>>>>>>> src/mat/impls/hypre/mhypre.c:1260: >>>>>>>>> PetscCallExternal(hypre_ParCSRMatrixMigrate, parcsr, hmem); >>>>>>>>> >>>>>>>>> This makes sense. You assemble it in the host and it gets sent to >>>>>>>>> the device. >>>>>>>>> >>>>>>>>> I assume you are using -mat_type hypre. >>>>>>>>> To get moving you could try -mat_type hipsparse >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> [1540]PETSC ERROR: #3 MatAssemblyEnd_HYPRE() at >>>>>>>>>> /gpfs/alpine/geo127/world-shared/petsc_treb/petsc/src/mat/impls/hypre/mhypre.c:1332 >>>>>>>>>> >>>>>>>>>> On Wed, Feb 22, 2023 at 9:26 AM Li, Rui Peng >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi David, >>>>>>>>>>> >>>>>>>>>>> I am not sure how much information I get here for this segfault. >>>>>>>>>>> All I can see is you wanted to migrate (copy) a matrix (on device?) to >>>>>>>>>>> host, and it failed at somewhere in the function. The function itself looks >>>>>>>>>>> simple and fine to me. We may need to check if everything is sane prior to >>>>>>>>>>> the point. I am happy to help further. 
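On the prefixed-versus-global -mat_type question above, a small standalone petsc4py sketch of how an options prefix scopes the option. This is only an illustration: the proj_mac_ prefix is simply the one this application uses, and 'aijhipsparse' assumes a HIP-enabled build (on any other build it fails with the same "Unknown Mat type" message quoted above; substitute 'aij' to try the mechanics anywhere).

```python
import sys
import petsc4py
# pass the prefixed option exactly as it would appear in .petscrc / the command line
petsc4py.init(sys.argv + ['-proj_mac_mat_type', 'aijhipsparse', '-options_left'])
from petsc4py import PETSc

A = PETSc.Mat().create()
A.setOptionsPrefix('proj_mac_')   # only options beginning with proj_mac_ apply to A
A.setSizes([100, 100])
A.setFromOptions()                # consumes -proj_mac_mat_type
PETSc.Sys.Print('mat type:', A.getType())
# a bare, unprefixed -mat_type would not be consumed by A and would be
# reported as unused by -options_left at finalize
```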
>>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>> -Rui Peng >>>>>>>>>>> >>>>>>>>>>> ________________________________________ >>>>>>>>>>> From: David Trebotich >>>>>>>>>>> Sent: Wednesday, February 22, 2023 9:17 AM >>>>>>>>>>> To: Yang, Ulrike Meier >>>>>>>>>>> Cc: Li, Rui Peng; MFAdams at LBL.GOV >>>>>>>>>>> Subject: Re: Frontier >>>>>>>>>>> >>>>>>>>>>> Hi Ulrike, Rui Peng- >>>>>>>>>>> >>>>>>>>>>> I am running into a hypre problem on Frontier. I already passed >>>>>>>>>>> it by Mark and here is what we get out of the stack: >>>>>>>>>>> [1704]PETSC ERROR: >>>>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>>>> [1704]PETSC ERROR: Caught signal number 11 SEGV: Segmentation >>>>>>>>>>> Violation, probably memory access out of range >>>>>>>>>>> [1704]PETSC ERROR: Try option -start_in_debugger or >>>>>>>>>>> -on_error_attach_debugger >>>>>>>>>>> [1704]PETSC ERROR: or see >>>>>>>>>>> https://petsc.org/release/faq/#valgrind< >>>>>>>>>>> https://urldefense.us/v3/__https://petsc.org/release/faq/*valgrind__;Iw!!G2kpM7uM-TzIFchu!kUAHuocSRof5_aTlZtjYLNna1q86tr06UuRvUcmqBdCqWkovEx-X9Y-Md5I8Mcw$> >>>>>>>>>>> and https://petsc.org/release/faq/< >>>>>>>>>>> https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G2kpM7uM-TzIFchu!kUAHuocSRof5_aTlZtjYLNna1q86tr06UuRvUcmqBdCqWkovEx-X9Y-Mm8jIngI$ >>>>>>>>>>> > >>>>>>>>>>> [1704]PETSC ERROR: --------------------- Stack Frames >>>>>>>>>>> ------------------------------------ >>>>>>>>>>> [1704]PETSC ERROR: The line numbers in the error traceback are >>>>>>>>>>> not always exact. >>>>>>>>>>> [1704]PETSC ERROR: #1 hypre_ParCSRMatrixMigrate() >>>>>>>>>>> >>>>>>>>>>> and then Mark got this: >>>>>>>>>>> (new_py-env) 07:24 1 adams/landau-ex1-fix= ~/Codes/petsc2$ git >>>>>>>>>>> grep hypre_ParCSRMatrixMigrate >>>>>>>>>>> src/mat/impls/hypre/mhypre.c: >>>>>>>>>>> PetscCallExternal(hypre_ParCSRMatrixMigrate, parcsr, hmem); >>>>>>>>>>> src/mat/impls/hypre/mhypre.c: >>>>>>>>>>> PetscCallExternal(hypre_ParCSRMatrixMigrate,parcsr, HYPRE_MEMORY_HOST); >>>>>>>>>>> >>>>>>>>>>> Any help debugging this would be appreciated. Thanks. ANd let me >>>>>>>>>>> know if you need to be added to my Frontier project for access. I am on >>>>>>>>>>> through this Friday. >>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sat, Feb 11, 2023 at 10:35 AM David Trebotich < >>>>>>>>>>> dptrebotich at lbl.gov> wrote: >>>>>>>>>>> I got on Frontier yesterday. Here's how it went. Had to use >>>>>>>>>>> PrgEnv-cray to build petsc-hypre. PrgEnv-amd was having some problems. Also >>>>>>>>>>> their default rocm/5.3.0 was problematic so backed off to rocm/ >>>>>>>>>>> 5.2.0.< >>>>>>>>>>> https://urldefense.us/v3/__http://5.2.0.__;!!G2kpM7uM-TzIFchu!kUAHuocSRof5_aTlZtjYLNna1q86tr06UuRvUcmqBdCqWkovEx-X9Y-M2j0U4y8$> >>>>>>>>>>> They did make 5.4.0 available yesterday but I stuck with 5.2.0. I got >>>>>>>>>>> everything built and working. Scaling is excellent thus far. Performance is >>>>>>>>>>> a little bit better than Crusher. And I am taking the scaling test up to >>>>>>>>>>> higher concurrencies. Here's the comparison to Crusher. Same scaling test >>>>>>>>>>> that we have been previously discussing. >>>>>>>>>>> [image.png] >>>>>>>>>>> >>>>>>>>>>> On Fri, Feb 10, 2023 at 9:17 AM Yang, Ulrike Meier < >>>>>>>>>>> yang11 at llnl.gov> wrote: >>>>>>>>>>> I haven?t seen this before. Is this from PETSc? 
>>>>>>>>>>> >>>>>>>>>>> From: David Trebotich >>>>>>>>>> dptrebotich at lbl.gov>> >>>>>>>>>>> Sent: Friday, February 10, 2023 09:14 AM >>>>>>>>>>> To: Yang, Ulrike Meier > >>>>>>>>>>> Cc: Li, Rui Peng >; >>>>>>>>>>> MFAdams at LBL.GOV >>>>>>>>>>> Subject: Re: Frontier >>>>>>>>>>> >>>>>>>>>>> I am on Frontier today for 10 days. Building petsc-hypre. I do >>>>>>>>>>> get this warning. ANything I should worry about? >>>>>>>>>>> >>>>>>>>>>> ============================================================================================= >>>>>>>>>>> ***** WARNING ***** >>>>>>>>>>> Branch "master" is specified, however remote branch >>>>>>>>>>> "origin/master" also exists! >>>>>>>>>>> Proceeding with using the remote branch. To use the local >>>>>>>>>>> branch (manually checkout local >>>>>>>>>>> branch and) - rerun configure with option >>>>>>>>>>> --download-hypre-commit=HEAD) >>>>>>>>>>> >>>>>>>>>>> ============================================================================================= >>>>>>>>>>> >>>>>>>>>>> On Tue, Feb 7, 2023 at 7:09 PM David Trebotich < >>>>>>>>>>> dptrebotich at lbl.gov> wrote: >>>>>>>>>>> I should also say that the timestep includes other solves as >>>>>>>>>>> well like advection and Helmholtz but the latter is not hyper, rather petsc >>>>>>>>>>> Jacobi. >>>>>>>>>>> >>>>>>>>>>> On Tue, Feb 7, 2023, 5:44 PM Yang, Ulrike Meier >>>>>>>>>> > wrote: >>>>>>>>>>> Great. Thanks for the new figure and explanation >>>>>>>>>>> Ulrike >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Get Outlook for iOS< >>>>>>>>>>> https://urldefense.us/v3/__https:/aka.ms/o0ukef__;!!G2kpM7uM-TzIFchu!iyppRHJeX4wNsqIy0mSCIAWwwmzeqaLTfY5V1Q98MhNlzvJp0jgUuPeYYGehcqSN$ >>>>>>>>>>> > >>>>>>>>>>> ________________________________ >>>>>>>>>>> From: David Trebotich >>>>>>>>>> dptrebotich at lbl.gov>> >>>>>>>>>>> Sent: Tuesday, February 7, 2023 4:39:01 PM >>>>>>>>>>> To: Yang, Ulrike Meier > >>>>>>>>>>> Cc: Li, Rui Peng >; >>>>>>>>>>> MFAdams at LBL.GOV >>>>>>>>>> MFAdams at LBL.GOV>> >>>>>>>>>>> Subject: Re: Frontier >>>>>>>>>>> >>>>>>>>>>> Hi Ulrike- >>>>>>>>>>> In this scaling problem I use hypre to solve the >>>>>>>>>>> pressure-Poisson problem in my projection method for incompressible >>>>>>>>>>> Navier-Stokes. The preconditioner is set up once and re-used. This >>>>>>>>>>> particular scaling problem is not time-dependent, that is, the grid is not >>>>>>>>>>> moving so I don't have to redefine solver stencils, etc. I run 10 timesteps >>>>>>>>>>> of this and average the time. >>>>>>>>>>> >>>>>>>>>>> When I did the July runs I thought it was anomalous data because >>>>>>>>>>> it was slower. But I have seen this before where something may have been >>>>>>>>>>> updated the previous 6 months and caused an uptick in performance. This >>>>>>>>>>> anomaly was one of the reasons why I ran this recent test again besides >>>>>>>>>>> making sure the new hypre release is performing the same. So, let's just >>>>>>>>>>> forget the July data. Here is Feb 2022 vs. Jan 2023, with either boxes or >>>>>>>>>>> nodes on x axis: >>>>>>>>>>> >>>>>>>>>>> On Tue, Feb 7, 2023 at 3:18 PM Yang, Ulrike Meier < >>>>>>>>>>> yang11 at llnl.gov> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi David, >>>>>>>>>>> >>>>>>>>>>> I am still trying to understand the figures and your use of >>>>>>>>>>> hypre: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> When you are using hypre, do you just solve one system? Or is >>>>>>>>>>> this a time dependent problem where you need to solve systems many times? 
>>>>>>>>>>> >>>>>>>>>>> If the latter do you set up the preconditioner once and reuse >>>>>>>>>>> it, or do you set up every time? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Now I have some questions about the figures: >>>>>>>>>>> >>>>>>>>>>> It seems in this plot you are using 256 boxes per node and get >>>>>>>>>>> better performance with hypre in July 2022 than in February 2022. Is this >>>>>>>>>>> correct? >>>>>>>>>>> >>>>>>>>>>> Here performance in July 2022 is worse than in February 2022 >>>>>>>>>>> using 512 boxes per node: >>>>>>>>>>> >>>>>>>>>>> Performance is now back to previous better performance. I really >>>>>>>>>>> wonder what happened in July. Do you have any idea? But the numbers of >>>>>>>>>>> February 2022 are similar to what you have in the plot you sent below. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> From: David Trebotich >>>>>>>>>> dptrebotich at lbl.gov>> >>>>>>>>>>> Sent: Wednesday, February 1, 2023 06:00 PM >>>>>>>>>>> To: Yang, Ulrike Meier > >>>>>>>>>>> Cc: Li, Rui Peng >; >>>>>>>>>>> MFAdams at LBL.GOV >>>>>>>>>>> Subject: Re: Frontier >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I'd be glad to show you the data in case you're interested. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Feb 1, 2023 at 5:55 PM Yang, Ulrike Meier < >>>>>>>>>>> yang11 at llnl.gov> wrote: >>>>>>>>>>> >>>>>>>>>>> Never mind. I read your new message before the one you sent >>>>>>>>>>> before. So, the figures are correct then >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Get Outlook for iOS< >>>>>>>>>>> https://urldefense.us/v3/__https:/aka.ms/o0ukef__;!!G2kpM7uM-TzIFchu!jtnof7503SCD3sNKdbf-8RTND6Q2FRuyxk2zGyChnupBjnjN-TS7Fjzp1tA-1lor$ >>>>>>>>>>> > >>>>>>>>>>> >>>>>>>>>>> ________________________________ >>>>>>>>>>> >>>>>>>>>>> From: David Trebotich >>>>>>>>>> dptrebotich at lbl.gov>> >>>>>>>>>>> Sent: Wednesday, February 1, 2023 5:08:03 PM >>>>>>>>>>> To: Yang, Ulrike Meier > >>>>>>>>>>> Cc: Li, Rui Peng >; >>>>>>>>>>> MFAdams at LBL.GOV >>>>>>>>>> MFAdams at LBL.GOV>> >>>>>>>>>>> Subject: Re: Frontier >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Just checked. That was a different scaling plot where the weak >>>>>>>>>>> scaling started with N=2 nodes for the 512 box problem (not N=1). So, I can >>>>>>>>>>> do the same for the new executable and see what we get. Should have >>>>>>>>>>> labelled the previous figure with more detail because with log scale it is >>>>>>>>>>> difficult to see the abscissa of the first datapoint >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> In previous weak scaling I have put several on one plot and >>>>>>>>>>> annotate with the starting node count for each curve: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Feb 1, 2023 at 4:56 PM Yang, Ulrike Meier < >>>>>>>>>>> yang11 at llnl.gov> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi David, >>>>>>>>>>> >>>>>>>>>>> I was referring to the figure below in my previous email. >>>>>>>>>>> >>>>>>>>>>> The timings are different, so you were probably running >>>>>>>>>>> something a bit different, but it shows some nice improvement. 
>>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>> Ulrike >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> From: Li, Rui Peng > >>>>>>>>>>> Sent: Tuesday, July 26, 2022 4:35 PM >>>>>>>>>>> To: David Trebotich >>>>>>>>>> dptrebotich at lbl.gov>>; MFAdams at LBL.GOV >>>>>>>>>>> Cc: Yang, Ulrike Meier > >>>>>>>>>>> Subject: Re: Frontier >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi David, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thank you for the scaling result which looks nice. The slight >>>>>>>>>>> performance improvement was probably from recent code optimizations. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> About 64 integers, I assumed you were talking about hypre?s >>>>>>>>>>> bigInt option on GPUs (? Correct me if wrong). I don?t see why you have to >>>>>>>>>>> use it instead of mixedInt. I believe mixedInt can handle as big problems >>>>>>>>>>> as bigInt can do (@Ulrike is it correct?). Having a 60B or 300B global size >>>>>>>>>>> doesn?t seem to be an obstacle to me for mixedInt. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hope this makes sense. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> . -- .- .. .-.. / ..-. .-. --- -- / .-. ..- .. .--. . -. --. / >>>>>>>>>>> .-.. .. >>>>>>>>>>> >>>>>>>>>>> Rui Peng Li >>>>>>>>>>> >>>>>>>>>>> Center for Applied Scientific Computing >>>>>>>>>>> >>>>>>>>>>> Lawrence Livermore National Laboratory >>>>>>>>>>> >>>>>>>>>>> P.O. Box 808, L-561 Livermore, CA 94551 >>>>>>>>>>> >>>>>>>>>>> phone - (925) 422-6037, email - li50 at llnl.gov>>>>>>>>>> li50 at llnl.gov> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> From: David Trebotich >>>>>>>>>> dptrebotich at lbl.gov>> >>>>>>>>>>> Date: Tuesday, July 26, 2022 at 3:40 PM >>>>>>>>>>> To: MFAdams at LBL.GOV >>>>>>>>>> > >>>>>>>>>>> Cc: Li, Rui Peng >, Yang, >>>>>>>>>>> Ulrike Meier > >>>>>>>>>>> Subject: Re: Frontier >>>>>>>>>>> >>>>>>>>>>> Ok, looks like my build has worked. It reproduces the weak >>>>>>>>>>> scaling numbers that I had in Feb and in May and in fact the times are >>>>>>>>>>> slightly better. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Ruipeng/Ulrike: 64 integers seems to be the sticking point for >>>>>>>>>>> me since my runs are high d.o.f. and they're only going to get bigger so >>>>>>>>>>> having hypre run with 64 int on GPU is probably needed. The largest problem >>>>>>>>>>> that I run for my scaling test on Crusher is about 6B dof on 128 nodes. On >>>>>>>>>>> Frontier we will certainly be 10x that problem size and probably 50x. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Mark: I still would like to get an official build from you when >>>>>>>>>>> you get back from vacation just to have that in a safe place and to make >>>>>>>>>>> sure we are on the same page. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Here's the configure file I used: >>>>>>>>>>> >>>>>>>>>>> #!/usr/bin/python3 >>>>>>>>>>> if __name__ == '__main__': >>>>>>>>>>> import sys >>>>>>>>>>> import os >>>>>>>>>>> sys.path.insert(0, os.path.abspath('config')) >>>>>>>>>>> import configure >>>>>>>>>>> configure_options = [ >>>>>>>>>>> '--download-hypre', >>>>>>>>>>> '--download-hypre-commit=master', >>>>>>>>>>> '--download-hypre-configure-arguments=--enable-bigint=no >>>>>>>>>>> --enable-mixedint=yes', >>>>>>>>>>> >>>>>>>>>>> '--prefix=/gpfs/alpine/world-shared/geo127/petsc_treb/arch-crusher-amd-opt-int64-master', >>>>>>>>>>> '--with-64-bit-indices=1', >>>>>>>>>>> '--with-cc=cc', >>>>>>>>>>> '--with-cxx=CC', >>>>>>>>>>> '--with-debugging=0', >>>>>>>>>>> '--with-fc=ftn', >>>>>>>>>>> '--with-hip', >>>>>>>>>>> '--with-hipc=hipcc', >>>>>>>>>>> '--with-mpiexec=srun', >>>>>>>>>>> 'LIBS=-L/opt/cray/pe/mpich/8.1.16/gtl/lib -lmpi_gtl_hsa', >>>>>>>>>>> 'PETSC_ARCH=arch-olcf-crusher-amd-opt-int64-master', >>>>>>>>>>> >>>>>>>>>>> 'PETSC_DIR=/gpfs/alpine/world-shared/geo127/petsc_treb/petsc', >>>>>>>>>>> ] >>>>>>>>>>> configure.petsc_configure(configure_options) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> And here's the module list: >>>>>>>>>>> >>>>>>>>>>> Currently Loaded Modules: >>>>>>>>>>> 1) craype-x86-trento >>>>>>>>>>> 2) libfabric/1.15.0.0< >>>>>>>>>>> https://urldefense.us/v3/__http:/1.15.0.0__;!!G2kpM7uM-TzIFchu!n794qlYekpYcOxHOM02fRuGjxyA6-PY6Bp_NJGcse4LvoqXq878zIvHJeI6a5Rk$ >>>>>>>>>>> > >>>>>>>>>>> 3) craype-network-ofi >>>>>>>>>>> 4) perftools-base/22.05.0 >>>>>>>>>>> 5) xpmem/2.4.4-2.3_2.12__gff0e1d9.shasta >>>>>>>>>>> 6) cray-pmi/6.1.2 >>>>>>>>>>> 7) rocm/5.1.0 >>>>>>>>>>> 8) subversion/1.14.1 >>>>>>>>>>> 9) emacs/28.1 >>>>>>>>>>> 10) amd/5.1.0 >>>>>>>>>>> 11) craype/2.7.15 >>>>>>>>>>> 12) cray-dsmml/0.2.2 >>>>>>>>>>> 13) cray-mpich/8.1.16 >>>>>>>>>>> 14) cray-libsci/21.08.1.2< >>>>>>>>>>> https://urldefense.us/v3/__http:/21.08.1.2__;!!G2kpM7uM-TzIFchu!n794qlYekpYcOxHOM02fRuGjxyA6-PY6Bp_NJGcse4LvoqXq878zIvHJSW0Wh3g$ >>>>>>>>>>> > >>>>>>>>>>> 15) PrgEnv-amd/8.3.3 >>>>>>>>>>> 16) xalt/1.3.0 >>>>>>>>>>> 17) DefApps/default >>>>>>>>>>> 18) cray-hdf5-parallel/1.12.1.1< >>>>>>>>>>> https://urldefense.us/v3/__http:/1.12.1.1__;!!G2kpM7uM-TzIFchu!n794qlYekpYcOxHOM02fRuGjxyA6-PY6Bp_NJGcse4LvoqXq878zIvHJMJWAYCE$ >>>>>>>>>>> > >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Jul 21, 2022 at 8:42 PM Mark Adams >>>>>>>>>> > wrote: >>>>>>>>>>> >>>>>>>>>>> '--download-hypre-commit=master', >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> You might want: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> '--download-hypre-commit=origin/master', >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> But, You should ask questions on the mailing list petsc-maint < >>>>>>>>>>> petsc-maint at mcs.anl.gov> (not >>>>>>>>>>> archived). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Mark >>>>>>>>>>> >>>>>>>>>>> ps, I am on vacation and will be back on the1st >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Jul 21, 2022 at 9:00 PM David Trebotich < >>>>>>>>>>> dptrebotich at lbl.gov> wrote: >>>>>>>>>>> >>>>>>>>>>> I am not getting anywhere with this. I'll have to wait for Mark >>>>>>>>>>> to do the petsc build with hypre. 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I tried the following to get the hypre master branch but I am >>>>>>>>>>> not sure if this is the right incantation: >>>>>>>>>>> >>>>>>>>>>> '--download-hypre', >>>>>>>>>>> >>>>>>>>>>> '--download-hypre-commit=master', >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I did get a build with that but still get same problem with >>>>>>>>>>> scaling. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Here's my configure script: >>>>>>>>>>> >>>>>>>>>>> #!/usr/bin/python3 >>>>>>>>>>> if __name__ == '__main__': >>>>>>>>>>> import sys >>>>>>>>>>> import os >>>>>>>>>>> sys.path.insert(0, os.path.abspath('config')) >>>>>>>>>>> import configure >>>>>>>>>>> configure_options = [ >>>>>>>>>>> '--download-hypre', >>>>>>>>>>> '--download-hypre-commit=master', >>>>>>>>>>> '--download-hypre-configure-arguments=--enable-bigint=no >>>>>>>>>>> --enable-mixedint=yes', >>>>>>>>>>> >>>>>>>>>>> '--prefix=/gpfs/alpine/world-shared/geo127/petsc_treb/arch-crusher-amd-opt-int64-master', >>>>>>>>>>> '--with-64-bit-indices=1', >>>>>>>>>>> '--with-cc=cc', >>>>>>>>>>> '--with-cxx=CC', >>>>>>>>>>> '--with-debugging=0', >>>>>>>>>>> '--with-fc=ftn', >>>>>>>>>>> '--with-hip', >>>>>>>>>>> '--with-hipc=hipcc', >>>>>>>>>>> '--with-mpiexec=srun', >>>>>>>>>>> 'LIBS=-L/opt/cray/pe/mpich/8.1.16/gtl/lib -lmpi_gtl_hsa', >>>>>>>>>>> 'PETSC_ARCH=arch-olcf-crusher-amd-opt-int64-master', >>>>>>>>>>> ] >>>>>>>>>>> configure.petsc_configure(configure_options) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Currently Loaded Modules: >>>>>>>>>>> 1) craype-x86-trento >>>>>>>>>>> 2) libfabric/1.15.0.0< >>>>>>>>>>> https://urldefense.us/v3/__http:/1.15.0.0__;!!G2kpM7uM-TzIFchu!n794qlYekpYcOxHOM02fRuGjxyA6-PY6Bp_NJGcse4LvoqXq878zIvHJeI6a5Rk$ >>>>>>>>>>> > >>>>>>>>>>> 3) craype-network-ofi >>>>>>>>>>> 4) perftools-base/22.05.0 >>>>>>>>>>> 5) xpmem/2.4.4-2.3_2.12__gff0e1d9.shasta >>>>>>>>>>> 6) cray-pmi/6.1.2 >>>>>>>>>>> 7) emacs/27.2 >>>>>>>>>>> 8) rocm/5.1.0 >>>>>>>>>>> 9) subversion/1.14.1 >>>>>>>>>>> 10) amd/5.1.0 >>>>>>>>>>> 11) craype/2.7.15 >>>>>>>>>>> 12) cray-dsmml/0.2.2 >>>>>>>>>>> 13) cray-mpich/8.1.16 >>>>>>>>>>> 14) cray-libsci/21.08.1.2< >>>>>>>>>>> https://urldefense.us/v3/__http:/21.08.1.2__;!!G2kpM7uM-TzIFchu!n794qlYekpYcOxHOM02fRuGjxyA6-PY6Bp_NJGcse4LvoqXq878zIvHJSW0Wh3g$ >>>>>>>>>>> > >>>>>>>>>>> 15) PrgEnv-amd/8.3.3 >>>>>>>>>>> 16) xalt/1.3.0 >>>>>>>>>>> 17) DefApps/default >>>>>>>>>>> 18) cray-hdf5-parallel/1.12.1.1< >>>>>>>>>>> https://urldefense.us/v3/__http:/1.12.1.1__;!!G2kpM7uM-TzIFchu!n794qlYekpYcOxHOM02fRuGjxyA6-PY6Bp_NJGcse4LvoqXq878zIvHJMJWAYCE$ >>>>>>>>>>> > >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Jul 21, 2022 at 11:20 AM David Trebotich < >>>>>>>>>>> dptrebotich at lbl.gov> wrote: >>>>>>>>>>> >>>>>>>>>>> that was wrong mpich. I got much further in the configure. How >>>>>>>>>>> do I know if I got the master branch of hypre? 
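One way to answer this is to inspect the hypre checkout that configure made, or to search configure.log for the hypre clone/checkout lines. A hedged sketch follows: the externalpackages/git.hypre path is an assumption about how --download-hypre lays out git packages under $PETSC_DIR/$PETSC_ARCH, so adjust it if your tree differs.

```python
# report which hypre commit/branch the --download-hypre build actually checked out
import os
import subprocess

pkg = os.path.join(os.environ['PETSC_DIR'], os.environ.get('PETSC_ARCH', ''),
                   'externalpackages', 'git.hypre')   # assumed download location
out = subprocess.run(['git', '-C', pkg, 'log', '-1', '--oneline', '--decorate'],
                     capture_output=True, text=True)
print(out.stdout or out.stderr)
```

The decorated output should show whether the checkout sits on origin/master or on a release tag; configure.log records the same checkout commands if the source directory has since been cleaned.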
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Jul 21, 2022 at 10:46 AM David Trebotich < >>>>>>>>>>> dptrebotich at lbl.gov> wrote: >>>>>>>>>>> >>>>>>>>>>> I use the following configure: >>>>>>>>>>> >>>>>>>>>>> #!/usr/bin/python3 >>>>>>>>>>> if __name__ == '__main__': >>>>>>>>>>> import sys >>>>>>>>>>> import os >>>>>>>>>>> sys.path.insert(0, os.path.abspath('config')) >>>>>>>>>>> import configure >>>>>>>>>>> configure_options = [ >>>>>>>>>>> '--download-hypre', >>>>>>>>>>> '--download-hypre-commit=master', >>>>>>>>>>> '--download-hypre-configure-arguments=--enable-bigint=no >>>>>>>>>>> --enable-mixedint=yes', >>>>>>>>>>> >>>>>>>>>>> '--prefix=/gpfs/alpine/world-shared/geo127/petsc_treb/arch-crusher-cray-opt-int64-master', >>>>>>>>>>> '--with-64-bit-indices=1', >>>>>>>>>>> '--with-cc=cc', >>>>>>>>>>> '--with-cxx=CC', >>>>>>>>>>> '--with-debugging=0', >>>>>>>>>>> '--with-fc=ftn', >>>>>>>>>>> '--with-hip', >>>>>>>>>>> '--with-hipc=hipcc', >>>>>>>>>>> '--with-mpiexec=srun', >>>>>>>>>>> 'LIBS=-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa', >>>>>>>>>>> 'PETSC_ARCH=arch-olcf-crusher-cray-opt-int64-master', >>>>>>>>>>> ] >>>>>>>>>>> configure.petsc_configure(configure_options) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> and get: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ============================================================================================= >>>>>>>>>>> Configuring PETSc to compile on your >>>>>>>>>>> system >>>>>>>>>>> >>>>>>>>>>> ============================================================================================= >>>>>>>>>>> ============================================================================================= >>>>>>>>>>> ***** >>>>>>>>>>> WARNING: Using default optimization C flags -O >>>>>>>>>>> You >>>>>>>>>>> might consider manually setting optimal optimization flags for your system >>>>>>>>>>> with >>>>>>>>>>> COPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for >>>>>>>>>>> examples >>>>>>>>>>> ============================================================================================= >>>>>>>>>>> >>>>>>>>>>> ============================================================================================= >>>>>>>>>>> ***** >>>>>>>>>>> WARNING: Using default Cxx optimization flags -O >>>>>>>>>>> You >>>>>>>>>>> might consider manually setting optimal optimization flags for your system >>>>>>>>>>> with >>>>>>>>>>> CXXOPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for >>>>>>>>>>> examples >>>>>>>>>>> ============================================================================================= >>>>>>>>>>> >>>>>>>>>>> ============================================================================================= >>>>>>>>>>> ***** >>>>>>>>>>> WARNING: Using default FORTRAN optimization flags -O >>>>>>>>>>> You >>>>>>>>>>> might consider manually setting optimal optimization flags for your system >>>>>>>>>>> with >>>>>>>>>>> FOPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for >>>>>>>>>>> examples >>>>>>>>>>> ============================================================================================= >>>>>>>>>>> >>>>>>>>>>> ============================================================================================= >>>>>>>>>>> ***** >>>>>>>>>>> WARNING: Using default HIP optimization flags -g -O3 >>>>>>>>>>> You >>>>>>>>>>> might consider manually setting optimal optimization flags for your system >>>>>>>>>>> with >>>>>>>>>>> HIPOPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for 
>>>>>>>>>>> examples >>>>>>>>>>> ============================================================================================= >>>>>>>>>>> TESTING: >>>>>>>>>>> checkFortranLibraries from >>>>>>>>>>> config.compilers(config/BuildSystem/config/compilers.py:835) >>>>>>>>>>> >>>>>>>>>>> ******************************************************************************* >>>>>>>>>>> OSError while running ./configure >>>>>>>>>>> >>>>>>>>>>> ------------------------------------------------------------------------------- >>>>>>>>>>> Cannot run executables created with FC. If this machine uses a >>>>>>>>>>> batch system >>>>>>>>>>> to submit jobs you will need to configure using ./configure with >>>>>>>>>>> the additional option --with-batch. >>>>>>>>>>> Otherwise there is problem with the compilers. Can you compile >>>>>>>>>>> and run code with your compiler 'ftn'? >>>>>>>>>>> >>>>>>>>>>> ******************************************************************************* >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Jul 21, 2022 at 9:50 AM David Trebotich < >>>>>>>>>>> dptrebotich at lbl.gov> wrote: >>>>>>>>>>> >>>>>>>>>>> I think recent builds have been hypre v2.25 >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Jul 21, 2022 at 9:49 AM David Trebotich < >>>>>>>>>>> dptrebotich at lbl.gov> wrote: >>>>>>>>>>> >>>>>>>>>>> so instead of just >>>>>>>>>>> >>>>>>>>>>> '--download-hypre', >>>>>>>>>>> >>>>>>>>>>> add >>>>>>>>>>> >>>>>>>>>>> '--download-hypre', >>>>>>>>>>> '--download-hypre-commit=master', >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ??? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Jul 21, 2022 at 9:47 AM Li, Rui Peng >>>>>>>>>> > wrote: >>>>>>>>>>> >>>>>>>>>>> As Ulrike said, AMD recently found bugs regarding the bigInt >>>>>>>>>>> issue, which have been fixed in the current master. I suggest using the >>>>>>>>>>> master branch of hypre if possible. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> . -- .- .. .-.. / ..-. .-. --- -- / .-. ..- .. .--. . -. --. / >>>>>>>>>>> .-.. .. >>>>>>>>>>> >>>>>>>>>>> Rui Peng Li >>>>>>>>>>> >>>>>>>>>>> Center for Applied Scientific Computing >>>>>>>>>>> >>>>>>>>>>> Lawrence Livermore National Laboratory >>>>>>>>>>> >>>>>>>>>>> P.O. Box 808, L-561 Livermore, CA 94551 >>>>>>>>>>> >>>>>>>>>>> phone - (925) 422-6037, email - li50 at llnl.gov>>>>>>>>>> li50 at llnl.gov> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> From: Yang, Ulrike Meier >>>>>>>>>> >> >>>>>>>>>>> Date: Thursday, July 21, 2022 at 9:41 AM >>>>>>>>>>> To: David Trebotich >>>>>>>>>> dptrebotich at lbl.gov>>, Li, Rui Peng >>>>>>>>>> li50 at llnl.gov>> >>>>>>>>>>> Cc: MFAdams at LBL.GOV >>>>>>>>>> > >>>>>>>>>>> Subject: RE: Frontier >>>>>>>>>>> >>>>>>>>>>> Actually, I think it was 2000 nodes! >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> From: Yang, Ulrike Meier >>>>>>>>>>> Sent: Thursday, July 21, 2022 9:40 AM >>>>>>>>>>> To: David Trebotich >>>>>>>>>> dptrebotich at lbl.gov>>; Li, Rui Peng >>>>>>>>>> li50 at llnl.gov>> >>>>>>>>>>> Cc: MFAdams at LBL.GOV >>>>>>>>>>> Subject: RE: Frontier >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Which version of hypre are you using for this? >>>>>>>>>>> >>>>>>>>>>> We recently found one bug in the mixed-int version, however that >>>>>>>>>>> should have been an issue also in your previous runs that apparently were >>>>>>>>>>> working. 
>>>>>>>>>>> >>>>>>>>>>> Note that recent runs by AMD on Frontier with hypre were >>>>>>>>>>> successful on more than 200 nodes using mixed-int, so we should be able to >>>>>>>>>>> get this to work somehow for you guys. They also found the bug in mixed-int. >>>>>>>>>>> >>>>>>>>>>> Ulrike >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> From: David Trebotich >>>>>>>>>> dptrebotich at lbl.gov>> >>>>>>>>>>> Sent: Thursday, July 21, 2022 9:30 AM >>>>>>>>>>> To: Li, Rui Peng > >>>>>>>>>>> Cc: MFAdams at LBL.GOV; Yang, Ulrike Meier >>>>>>>>>>> > >>>>>>>>>>> Subject: Re: Frontier >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Ruipeng and Ulrike >>>>>>>>>>> >>>>>>>>>>> You asked if we need 64 int for gpus and I think we definitely >>>>>>>>>>> do need it. Currently I cannot scale past that 2B degree of freedom mark >>>>>>>>>>> that you mentioned. I am not sure what happened between Mark's Cray build >>>>>>>>>>> in February and his amd build in May but currently I cannot scale past 32 >>>>>>>>>>> nodes on Crusher. This is unfortunate because given the success over the >>>>>>>>>>> past 6 months I have told ECP that we are fully ready for Frontier. Now, we >>>>>>>>>>> are not. Hopefully we can figure this out pretty soon and be ready to take >>>>>>>>>>> a shot on Frontier when they let us on. >>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Jul 18, 2022 at 5:03 PM Li, Rui Peng >>>>>>>>>> > wrote: >>>>>>>>>>> >>>>>>>>>>> Hi All, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Building with unified memory will *not* change the default >>>>>>>>>>> parameters of AMG. Are you using the master branch of hypre or some release >>>>>>>>>>> version? I think our previous fix should be included in the latest release. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Please let me know if I can further help >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> . -- .- .. .-.. / ..-. .-. --- -- / .-. ..- .. .--. . -. --. / >>>>>>>>>>> .-.. .. >>>>>>>>>>> >>>>>>>>>>> Rui Peng Li >>>>>>>>>>> >>>>>>>>>>> Center for Applied Scientific Computing >>>>>>>>>>> >>>>>>>>>>> Lawrence Livermore National Laboratory >>>>>>>>>>> >>>>>>>>>>> P.O. Box 808, L-561 Livermore, CA 94551 >>>>>>>>>>> >>>>>>>>>>> phone - (925) 422-6037, email - li50 at llnl.gov>>>>>>>>>> li50 at llnl.gov> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> From: Mark Adams > >>>>>>>>>>> Date: Monday, July 18, 2022 at 1:55 PM >>>>>>>>>>> To: David Trebotich >>>>>>>>>> dptrebotich at lbl.gov>> >>>>>>>>>>> Cc: Li, Rui Peng >, Yang, >>>>>>>>>>> Ulrike Meier > >>>>>>>>>>> Subject: Re: Frontier >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Jul 18, 2022 at 4:35 PM David Trebotich < >>>>>>>>>>> dptrebotich at lbl.gov> wrote: >>>>>>>>>>> >>>>>>>>>>> When I run with Mark's newest build then I get stuck in the nnz >>>>>>>>>>> bin counts for the first solve (proj_mac). Here's the stack: >>>>>>>>>>> >>>>>>>>>>> [0]PETSC ERROR: #1 jac->setup() at >>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/hypre/hypre.c:420 >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> This is the same place where we got this hypre error "(12)" >>>>>>>>>>> before. >>>>>>>>>>> >>>>>>>>>>> Recall this error message means that there is a zero row in the >>>>>>>>>>> matrix. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I may have been using the master branch of hypre when I built >>>>>>>>>>> that working version. 
>>>>>>>>>>> >>>>>>>>>>> Maybe this branch was fixed to accept zero rows? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Rui Peng: I am building with UVM now. Does that change the >>>>>>>>>>> defaults in hypre? >>>>>>>>>>> >>>>>>>>>>> For instance, does hypre use Falgout coursening if UVM is >>>>>>>>>>> available? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [0]PETSC ERROR: #2 PCSetUp_HYPRE() at >>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/hypre/hypre.c:237 >>>>>>>>>>> [0]PETSC ERROR: #3 PCSetUp() at >>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/interface/precon.c:949 >>>>>>>>>>> [0]PETSC ERROR: #4 KSPSetUp() at >>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:314 >>>>>>>>>>> [0]PETSC ERROR: #5 KSPSolve_Private() at >>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:792 >>>>>>>>>>> [0]PETSC ERROR: #6 KSPSolve() at >>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:1061 >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> And here's my .petscrc: >>>>>>>>>>> >>>>>>>>>>> -help >>>>>>>>>>> >>>>>>>>>>> -proj_mac_mat_type hypre >>>>>>>>>>> -proj_mac_pc_type hypre >>>>>>>>>>> -proj_mac_pc_hypre_type boomeramg >>>>>>>>>>> -proj_mac_pc_hypre_boomeramg_no_CF >>>>>>>>>>> -proj_mac_pc_hypre_boomeramg_agg_nl 0 >>>>>>>>>>> -proj_mac_pc_hypre_boomeramg_coarsen_type PMIS >>>>>>>>>>> -proj_mac_pc_hypre_boomeramg_interp_type ext+i >>>>>>>>>>> -proj_mac_pc_hypre_boomeramg_print_statistics >>>>>>>>>>> -proj_mac_pc_hypre_boomeramg_relax_type_all l1scaled-Jacobi >>>>>>>>>>> -proj_mac_pc_hypre_SetSpGemmUseCusparse 0 >>>>>>>>>>> >>>>>>>>>>> -proj_mac_ksp_type gmres >>>>>>>>>>> -proj_mac_ksp_max_it 50 >>>>>>>>>>> -proj_mac_ksp_rtol 1.e-12 >>>>>>>>>>> -proj_mac_ksp_atol 1.e-30 >>>>>>>>>>> >>>>>>>>>>> -use_gpu_aware_mpi 0 >>>>>>>>>>> >>>>>>>>>>> -info >>>>>>>>>>> -log_view >>>>>>>>>>> -history PETSc.history >>>>>>>>>>> -options_left >>>>>>>>>>> >>>>>>>>>>> -visc_pc_type jacobi >>>>>>>>>>> >>>>>>>>>>> -visc_pc_hypre_type boomeramg >>>>>>>>>>> -visc_ksp_type gmres >>>>>>>>>>> -visc_ksp_max_it 50 >>>>>>>>>>> -visc_ksp_rtol 1.e-12 >>>>>>>>>>> >>>>>>>>>>> -diff_pc_type jacobi >>>>>>>>>>> -diff_pc_hypre_type boomeramg >>>>>>>>>>> -diff_ksp_type gmres >>>>>>>>>>> -diff_ksp_max_it 50 >>>>>>>>>>> -diff_ksp_rtol 1.e-6 >>>>>>>>>>> >>>>>>>>>>> -proj_mac_ksp_converged_reason >>>>>>>>>>> -visc_ksp_converged_reason >>>>>>>>>>> -diff_ksp_converged_reason >>>>>>>>>>> -proj_mac_ksp_norm_type unpreconditioned >>>>>>>>>>> -diff_ksp_norm_type unpreconditioned >>>>>>>>>>> -visc_ksp_norm_type unpreconditioned >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Jul 18, 2022 at 1:30 PM Mark Adams >>>>>>>>>> > wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Jul 18, 2022 at 4:18 PM Li, Rui Peng >>>>>>>>>> > wrote: >>>>>>>>>>> >>>>>>>>>>> Yes, there is no need for enable-unified-memory, unless you want >>>>>>>>>>> to use non-GPU supported parameter of AMG (such as Falgout coarsening) >>>>>>>>>>> which needs unified memory since it will run on CPUs. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Got it, Will not use UVM. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> UVM is unified memory. Our expert from AMD told us to not use >>>>>>>>>>> unified memory yet. Maybe it's working now, but I haven't tried. >>>>>>>>>>> >>>>>>>>>>> 64-bit integer: Sorry, I did not make it clear. 
"mixed-int" is a >>>>>>>>>>> more efficient approach for problems with > 2B dofs where the local integer >>>>>>>>>>> type is kept in 32-bit while the global one is 64-bit. This is the only way >>>>>>>>>>> we currently support on GPUs. hypre also has "--enable-big-int" which has >>>>>>>>>>> all the integers (local and global) in 64-bit, which we don't have on GPUs. >>>>>>>>>>> For some users, it is difficult for their code to handle two integer types >>>>>>>>>>> (in mixed-int), so they prefer the old "big-int" approach. That's why I was >>>>>>>>>>> asking. If "mixed-int" works for you, that's ideal. No need to bother. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I see. I only care about the interface so the current parameters >>>>>>>>>>> are fine. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> --enable-bigint=no --enable-mixedint=yes >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I think PETSc should always use this, with 64 bit ints, because >>>>>>>>>>> we only care about the interface and I trust the local problem will be < 2B. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> >>>>>>>>>>> ---------------------- >>>>>>>>>>> David Trebotich >>>>>>>>>>> Lawrence Berkeley National Laboratory >>>>>>>>>>> Computational Research Division >>>>>>>>>>> Applied Numerical Algorithms Group >>>>>>>>>>> treb at lbl.gov >>>>>>>>>>> (510) 486-5984 office >>>>>>>>>>> (510) 384-6868 mobile >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> >>>>>>>>>>> ---------------------- >>>>>>>>>>> David Trebotich >>>>>>>>>>> Lawrence Berkeley National Laboratory >>>>>>>>>>> Computational Research Division >>>>>>>>>>> Applied Numerical Algorithms Group >>>>>>>>>>> treb at lbl.gov >>>>>>>>>>> (510) 486-5984 office >>>>>>>>>>> (510) 384-6868 mobile >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> >>>>>>>>>>> ---------------------- >>>>>>>>>>> David Trebotich >>>>>>>>>>> Lawrence Berkeley National Laboratory >>>>>>>>>>> Computational Research Division >>>>>>>>>>> Applied Numerical Algorithms Group >>>>>>>>>>> treb at lbl.gov >>>>>>>>>>> (510) 486-5984 office >>>>>>>>>>> (510) 384-6868 mobile >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> >>>>>>>>>>> ---------------------- >>>>>>>>>>> David Trebotich >>>>>>>>>>> Lawrence Berkeley National Laboratory >>>>>>>>>>> Computational Research Division >>>>>>>>>>> Applied Numerical Algorithms Group >>>>>>>>>>> treb at lbl.gov >>>>>>>>>>> (510) 486-5984 office >>>>>>>>>>> (510) 384-6868 mobile >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> >>>>>>>>>>> ---------------------- >>>>>>>>>>> David Trebotich >>>>>>>>>>> Lawrence Berkeley National Laboratory >>>>>>>>>>> Computational Research Division >>>>>>>>>>> Applied Numerical Algorithms Group >>>>>>>>>>> treb at lbl.gov >>>>>>>>>>> (510) 486-5984 office >>>>>>>>>>> (510) 384-6868 mobile >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> >>>>>>>>>>> ---------------------- >>>>>>>>>>> David Trebotich >>>>>>>>>>> Lawrence Berkeley National Laboratory >>>>>>>>>>> Computational Research Division >>>>>>>>>>> Applied Numerical Algorithms Group >>>>>>>>>>> treb at lbl.gov >>>>>>>>>>> (510) 486-5984 office >>>>>>>>>>> (510) 384-6868 mobile >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> >>>>>>>>>>> ---------------------- >>>>>>>>>>> 
David Trebotich >>>>>>>>>>> Lawrence Berkeley National Laboratory >>>>>>>>>>> Computational Research Division >>>>>>>>>>> Applied Numerical Algorithms Group >>>>>>>>>>> treb at lbl.gov >>>>>>>>>>> (510) 486-5984 office >>>>>>>>>>> (510) 384-6868 mobile >>>>>> -- >>>>>> ---------------------- >>>>>> David Trebotich >>>>>> Lawrence Berkeley National Laboratory >>>>>> Computational Research 
Division >>>>>> Applied Numerical Algorithms Group >>>>>> treb at lbl.gov >>>>>> (510) 486-5984 office >>>>>> (510) 384-6868 mobile >>>>>> >>>>> >>>> >>>> -- >>>> ---------------------- >>>> David Trebotich >>>> Lawrence Berkeley National Laboratory >>>> Computational Research Division >>>> Applied Numerical Algorithms Group >>>> treb at lbl.gov >>>> (510) 486-5984 office >>>> (510) 384-6868 mobile >>>> >>> >>> >>> -- >>> ---------------------- >>> David Trebotich >>> Lawrence Berkeley National Laboratory >>> Computational Research Division >>> Applied Numerical Algorithms Group >>> treb at lbl.gov >>> (510) 486-5984 office >>> (510) 384-6868 mobile >>> >> > > -- > ---------------------- > David Trebotich > Lawrence Berkeley National Laboratory > Computational Research Division > Applied Numerical Algorithms Group > treb at lbl.gov > (510) 486-5984 office > (510) 384-6868 mobile > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 27 06:18:21 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 Feb 2023 07:18:21 -0500 Subject: [petsc-users] Inquiry regarding DMAdaptLabel function In-Reply-To: References: Message-ID: On Sat, Feb 18, 2023 at 2:25?AM Zongze Yang wrote: > Dear PETSc Group, > > I am writing to inquire about the function DMAdaptLabel in PETSc. > I am trying to use it coarse a mesh, but the resulting mesh is refined. > > In the following code, all of the `adpat` label values were set to 2 > (DM_ADAPT_COARSEN). > There must be something wrong. Could you give some suggestions? > Sorry for the late reply. You are right, I need to put in error messages for this. Here is what is happening. PETSc tries to fallback if you do not have certain packages. In this case, you are not using DMForest, which responds to both coarsen and refine, so the mesh generator interprets all markers as refine (they cannot coarsen). I will add a check that fails on the coarsen marker. Coarsening is much more difficult in the presence of boundaries, which is why it is not implemented in most packages. For unstructured coarsening, I do not think there is any choice but MMG. Thanks, Matt ```python > from firedrake import * > from firedrake.petsc import PETSc > > def mark_all_cells(mesh): > plex = mesh.topology_dm > with PETSc.Log.Event("ADD_ADAPT_LABEL"): > plex.createLabel('adapt') > cs, ce = plex.getHeightStratum(0) > for i in range(cs, ce): > plex.setLabelValue('adapt', i, 2) > > return plex > > mesh = RectangleMesh(10, 10, 1, 1) > > x = SpatialCoordinate(mesh) > V = FunctionSpace(mesh, 'CG', 1) > f = Function(V).interpolate(10 + 10*sin(x[0])) > triplot(mesh) > > plex = mark_all_cells(mesh) > new_plex = plex.adaptLabel('adapt') > mesh = Mesh(new_plex) > triplot(mesh) > ``` > > Thank you very much for your time. > > Best wishes, > Zongze > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 27 06:19:35 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 Feb 2023 07:19:35 -0500 Subject: [petsc-users] Inquiry regarding DMAdaptLabel function In-Reply-To: References: Message-ID: On Sat, Feb 18, 2023 at 6:41?AM Zongze Yang wrote: > Another question on mesh coarsening is about `DMCoarsen` which will fail > when running in parallel. 
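To make the fallback behaviour described above easy to check outside Firedrake, here is a standalone petsc4py sketch; it assumes a petsc4py recent enough to expose DMPlex.createBoxMesh and DMPlex.adaptLabel. It marks every cell with value 2, i.e. DM_ADAPT_COARSEN, and compares cell counts before and after adaptation.

```python
from petsc4py import PETSc

plex = PETSc.DMPlex().createBoxMesh([10, 10], simplex=True)
plex.createLabel('adapt')
cStart, cEnd = plex.getHeightStratum(0)     # height 0 = cells
for c in range(cStart, cEnd):
    plex.setLabelValue('adapt', c, 2)       # 2 == DM_ADAPT_COARSEN
adapted = plex.adaptLabel('adapt')
nStart, nEnd = adapted.getHeightStratum(0)
PETSc.Sys.Print('cells before:', cEnd - cStart, ' cells after:', nEnd - nStart)
```

With no DMForest and no external coarsener such as MMG/ParMMG, the cell count comes out larger rather than smaller, which is the refinement-instead-of-coarsening behaviour reported in this thread.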
> > I generate a mesh in Firedrake, and then create function space and > functions, after that, I get the dmplex and coarsen it. > When running in serials, I get the mesh coarsened correctly. But it failed > with errors in ParMMG when running parallel. > > However, If I did not create function space and functions on the original > mesh, everything works fine too. > > The code and the error logs are attached. > I believe the problem is that Firedrake and PETSc currently have incompatible coordinate spaces. We are working to fix this, and I expect it to work by this summer. Thanks, Matt > Thank you for your time and attention? > > Best wishes, > Zongze > > > On Sat, 18 Feb 2023 at 15:24, Zongze Yang wrote: > >> Dear PETSc Group, >> >> I am writing to inquire about the function DMAdaptLabel in PETSc. >> I am trying to use it coarse a mesh, but the resulting mesh is refined. >> >> In the following code, all of the `adpat` label values were set to 2 >> (DM_ADAPT_COARSEN). >> There must be something wrong. Could you give some suggestions? >> >> ```python >> from firedrake import * >> from firedrake.petsc import PETSc >> >> def mark_all_cells(mesh): >> plex = mesh.topology_dm >> with PETSc.Log.Event("ADD_ADAPT_LABEL"): >> plex.createLabel('adapt') >> cs, ce = plex.getHeightStratum(0) >> for i in range(cs, ce): >> plex.setLabelValue('adapt', i, 2) >> >> return plex >> >> mesh = RectangleMesh(10, 10, 1, 1) >> >> x = SpatialCoordinate(mesh) >> V = FunctionSpace(mesh, 'CG', 1) >> f = Function(V).interpolate(10 + 10*sin(x[0])) >> triplot(mesh) >> >> plex = mark_all_cells(mesh) >> new_plex = plex.adaptLabel('adapt') >> mesh = Mesh(new_plex) >> triplot(mesh) >> ``` >> >> Thank you very much for your time. >> >> Best wishes, >> Zongze >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jl7037 at mun.ca Mon Feb 27 07:41:19 2023 From: jl7037 at mun.ca (Long, Jianbo) Date: Mon, 27 Feb 2023 14:41:19 +0100 Subject: [petsc-users] petsc compiled without MPI In-Reply-To: References: <89666060-9D66-4B4D-AE11-B95CCE97FEA9@joliv.et> Message-ID: Thanks for the explanations ! It turns out the issue of running sequentially compiled petsc is PetscFinalize() function. Since my subroutine involving petsc functions needs to be called multiple times in the program, I have to comment out PetscFinalize() at the end of the subroutine, otherwise at the next call of this subroutine, petsc would stop and throw out an error about MPI_Comm_set_errhandler ! Jianbo On Sun, Feb 26, 2023 at 4:39 PM Satish Balay wrote: > On Sun, 26 Feb 2023, Pierre Jolivet wrote: > > > > > > > > On 25 Feb 2023, at 11:44 PM, Long, Jianbo wrote: > > > > > > Hello, > > > > > > For some of my applications, I need to use petsc without mpi, or use > it sequentially. I wonder where I can find examples/tutorials for this ? > > > > You can run sequentially with just a single MPI process (-n 1). > > even if you build with mpich/openmpi - you can run sequentially without > mpiexec - i.e: > > ./binary > > One reason to do this [instead of building PETSc with --with-mpi=0] - is > if you are mixing in multiple pkgs that have MPI dependencies [in which > case - its best to build all these pkgs with the same mpich or openmpi - > but still run sequentially]. 
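A minimal sketch of the intended structure, shown with petsc4py only for brevity (a Fortran or C driver is analogous): PETSc is initialized once at program start, the solve routine is called as many times as needed without ever finalizing, and finalization happens once at the very end of the run.

```python
import sys
import petsc4py
petsc4py.init(sys.argv)          # analogue of PetscInitialize(): once, at startup
from petsc4py import PETSc

def solve_once(n):
    # safe to call repeatedly: it creates and destroys its own objects,
    # but never initializes or finalizes PETSc itself
    A = PETSc.Mat().createAIJ([n, n], comm=PETSc.COMM_SELF)
    A.setUp()
    for i in range(n):
        A.setValue(i, i, 2.0)
    A.assemble()
    b = A.createVecLeft()
    b.set(1.0)
    x = A.createVecRight()
    ksp = PETSc.KSP().create(comm=PETSc.COMM_SELF)
    ksp.setOperators(A)
    ksp.setFromOptions()
    ksp.solve(b, x)
    for obj in (ksp, A, b, x):
        obj.destroy()

for _ in range(5):
    solve_once(10)
# the analogue of PetscFinalize() runs once, automatically, at interpreter exit
```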
> > Satish > > > If you need to run without MPI whatsoever, you?ll need to have a > separate PETSc installation which was configured --with-mpi=0 > > In both cases, the same user-code will run, i.e., all PETSc examples > available with the sources will work (though some are designed purely for > parallel experiments and may error out early on purpose). > > > > Thanks, > > Pierre > > > > > Thanks very much, > > > Jianbo Long > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 27 07:46:03 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 Feb 2023 08:46:03 -0500 Subject: [petsc-users] petsc compiled without MPI In-Reply-To: References: <89666060-9D66-4B4D-AE11-B95CCE97FEA9@joliv.et> Message-ID: On Mon, Feb 27, 2023 at 8:41?AM Long, Jianbo wrote: > Thanks for the explanations ! It turns out the issue of running > sequentially compiled petsc is PetscFinalize() function. Since my > subroutine involving petsc functions needs to be called multiple times in > the program, I have to comment out PetscFinalize() at the end of the > subroutine, otherwise at the next call of this subroutine, petsc would stop > and throw out an error about MPI_Comm_set_errhandler ! > Yes, you are supposed to call PetscInitialize() _once_ at the beginning of the program, and PetscFinalize() _once_ at the end of the program. Thanks, Matt > Jianbo > > On Sun, Feb 26, 2023 at 4:39 PM Satish Balay wrote: > >> On Sun, 26 Feb 2023, Pierre Jolivet wrote: >> >> > >> > >> > > On 25 Feb 2023, at 11:44 PM, Long, Jianbo wrote: >> > > >> > > Hello, >> > > >> > > For some of my applications, I need to use petsc without mpi, or use >> it sequentially. I wonder where I can find examples/tutorials for this ? >> > >> > You can run sequentially with just a single MPI process (-n 1). >> >> even if you build with mpich/openmpi - you can run sequentially without >> mpiexec - i.e: >> >> ./binary >> >> One reason to do this [instead of building PETSc with --with-mpi=0] - is >> if you are mixing in multiple pkgs that have MPI dependencies [in which >> case - its best to build all these pkgs with the same mpich or openmpi - >> but still run sequentially]. >> >> Satish >> >> > If you need to run without MPI whatsoever, you?ll need to have a >> separate PETSc installation which was configured --with-mpi=0 >> > In both cases, the same user-code will run, i.e., all PETSc examples >> available with the sources will work (though some are designed purely for >> parallel experiments and may error out early on purpose). >> > >> > Thanks, >> > Pierre >> > >> > > Thanks very much, >> > > Jianbo Long >> > >> > >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Mon Feb 27 08:45:32 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Mon, 27 Feb 2023 22:45:32 +0800 Subject: [petsc-users] Inquiry regarding DMAdaptLabel function In-Reply-To: References: Message-ID: Hi, Matt Thanks for your clarification. Can I change the type of DMPlex to DMForest? Best wishes, Zongze On Mon, 27 Feb 2023 at 20:18, Matthew Knepley wrote: > On Sat, Feb 18, 2023 at 2:25?AM Zongze Yang wrote: > >> Dear PETSc Group, >> >> I am writing to inquire about the function DMAdaptLabel in PETSc. 
>> I am trying to use it coarse a mesh, but the resulting mesh is refined. >> >> In the following code, all of the `adpat` label values were set to 2 >> (DM_ADAPT_COARSEN). >> There must be something wrong. Could you give some suggestions? >> > > Sorry for the late reply. You are right, I need to put in error messages > for this. Here is what is happening. > PETSc tries to fallback if you do not have certain packages. In this case, > you are not using DMForest, > which responds to both coarsen and refine, so the > mesh generator interprets all markers as refine (they > cannot coarsen). I will add a check that fails on the coarsen marker. > > Coarsening is much more difficult in the presence of boundaries, which is > why it is not implemented in > most packages. For unstructured coarsening, I do not think there is any > choice but MMG. > > Thanks, > > Matt > > ```python >> from firedrake import * >> from firedrake.petsc import PETSc >> >> def mark_all_cells(mesh): >> plex = mesh.topology_dm >> with PETSc.Log.Event("ADD_ADAPT_LABEL"): >> plex.createLabel('adapt') >> cs, ce = plex.getHeightStratum(0) >> for i in range(cs, ce): >> plex.setLabelValue('adapt', i, 2) >> >> return plex >> >> mesh = RectangleMesh(10, 10, 1, 1) >> >> x = SpatialCoordinate(mesh) >> V = FunctionSpace(mesh, 'CG', 1) >> f = Function(V).interpolate(10 + 10*sin(x[0])) >> triplot(mesh) >> >> plex = mark_all_cells(mesh) >> new_plex = plex.adaptLabel('adapt') >> mesh = Mesh(new_plex) >> triplot(mesh) >> ``` >> >> Thank you very much for your time. >> >> Best wishes, >> Zongze >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Mon Feb 27 08:53:29 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Mon, 27 Feb 2023 22:53:29 +0800 Subject: [petsc-users] Inquiry regarding DMAdaptLabel function In-Reply-To: References: Message-ID: Hi, Matt, I tested coarsening a mesh by using ParMMg without firedrake, and found some issues: see the code and results here: https://gitlab.com/petsc/petsc/-/issues/1331 Could you have a look and give some comments or suggestions? Best wishes, Zongze On Mon, 27 Feb 2023 at 20:19, Matthew Knepley wrote: > On Sat, Feb 18, 2023 at 6:41?AM Zongze Yang wrote: > >> Another question on mesh coarsening is about `DMCoarsen` which will fail >> when running in parallel. >> >> I generate a mesh in Firedrake, and then create function space and >> functions, after that, I get the dmplex and coarsen it. >> When running in serials, I get the mesh coarsened correctly. But it >> failed with errors in ParMMG when running parallel. >> >> However, If I did not create function space and functions on the original >> mesh, everything works fine too. >> >> The code and the error logs are attached. >> > > I believe the problem is that Firedrake and PETSc currently have > incompatible coordinate spaces. We are working > to fix this, and I expect it to work by this summer. > > Thanks, > > Matt > > >> Thank you for your time and attention? >> >> Best wishes, >> Zongze >> >> >> On Sat, 18 Feb 2023 at 15:24, Zongze Yang wrote: >> >>> Dear PETSc Group, >>> >>> I am writing to inquire about the function DMAdaptLabel in PETSc. >>> I am trying to use it coarse a mesh, but the resulting mesh is refined. 
>>> >>> In the following code, all of the `adpat` label values were set to 2 >>> (DM_ADAPT_COARSEN). >>> There must be something wrong. Could you give some suggestions? >>> >>> ```python >>> from firedrake import * >>> from firedrake.petsc import PETSc >>> >>> def mark_all_cells(mesh): >>> plex = mesh.topology_dm >>> with PETSc.Log.Event("ADD_ADAPT_LABEL"): >>> plex.createLabel('adapt') >>> cs, ce = plex.getHeightStratum(0) >>> for i in range(cs, ce): >>> plex.setLabelValue('adapt', i, 2) >>> >>> return plex >>> >>> mesh = RectangleMesh(10, 10, 1, 1) >>> >>> x = SpatialCoordinate(mesh) >>> V = FunctionSpace(mesh, 'CG', 1) >>> f = Function(V).interpolate(10 + 10*sin(x[0])) >>> triplot(mesh) >>> >>> plex = mark_all_cells(mesh) >>> new_plex = plex.adaptLabel('adapt') >>> mesh = Mesh(new_plex) >>> triplot(mesh) >>> ``` >>> >>> Thank you very much for your time. >>> >>> Best wishes, >>> Zongze >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 27 08:53:42 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 Feb 2023 09:53:42 -0500 Subject: [petsc-users] Inquiry regarding DMAdaptLabel function In-Reply-To: References: Message-ID: On Mon, Feb 27, 2023 at 9:45?AM Zongze Yang wrote: > Hi, Matt > > Thanks for your clarification. Can I change the type of DMPlex to DMForest? > You can, however DMForest is for structured adaptive meshes using quadtrees, and I do not believe Firedrake works with that. Thanks, Matt > Best wishes, > Zongze > > > On Mon, 27 Feb 2023 at 20:18, Matthew Knepley wrote: > >> On Sat, Feb 18, 2023 at 2:25?AM Zongze Yang wrote: >> >>> Dear PETSc Group, >>> >>> I am writing to inquire about the function DMAdaptLabel in PETSc. >>> I am trying to use it coarse a mesh, but the resulting mesh is refined. >>> >>> In the following code, all of the `adpat` label values were set to 2 >>> (DM_ADAPT_COARSEN). >>> There must be something wrong. Could you give some suggestions? >>> >> >> Sorry for the late reply. You are right, I need to put in error messages >> for this. Here is what is happening. >> PETSc tries to fallback if you do not have certain packages. In this >> case, you are not using DMForest, >> which responds to both coarsen and refine, so the >> mesh generator interprets all markers as refine (they >> cannot coarsen). I will add a check that fails on the coarsen marker. >> >> Coarsening is much more difficult in the presence of boundaries, which is >> why it is not implemented in >> most packages. For unstructured coarsening, I do not think there is any >> choice but MMG. 
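As a side note to the DMForest question answered above: switching an existing DMPlex to a p4est/p8est-based DMForest is usually done with DMConvert. A rough sketch under the assumption that PETSc was configured with p4est; this is not the path the Firedrake code in this thread uses.

```c
#include <petscdmforest.h>

/* Sketch: convert a 3D DMPlex into a p8est-based DMForest (DMP4EST in 2D).
   Requires a PETSc build with p4est; `dm` is an existing DMPlex. */
static PetscErrorCode PlexToForest(DM dm, DM *forest)
{
  PetscFunctionBeginUser;
  PetscCall(DMConvert(dm, DMP8EST, forest));
  PetscCall(DMSetFromOptions(*forest)); /* pick up any -dm_forest_* options */
  PetscCall(DMSetUp(*forest));
  PetscFunctionReturn(0);
}
```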
>> >> Thanks, >> >> Matt >> >> ```python >>> from firedrake import * >>> from firedrake.petsc import PETSc >>> >>> def mark_all_cells(mesh): >>> plex = mesh.topology_dm >>> with PETSc.Log.Event("ADD_ADAPT_LABEL"): >>> plex.createLabel('adapt') >>> cs, ce = plex.getHeightStratum(0) >>> for i in range(cs, ce): >>> plex.setLabelValue('adapt', i, 2) >>> >>> return plex >>> >>> mesh = RectangleMesh(10, 10, 1, 1) >>> >>> x = SpatialCoordinate(mesh) >>> V = FunctionSpace(mesh, 'CG', 1) >>> f = Function(V).interpolate(10 + 10*sin(x[0])) >>> triplot(mesh) >>> >>> plex = mark_all_cells(mesh) >>> new_plex = plex.adaptLabel('adapt') >>> mesh = Mesh(new_plex) >>> triplot(mesh) >>> ``` >>> >>> Thank you very much for your time. >>> >>> Best wishes, >>> Zongze >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 27 08:59:23 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 Feb 2023 09:59:23 -0500 Subject: [petsc-users] Inquiry regarding DMAdaptLabel function In-Reply-To: References: Message-ID: On Mon, Feb 27, 2023 at 9:53?AM Zongze Yang wrote: > Hi, Matt, > > I tested coarsening a mesh by using ParMMg without firedrake, and found > some issues: > see the code and results here: > https://gitlab.com/petsc/petsc/-/issues/1331 > > Could you have a look and give some comments or suggestions? > I replied on the issue. More generally, the adaptive refinement software has not seen wide use, and I expect more of these kinds of bugs until more people use it. Thanks, Matt > Best wishes, > Zongze > > > On Mon, 27 Feb 2023 at 20:19, Matthew Knepley wrote: > >> On Sat, Feb 18, 2023 at 6:41?AM Zongze Yang wrote: >> >>> Another question on mesh coarsening is about `DMCoarsen` which will fail >>> when running in parallel. >>> >>> I generate a mesh in Firedrake, and then create function space and >>> functions, after that, I get the dmplex and coarsen it. >>> When running in serials, I get the mesh coarsened correctly. But it >>> failed with errors in ParMMG when running parallel. >>> >>> However, If I did not create function space and functions on the >>> original mesh, everything works fine too. >>> >>> The code and the error logs are attached. >>> >> >> I believe the problem is that Firedrake and PETSc currently have >> incompatible coordinate spaces. We are working >> to fix this, and I expect it to work by this summer. >> >> Thanks, >> >> Matt >> >> >>> Thank you for your time and attention? >>> >>> Best wishes, >>> Zongze >>> >>> >>> On Sat, 18 Feb 2023 at 15:24, Zongze Yang wrote: >>> >>>> Dear PETSc Group, >>>> >>>> I am writing to inquire about the function DMAdaptLabel in PETSc. >>>> I am trying to use it coarse a mesh, but the resulting mesh is refined. >>>> >>>> In the following code, all of the `adpat` label values were set to 2 >>>> (DM_ADAPT_COARSEN). >>>> There must be something wrong. Could you give some suggestions? 
>>>> >>>> ```python >>>> from firedrake import * >>>> from firedrake.petsc import PETSc >>>> >>>> def mark_all_cells(mesh): >>>> plex = mesh.topology_dm >>>> with PETSc.Log.Event("ADD_ADAPT_LABEL"): >>>> plex.createLabel('adapt') >>>> cs, ce = plex.getHeightStratum(0) >>>> for i in range(cs, ce): >>>> plex.setLabelValue('adapt', i, 2) >>>> >>>> return plex >>>> >>>> mesh = RectangleMesh(10, 10, 1, 1) >>>> >>>> x = SpatialCoordinate(mesh) >>>> V = FunctionSpace(mesh, 'CG', 1) >>>> f = Function(V).interpolate(10 + 10*sin(x[0])) >>>> triplot(mesh) >>>> >>>> plex = mark_all_cells(mesh) >>>> new_plex = plex.adaptLabel('adapt') >>>> mesh = Mesh(new_plex) >>>> triplot(mesh) >>>> ``` >>>> >>>> Thank you very much for your time. >>>> >>>> Best wishes, >>>> Zongze >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Mon Feb 27 09:13:37 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Mon, 27 Feb 2023 16:13:37 +0100 Subject: [petsc-users] Inquiry regarding DMAdaptLabel function In-Reply-To: References: Message-ID: <2F3DF02B-C772-4F72-ACAB-6F7D1F7655FD@joliv.et> > On 27 Feb 2023, at 3:59 PM, Matthew Knepley wrote: > > On Mon, Feb 27, 2023 at 9:53?AM Zongze Yang > wrote: >> Hi, Matt, >> >> I tested coarsening a mesh by using ParMMg without firedrake, and found some issues: >> see the code and results here: https://gitlab.com/petsc/petsc/-/issues/1331 >> >> Could you have a look and give some comments or suggestions? > > I replied on the issue. More generally, the adaptive refinement software has not seen wide use :) Matt probably meant ?the _DMPlex interface_ to adaptive refinement software has not seen wide use?, Mmg has been rather widely used for 10+ years (here is a 13-year old presentation https://www.ljll.math.upmc.fr/hecht/ftp/ff++days/2010/exposes/Morice-MeshMetric.pdf). Thanks, Pierre > , and I expect > more of these kinds of bugs until more people use it. > > Thanks, > > Matt > >> Best wishes, >> Zongze >> >> >> On Mon, 27 Feb 2023 at 20:19, Matthew Knepley > wrote: >>> On Sat, Feb 18, 2023 at 6:41?AM Zongze Yang > wrote: >>>> Another question on mesh coarsening is about `DMCoarsen` which will fail when running in parallel. >>>> >>>> I generate a mesh in Firedrake, and then create function space and functions, after that, I get the dmplex and coarsen it. >>>> When running in serials, I get the mesh coarsened correctly. But it failed with errors in ParMMG when running parallel. >>>> >>>> However, If I did not create function space and functions on the original mesh, everything works fine too. >>>> >>>> The code and the error logs are attached. >>> >>> I believe the problem is that Firedrake and PETSc currently have incompatible coordinate spaces. We are working >>> to fix this, and I expect it to work by this summer. >>> >>> Thanks, >>> >>> Matt >>> >>>> Thank you for your time and attention? 
>>>> >>>> Best wishes, >>>> Zongze >>>> >>>> >>>> On Sat, 18 Feb 2023 at 15:24, Zongze Yang > wrote: >>>>> Dear PETSc Group, >>>>> >>>>> I am writing to inquire about the function DMAdaptLabel in PETSc. >>>>> I am trying to use it coarse a mesh, but the resulting mesh is refined. >>>>> >>>>> In the following code, all of the `adpat` label values were set to 2 (DM_ADAPT_COARSEN). >>>>> There must be something wrong. Could you give some suggestions? >>>>> >>>>> ```python >>>>> from firedrake import * >>>>> from firedrake.petsc import PETSc >>>>> >>>>> def mark_all_cells(mesh): >>>>> plex = mesh.topology_dm >>>>> with PETSc.Log.Event("ADD_ADAPT_LABEL"): >>>>> plex.createLabel('adapt') >>>>> cs, ce = plex.getHeightStratum(0) >>>>> for i in range(cs, ce): >>>>> plex.setLabelValue('adapt', i, 2) >>>>> >>>>> return plex >>>>> >>>>> mesh = RectangleMesh(10, 10, 1, 1) >>>>> >>>>> x = SpatialCoordinate(mesh) >>>>> V = FunctionSpace(mesh, 'CG', 1) >>>>> f = Function(V).interpolate(10 + 10*sin(x[0])) >>>>> triplot(mesh) >>>>> >>>>> plex = mark_all_cells(mesh) >>>>> new_plex = plex.adaptLabel('adapt') >>>>> mesh = Mesh(new_plex) >>>>> triplot(mesh) >>>>> ``` >>>>> >>>>> Thank you very much for your time. >>>>> >>>>> Best wishes, >>>>> Zongze >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 27 09:16:46 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 Feb 2023 10:16:46 -0500 Subject: [petsc-users] Inquiry regarding DMAdaptLabel function In-Reply-To: <2F3DF02B-C772-4F72-ACAB-6F7D1F7655FD@joliv.et> References: <2F3DF02B-C772-4F72-ACAB-6F7D1F7655FD@joliv.et> Message-ID: On Mon, Feb 27, 2023 at 10:13?AM Pierre Jolivet wrote: > On 27 Feb 2023, at 3:59 PM, Matthew Knepley wrote: > > On Mon, Feb 27, 2023 at 9:53?AM Zongze Yang wrote: > >> Hi, Matt, >> >> I tested coarsening a mesh by using ParMMg without firedrake, and found >> some issues: >> see the code and results here: >> https://gitlab.com/petsc/petsc/-/issues/1331 >> >> Could you have a look and give some comments or suggestions? >> > > I replied on the issue. More generally, the adaptive refinement software > has not seen wide use > > > :) > Matt probably meant ?the _DMPlex interface_ to adaptive refinement > software has not seen wide use?, Mmg has been rather widely used for 10+ > years (here is a 13-year old presentation > https://www.ljll.math.upmc.fr/hecht/ftp/ff++days/2010/exposes/Morice-MeshMetric.pdf > ). > The interface is certainly new, but even ParMMG is only from Nov 2016, which is very new if you are an old person :) Thanks, Matt > Thanks, > Pierre > > , and I expect > more of these kinds of bugs until more people use it. > > Thanks, > > Matt > > >> Best wishes, >> Zongze >> >> >> On Mon, 27 Feb 2023 at 20:19, Matthew Knepley wrote: >> >>> On Sat, Feb 18, 2023 at 6:41?AM Zongze Yang >>> wrote: >>> >>>> Another question on mesh coarsening is about `DMCoarsen` which will >>>> fail when running in parallel. 
>>>> >>>> I generate a mesh in Firedrake, and then create function space and >>>> functions, after that, I get the dmplex and coarsen it. >>>> When running in serials, I get the mesh coarsened correctly. But it >>>> failed with errors in ParMMG when running parallel. >>>> >>>> However, If I did not create function space and functions on the >>>> original mesh, everything works fine too. >>>> >>>> The code and the error logs are attached. >>>> >>> >>> I believe the problem is that Firedrake and PETSc currently have >>> incompatible coordinate spaces. We are working >>> to fix this, and I expect it to work by this summer. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thank you for your time and attention? >>>> >>>> Best wishes, >>>> Zongze >>>> >>>> >>>> On Sat, 18 Feb 2023 at 15:24, Zongze Yang wrote: >>>> >>>>> Dear PETSc Group, >>>>> >>>>> I am writing to inquire about the function DMAdaptLabel in PETSc. >>>>> I am trying to use it coarse a mesh, but the resulting mesh is refined. >>>>> >>>>> In the following code, all of the `adpat` label values were set to 2 >>>>> (DM_ADAPT_COARSEN). >>>>> There must be something wrong. Could you give some suggestions? >>>>> >>>>> ```python >>>>> from firedrake import * >>>>> from firedrake.petsc import PETSc >>>>> >>>>> def mark_all_cells(mesh): >>>>> plex = mesh.topology_dm >>>>> with PETSc.Log.Event("ADD_ADAPT_LABEL"): >>>>> plex.createLabel('adapt') >>>>> cs, ce = plex.getHeightStratum(0) >>>>> for i in range(cs, ce): >>>>> plex.setLabelValue('adapt', i, 2) >>>>> >>>>> return plex >>>>> >>>>> mesh = RectangleMesh(10, 10, 1, 1) >>>>> >>>>> x = SpatialCoordinate(mesh) >>>>> V = FunctionSpace(mesh, 'CG', 1) >>>>> f = Function(V).interpolate(10 + 10*sin(x[0])) >>>>> triplot(mesh) >>>>> >>>>> plex = mark_all_cells(mesh) >>>>> new_plex = plex.adaptLabel('adapt') >>>>> mesh = Mesh(new_plex) >>>>> triplot(mesh) >>>>> ``` >>>>> >>>>> Thank you very much for your time. >>>>> >>>>> Best wishes, >>>>> Zongze >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Mon Feb 27 09:26:19 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Mon, 27 Feb 2023 16:26:19 +0100 Subject: [petsc-users] Inquiry regarding DMAdaptLabel function In-Reply-To: References: <2F3DF02B-C772-4F72-ACAB-6F7D1F7655FD@joliv.et> Message-ID: > On 27 Feb 2023, at 4:16 PM, Matthew Knepley wrote: > > On Mon, Feb 27, 2023 at 10:13?AM Pierre Jolivet > wrote: >>> On 27 Feb 2023, at 3:59 PM, Matthew Knepley > wrote: >>> >>> On Mon, Feb 27, 2023 at 9:53?AM Zongze Yang > wrote: >>>> Hi, Matt, >>>> >>>> I tested coarsening a mesh by using ParMMg without firedrake, and found some issues: >>>> see the code and results here: https://gitlab.com/petsc/petsc/-/issues/1331 >>>> >>>> Could you have a look and give some comments or suggestions? 
>>> >>> I replied on the issue. More generally, the adaptive refinement software has not seen wide use >> >> :) >> Matt probably meant ?the _DMPlex interface_ to adaptive refinement software has not seen wide use?, Mmg has been rather widely used for 10+ years (here is a 13-year old presentation https://www.ljll.math.upmc.fr/hecht/ftp/ff++days/2010/exposes/Morice-MeshMetric.pdf). > > The interface is certainly new, but even ParMMG is only from Nov 2016, which is very new if you are an old person :) Indeed. In fact, I do believe we should add a DMPlex mechanism to centralize (redistribute on a single process) a DMPlex and to call Mmg instead of ParMmg. It would certainly not be scalable for large meshes but: 1) there is no need for ParMmg on small-/medium-scale meshes 2) Mmg is more robust than ParMmg at this point in time 3) Mmg has more feature than ParMmg at this point in time, e.g., implicit remeshing using a level-set 4) there is more industry money funnelled into Mmg than into ParMmg I think the mechanism I mentioned initially was in the TODO list of the Firedrake people (or yours?), maybe it?s already done, but in any case it?s not hooked in the Mmg adaptor code, though it should (erroring out in the case where the communicator is of size greater than one would then not happen anymore). Thanks, Pierre > Thanks, > > Matt > >> Thanks, >> Pierre >> >>> , and I expect >>> more of these kinds of bugs until more people use it. >>> >>> Thanks, >>> >>> Matt >>> >>>> Best wishes, >>>> Zongze >>>> >>>> >>>> On Mon, 27 Feb 2023 at 20:19, Matthew Knepley > wrote: >>>>> On Sat, Feb 18, 2023 at 6:41?AM Zongze Yang > wrote: >>>>>> Another question on mesh coarsening is about `DMCoarsen` which will fail when running in parallel. >>>>>> >>>>>> I generate a mesh in Firedrake, and then create function space and functions, after that, I get the dmplex and coarsen it. >>>>>> When running in serials, I get the mesh coarsened correctly. But it failed with errors in ParMMG when running parallel. >>>>>> >>>>>> However, If I did not create function space and functions on the original mesh, everything works fine too. >>>>>> >>>>>> The code and the error logs are attached. >>>>> >>>>> I believe the problem is that Firedrake and PETSc currently have incompatible coordinate spaces. We are working >>>>> to fix this, and I expect it to work by this summer. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> Thank you for your time and attention? >>>>>> >>>>>> Best wishes, >>>>>> Zongze >>>>>> >>>>>> >>>>>> On Sat, 18 Feb 2023 at 15:24, Zongze Yang > wrote: >>>>>>> Dear PETSc Group, >>>>>>> >>>>>>> I am writing to inquire about the function DMAdaptLabel in PETSc. >>>>>>> I am trying to use it coarse a mesh, but the resulting mesh is refined. >>>>>>> >>>>>>> In the following code, all of the `adpat` label values were set to 2 (DM_ADAPT_COARSEN). >>>>>>> There must be something wrong. Could you give some suggestions? 
>>>>>>> >>>>>>> ```python >>>>>>> from firedrake import * >>>>>>> from firedrake.petsc import PETSc >>>>>>> >>>>>>> def mark_all_cells(mesh): >>>>>>> plex = mesh.topology_dm >>>>>>> with PETSc.Log.Event("ADD_ADAPT_LABEL"): >>>>>>> plex.createLabel('adapt') >>>>>>> cs, ce = plex.getHeightStratum(0) >>>>>>> for i in range(cs, ce): >>>>>>> plex.setLabelValue('adapt', i, 2) >>>>>>> >>>>>>> return plex >>>>>>> >>>>>>> mesh = RectangleMesh(10, 10, 1, 1) >>>>>>> >>>>>>> x = SpatialCoordinate(mesh) >>>>>>> V = FunctionSpace(mesh, 'CG', 1) >>>>>>> f = Function(V).interpolate(10 + 10*sin(x[0])) >>>>>>> triplot(mesh) >>>>>>> >>>>>>> plex = mark_all_cells(mesh) >>>>>>> new_plex = plex.adaptLabel('adapt') >>>>>>> mesh = Mesh(new_plex) >>>>>>> triplot(mesh) >>>>>>> ``` >>>>>>> >>>>>>> Thank you very much for your time. >>>>>>> >>>>>>> Best wishes, >>>>>>> Zongze >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangzongze at gmail.com Mon Feb 27 09:37:42 2023 From: yangzongze at gmail.com (Zongze Yang) Date: Mon, 27 Feb 2023 23:37:42 +0800 Subject: [petsc-users] Inquiry regarding DMAdaptLabel function In-Reply-To: References: Message-ID: Yes, It seems that firedrake only works with DMPlex. Thanks. Best wishes, Zongze On Mon, 27 Feb 2023 at 22:53, Matthew Knepley wrote: > On Mon, Feb 27, 2023 at 9:45?AM Zongze Yang wrote: > >> Hi, Matt >> >> Thanks for your clarification. Can I change the type of DMPlex to >> DMForest? >> > > You can, however DMForest is for structured adaptive meshes using > quadtrees, and I do not believe > Firedrake works with that. > > Thanks, > > Matt > > >> Best wishes, >> Zongze >> >> >> On Mon, 27 Feb 2023 at 20:18, Matthew Knepley wrote: >> >>> On Sat, Feb 18, 2023 at 2:25?AM Zongze Yang >>> wrote: >>> >>>> Dear PETSc Group, >>>> >>>> I am writing to inquire about the function DMAdaptLabel in PETSc. >>>> I am trying to use it coarse a mesh, but the resulting mesh is refined. >>>> >>>> In the following code, all of the `adpat` label values were set to 2 >>>> (DM_ADAPT_COARSEN). >>>> There must be something wrong. Could you give some suggestions? >>>> >>> >>> Sorry for the late reply. You are right, I need to put in error messages >>> for this. Here is what is happening. >>> PETSc tries to fallback if you do not have certain packages. In this >>> case, you are not using DMForest, >>> which responds to both coarsen and refine, so the >>> mesh generator interprets all markers as refine (they >>> cannot coarsen). I will add a check that fails on the coarsen marker. >>> >>> Coarsening is much more difficult in the presence of boundaries, which >>> is why it is not implemented in >>> most packages. For unstructured coarsening, I do not think there is any >>> choice but MMG. 
>>> >>> Thanks, >>> >>> Matt >>> >>> ```python >>>> from firedrake import * >>>> from firedrake.petsc import PETSc >>>> >>>> def mark_all_cells(mesh): >>>> plex = mesh.topology_dm >>>> with PETSc.Log.Event("ADD_ADAPT_LABEL"): >>>> plex.createLabel('adapt') >>>> cs, ce = plex.getHeightStratum(0) >>>> for i in range(cs, ce): >>>> plex.setLabelValue('adapt', i, 2) >>>> >>>> return plex >>>> >>>> mesh = RectangleMesh(10, 10, 1, 1) >>>> >>>> x = SpatialCoordinate(mesh) >>>> V = FunctionSpace(mesh, 'CG', 1) >>>> f = Function(V).interpolate(10 + 10*sin(x[0])) >>>> triplot(mesh) >>>> >>>> plex = mark_all_cells(mesh) >>>> new_plex = plex.adaptLabel('adapt') >>>> mesh = Mesh(new_plex) >>>> triplot(mesh) >>>> ``` >>>> >>>> Thank you very much for your time. >>>> >>>> Best wishes, >>>> Zongze >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 27 09:42:31 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 Feb 2023 10:42:31 -0500 Subject: [petsc-users] Inquiry regarding DMAdaptLabel function In-Reply-To: References: <2F3DF02B-C772-4F72-ACAB-6F7D1F7655FD@joliv.et> Message-ID: On Mon, Feb 27, 2023 at 10:26?AM Pierre Jolivet wrote: > On 27 Feb 2023, at 4:16 PM, Matthew Knepley wrote: > > On Mon, Feb 27, 2023 at 10:13?AM Pierre Jolivet wrote: > >> On 27 Feb 2023, at 3:59 PM, Matthew Knepley wrote: >> >> On Mon, Feb 27, 2023 at 9:53?AM Zongze Yang wrote: >> >>> Hi, Matt, >>> >>> I tested coarsening a mesh by using ParMMg without firedrake, and found >>> some issues: >>> see the code and results here: >>> https://gitlab.com/petsc/petsc/-/issues/1331 >>> >>> Could you have a look and give some comments or suggestions? >>> >> >> I replied on the issue. More generally, the adaptive refinement software >> has not seen wide use >> >> >> :) >> Matt probably meant ?the _DMPlex interface_ to adaptive refinement >> software has not seen wide use?, Mmg has been rather widely used for 10+ >> years (here is a 13-year old presentation >> https://www.ljll.math.upmc.fr/hecht/ftp/ff++days/2010/exposes/Morice-MeshMetric.pdf >> ). >> > > The interface is certainly new, but even ParMMG is only from Nov 2016, > which is very new if you are an old person :) > > > Indeed. In fact, I do believe we should add a DMPlex mechanism to > centralize (redistribute on a single process) a DMPlex and to call Mmg > instead of ParMmg. 
> It would certainly not be scalable for large meshes but: > 1) there is no need for ParMmg on small-/medium-scale meshes > 2) Mmg is more robust than ParMmg at this point in time > 3) Mmg has more feature than ParMmg at this point in time, e.g., implicit > remeshing using a level-set > 4) there is more industry money funnelled into Mmg than into ParMmg > I think the mechanism I mentioned initially was in the TODO list of the > Firedrake people (or yours?), maybe it?s already done, but in any case it?s > not hooked in the Mmg adaptor code, though it should (erroring out in the > case where the communicator is of size greater than one would then not > happen anymore). > Yes, we used to do the same thing with partitioners. We can use DMPlexGather(). I thought MMG only did 2D and ParMMG only did 3D, but this must be wrong now. Can MMG do both? Thanks, Matt > Thanks, > Pierre > > Thanks, > > Matt > > >> Thanks, >> Pierre >> >> , and I expect >> more of these kinds of bugs until more people use it. >> >> Thanks, >> >> Matt >> >> >>> Best wishes, >>> Zongze >>> >>> >>> On Mon, 27 Feb 2023 at 20:19, Matthew Knepley wrote: >>> >>>> On Sat, Feb 18, 2023 at 6:41?AM Zongze Yang >>>> wrote: >>>> >>>>> Another question on mesh coarsening is about `DMCoarsen` which will >>>>> fail when running in parallel. >>>>> >>>>> I generate a mesh in Firedrake, and then create function space and >>>>> functions, after that, I get the dmplex and coarsen it. >>>>> When running in serials, I get the mesh coarsened correctly. But it >>>>> failed with errors in ParMMG when running parallel. >>>>> >>>>> However, If I did not create function space and functions on the >>>>> original mesh, everything works fine too. >>>>> >>>>> The code and the error logs are attached. >>>>> >>>> >>>> I believe the problem is that Firedrake and PETSc currently have >>>> incompatible coordinate spaces. We are working >>>> to fix this, and I expect it to work by this summer. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thank you for your time and attention? >>>>> >>>>> Best wishes, >>>>> Zongze >>>>> >>>>> >>>>> On Sat, 18 Feb 2023 at 15:24, Zongze Yang >>>>> wrote: >>>>> >>>>>> Dear PETSc Group, >>>>>> >>>>>> I am writing to inquire about the function DMAdaptLabel in PETSc. >>>>>> I am trying to use it coarse a mesh, but the resulting mesh is >>>>>> refined. >>>>>> >>>>>> In the following code, all of the `adpat` label values were set to 2 >>>>>> (DM_ADAPT_COARSEN). >>>>>> There must be something wrong. Could you give some suggestions? >>>>>> >>>>>> ```python >>>>>> from firedrake import * >>>>>> from firedrake.petsc import PETSc >>>>>> >>>>>> def mark_all_cells(mesh): >>>>>> plex = mesh.topology_dm >>>>>> with PETSc.Log.Event("ADD_ADAPT_LABEL"): >>>>>> plex.createLabel('adapt') >>>>>> cs, ce = plex.getHeightStratum(0) >>>>>> for i in range(cs, ce): >>>>>> plex.setLabelValue('adapt', i, 2) >>>>>> >>>>>> return plex >>>>>> >>>>>> mesh = RectangleMesh(10, 10, 1, 1) >>>>>> >>>>>> x = SpatialCoordinate(mesh) >>>>>> V = FunctionSpace(mesh, 'CG', 1) >>>>>> f = Function(V).interpolate(10 + 10*sin(x[0])) >>>>>> triplot(mesh) >>>>>> >>>>>> plex = mark_all_cells(mesh) >>>>>> new_plex = plex.adaptLabel('adapt') >>>>>> mesh = Mesh(new_plex) >>>>>> triplot(mesh) >>>>>> ``` >>>>>> >>>>>> Thank you very much for your time. 
>>>>>> >>>>>> Best wishes, >>>>>> Zongze >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Mon Feb 27 09:44:49 2023 From: pierre at joliv.et (Pierre Jolivet) Date: Mon, 27 Feb 2023 16:44:49 +0100 Subject: [petsc-users] Inquiry regarding DMAdaptLabel function In-Reply-To: References: <2F3DF02B-C772-4F72-ACAB-6F7D1F7655FD@joliv.et> Message-ID: > On 27 Feb 2023, at 4:42 PM, Matthew Knepley wrote: > > On Mon, Feb 27, 2023 at 10:26?AM Pierre Jolivet > wrote: >>> On 27 Feb 2023, at 4:16 PM, Matthew Knepley > wrote: >>> >>> On Mon, Feb 27, 2023 at 10:13?AM Pierre Jolivet > wrote: >>>>> On 27 Feb 2023, at 3:59 PM, Matthew Knepley > wrote: >>>>> >>>>> On Mon, Feb 27, 2023 at 9:53?AM Zongze Yang > wrote: >>>>>> Hi, Matt, >>>>>> >>>>>> I tested coarsening a mesh by using ParMMg without firedrake, and found some issues: >>>>>> see the code and results here: https://gitlab.com/petsc/petsc/-/issues/1331 >>>>>> >>>>>> Could you have a look and give some comments or suggestions? >>>>> >>>>> I replied on the issue. More generally, the adaptive refinement software has not seen wide use >>>> >>>> :) >>>> Matt probably meant ?the _DMPlex interface_ to adaptive refinement software has not seen wide use?, Mmg has been rather widely used for 10+ years (here is a 13-year old presentation https://www.ljll.math.upmc.fr/hecht/ftp/ff++days/2010/exposes/Morice-MeshMetric.pdf). >>> >>> The interface is certainly new, but even ParMMG is only from Nov 2016, which is very new if you are an old person :) >> >> Indeed. In fact, I do believe we should add a DMPlex mechanism to centralize (redistribute on a single process) a DMPlex and to call Mmg instead of ParMmg. >> It would certainly not be scalable for large meshes but: >> 1) there is no need for ParMmg on small-/medium-scale meshes >> 2) Mmg is more robust than ParMmg at this point in time >> 3) Mmg has more feature than ParMmg at this point in time, e.g., implicit remeshing using a level-set >> 4) there is more industry money funnelled into Mmg than into ParMmg >> I think the mechanism I mentioned initially was in the TODO list of the Firedrake people (or yours?), maybe it?s already done, but in any case it?s not hooked in the Mmg adaptor code, though it should (erroring out in the case where the communicator is of size greater than one would then not happen anymore). > > Yes, we used to do the same thing with partitioners. We can use DMPlexGather(). > > I thought MMG only did 2D and ParMMG only did 3D, but this must be wrong now. 
Can MMG do both? Mmg does 2D, 3D, and 3D surfaces. ParMmg only does 3D (with no short-term plan for 2D or 3D surfaces). Thanks, Pierre > Thanks, > > Matt > >> Thanks, >> Pierre >> >>> Thanks, >>> >>> Matt >>> >>>> Thanks, >>>> Pierre >>>> >>>>> , and I expect >>>>> more of these kinds of bugs until more people use it. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>>> Best wishes, >>>>>> Zongze >>>>>> >>>>>> >>>>>> On Mon, 27 Feb 2023 at 20:19, Matthew Knepley > wrote: >>>>>>> On Sat, Feb 18, 2023 at 6:41?AM Zongze Yang > wrote: >>>>>>>> Another question on mesh coarsening is about `DMCoarsen` which will fail when running in parallel. >>>>>>>> >>>>>>>> I generate a mesh in Firedrake, and then create function space and functions, after that, I get the dmplex and coarsen it. >>>>>>>> When running in serials, I get the mesh coarsened correctly. But it failed with errors in ParMMG when running parallel. >>>>>>>> >>>>>>>> However, If I did not create function space and functions on the original mesh, everything works fine too. >>>>>>>> >>>>>>>> The code and the error logs are attached. >>>>>>> >>>>>>> I believe the problem is that Firedrake and PETSc currently have incompatible coordinate spaces. We are working >>>>>>> to fix this, and I expect it to work by this summer. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>>> Thank you for your time and attention? >>>>>>>> >>>>>>>> Best wishes, >>>>>>>> Zongze >>>>>>>> >>>>>>>> >>>>>>>> On Sat, 18 Feb 2023 at 15:24, Zongze Yang > wrote: >>>>>>>>> Dear PETSc Group, >>>>>>>>> >>>>>>>>> I am writing to inquire about the function DMAdaptLabel in PETSc. >>>>>>>>> I am trying to use it coarse a mesh, but the resulting mesh is refined. >>>>>>>>> >>>>>>>>> In the following code, all of the `adpat` label values were set to 2 (DM_ADAPT_COARSEN). >>>>>>>>> There must be something wrong. Could you give some suggestions? >>>>>>>>> >>>>>>>>> ```python >>>>>>>>> from firedrake import * >>>>>>>>> from firedrake.petsc import PETSc >>>>>>>>> >>>>>>>>> def mark_all_cells(mesh): >>>>>>>>> plex = mesh.topology_dm >>>>>>>>> with PETSc.Log.Event("ADD_ADAPT_LABEL"): >>>>>>>>> plex.createLabel('adapt') >>>>>>>>> cs, ce = plex.getHeightStratum(0) >>>>>>>>> for i in range(cs, ce): >>>>>>>>> plex.setLabelValue('adapt', i, 2) >>>>>>>>> >>>>>>>>> return plex >>>>>>>>> >>>>>>>>> mesh = RectangleMesh(10, 10, 1, 1) >>>>>>>>> >>>>>>>>> x = SpatialCoordinate(mesh) >>>>>>>>> V = FunctionSpace(mesh, 'CG', 1) >>>>>>>>> f = Function(V).interpolate(10 + 10*sin(x[0])) >>>>>>>>> triplot(mesh) >>>>>>>>> >>>>>>>>> plex = mark_all_cells(mesh) >>>>>>>>> new_plex = plex.adaptLabel('adapt') >>>>>>>>> mesh = Mesh(new_plex) >>>>>>>>> triplot(mesh) >>>>>>>>> ``` >>>>>>>>> >>>>>>>>> Thank you very much for your time. >>>>>>>>> >>>>>>>>> Best wishes, >>>>>>>>> Zongze >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.grosse-bley at ziti.uni-heidelberg.de Mon Feb 27 11:08:49 2023 From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley) Date: Mon, 27 Feb 2023 18:08:49 +0100 Subject: [petsc-users] =?utf-8?q?How_to_use_DM=5FBOUNDARY=5FGHOSTED_for_D?= =?utf-8?q?irichlet_boundary_conditions?= Message-ID: <2cd08e-63fce380-17-7881bd00@74555364> Hi, I would like to modify src/ksp/ksp/tutorials/ex45.c to implement Dirichlet boundary conditions using DM_BOUNDARY_GHOSTED instead of using DM_BOUNDARY_NONE and explicitly implementing the boundary by adding diagnonal-only rows. My assumption was that with?DM_BOUNDARY_GHOSTED all vectors from that DM have the extra memory for the ghost entries and that I can basically use DMDAGetGhostCorners instead of DMDAGetCorners to access the array gotten via DMDAVecGetArray. But when I access (gxs, gys, gzs) = (-1,-1,-1) I get a segmentation fault. When looking at the implementation of DMDAVecGetArray it looked to me as if accessing (-1, -1, -1) should work as DMDAVecGetArray passes the ghost corners to VecGetArray3d which then adds the right offsets. I could not find any example using DM_BOUNDARY_GHOSTED and then actually accessing the ghost/boundary elements. Can I assume that they are set to zero for the solution vector, i.e. the u=0 on \del\Omega and I do not need to access them at all? Best, Paul Gro?e-Bley -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Feb 27 11:17:55 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 27 Feb 2023 12:17:55 -0500 Subject: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions In-Reply-To: <2cd08e-63fce380-17-7881bd00@74555364> References: <2cd08e-63fce380-17-7881bd00@74555364> Message-ID: <470FC448-01E4-4EF2-A88A-0C8EA78DF4EE@petsc.dev> Paul, DM_BOUNDARY_GHOSTED would result in the extra ghost locations in the local vectors (obtained with DMCreateLocalVector() but they will not appear in the global vectors obtained with DMCreateGlobalVector(); perhaps this is the issue? Since they do not appear in the global vector they will not appear in the linear system so there will be no diagonal entries for you to set since those rows/columns do not exist in the linear system. In other words, using DM_BOUNDARY_GHOSTED is a way to avoid needing to put the Dirichlet values explicitly into the system being solved; DM_BOUNDARY_GHOSTED is generally more helpful for nonlinear systems than linear systems. Barry > On Feb 27, 2023, at 12:08 PM, Paul Grosse-Bley wrote: > > Hi, > > I would like to modify src/ksp/ksp/tutorials/ex45.c to implement Dirichlet boundary conditions using DM_BOUNDARY_GHOSTED instead of using DM_BOUNDARY_NONE and explicitly implementing the boundary by adding diagnonal-only rows. > > My assumption was that with DM_BOUNDARY_GHOSTED all vectors from that DM have the extra memory for the ghost entries and that I can basically use DMDAGetGhostCorners instead of DMDAGetCorners to access the array gotten via DMDAVecGetArray. But when I access (gxs, gys, gzs) = (-1,-1,-1) I get a segmentation fault. 
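A small sketch of what Barry describes for the DM_BOUNDARY_GHOSTED question: the extra boundary slots exist only in local vectors obtained with DMCreateLocalVector(), so ghost indices from DMDAGetGhostCorners such as (-1,-1,-1) are valid on an array taken from a local vector, while a global vector only spans the DMDAGetCorners range. Grid size and the values written are placeholders.

```c
#include <petscdmda.h>

/* Sketch: with DM_BOUNDARY_GHOSTED, the -1 layer on physical boundaries is
   addressable on a LOCAL vector; a global vector has no such rows at all. */
static PetscErrorCode TouchGhostLayer(void)
{
  DM             da;
  Vec            loc;
  PetscScalar ***a;
  PetscInt       gxs, gys, gzs, gxm, gym, gzm, i, j, k;

  PetscFunctionBeginUser;
  PetscCall(DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED,
                         DMDA_STENCIL_STAR, 7, 7, 7, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                         1, 1, NULL, NULL, NULL, &da));
  PetscCall(DMSetFromOptions(da));
  PetscCall(DMSetUp(da));
  PetscCall(DMCreateLocalVector(da, &loc));   /* this vector has the ghost/boundary slots */
  PetscCall(DMDAVecGetArray(da, loc, &a));
  PetscCall(DMDAGetGhostCorners(da, &gxs, &gys, &gzs, &gxm, &gym, &gzm));
  for (k = gzs; k < gzs + gzm; k++)
    for (j = gys; j < gys + gym; j++)
      for (i = gxs; i < gxs + gxm; i++) a[k][j][i] = 0.0; /* includes the -1 layer on physical boundaries */
  PetscCall(DMDAVecRestoreArray(da, loc, &a));
  PetscCall(VecDestroy(&loc));
  PetscCall(DMDestroy(&da));
  PetscFunctionReturn(0);
}
```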
When looking at the implementation of DMDAVecGetArray it looked to me as if accessing (-1, -1, -1) should work as DMDAVecGetArray passes the ghost corners to VecGetArray3d which then adds the right offsets. > > I could not find any example using DM_BOUNDARY_GHOSTED and then actually accessing the ghost/boundary elements. Can I assume that they are set to zero for the solution vector, i.e. the u=0 on \del\Omega and I do not need to access them at all? > > Best, > Paul Gro?e-Bley From paul.grosse-bley at ziti.uni-heidelberg.de Mon Feb 27 16:48:03 2023 From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley) Date: Mon, 27 Feb 2023 23:48:03 +0100 Subject: [petsc-users] =?utf-8?q?How_to_use_DM=5FBOUNDARY=5FGHOSTED_for_D?= =?utf-8?q?irichlet_boundary_conditions?= In-Reply-To: <470FC448-01E4-4EF2-A88A-0C8EA78DF4EE@petsc.dev> Message-ID: <1ed2f8-63fd3300-217-5beb2a00@100785341> Hi Barry, the reason why I wanted to change to ghost boundaries is that I was worrying about the effect of PCMGs coarsening on these boundary values. As mentioned before, I am trying to reproduce results from the hpgmg-cuda benchmark (a modified version of it, e.g. using 2nd order instead of 4th etc.). I am trying to solve the Poisson equation -\nabla^2 u = 1 with u = 0 on the boundary with rtol=1e-9. While my MG solver implemented in hpgmg solves this in 40 V-cycles (I weakened it a lot by only doing smooths at the coarse level instead of CG). When I run the "same" MG solver built in PETSc on this problem, it starts out reducing the residual norm as fast or even faster for the first 20-30 iterations. But for the last order of magnitude in the residual norm it needs more than 300 V-cycles, i.e. it gets very slow. At this point I am pretty much out of ideas about what is the cause, especially since e.g. adding back cg at the coarsest level doesn't seem to change the number of iterations at all. Therefore I am suspecting the discretization to be the problem. HPGMG uses an even number of points per dimension (e.g. 256), while PCMG wants an odd number (e.g. 257). So I also tried adding another layer of boundary values for the discretization to effectively use only 254 points per dimension. This caused the solver to get even slightly worse. So can the explicit boundary values screw with the coarsening, especially when they are not finite? Because with the problem as stated in ex45 with finite (i.e. non-zero) boundary values, the MG solver takes only 18 V-cycles. Best, Paul On Monday, February 27, 2023 18:17 CET, Barry Smith wrote: ?Paul, DM_BOUNDARY_GHOSTED would result in the extra ghost locations in the local vectors (obtained with DMCreateLocalVector() but they will not appear in the global vectors obtained with DMCreateGlobalVector(); perhaps this is the issue? Since they do not appear in the global vector they will not appear in the linear system so there will be no diagonal entries for you to set since those rows/columns do not exist in the linear system. In other words, using DM_BOUNDARY_GHOSTED is a way to avoid needing to put the Dirichlet values explicitly into the system being solved; DM_BOUNDARY_GHOSTED is generally more helpful for nonlinear systems than linear systems. Barry > On Feb 27, 2023, at 12:08 PM, Paul Grosse-Bley wrote: > > Hi, > > I would like to modify src/ksp/ksp/tutorials/ex45.c to implement Dirichlet boundary conditions using DM_BOUNDARY_GHOSTED instead of using DM_BOUNDARY_NONE and explicitly implementing the boundary by adding diagnonal-only rows. 
> > My assumption was that with DM_BOUNDARY_GHOSTED all vectors from that DM have the extra memory for the ghost entries and that I can basically use DMDAGetGhostCorners instead of DMDAGetCorners to access the array gotten via DMDAVecGetArray. But when I access (gxs, gys, gzs) = (-1,-1,-1) I get a segmentation fault. When looking at the implementation of DMDAVecGetArray it looked to me as if accessing (-1, -1, -1) should work as DMDAVecGetArray passes the ghost corners to VecGetArray3d which then adds the right offsets. > > I could not find any example using DM_BOUNDARY_GHOSTED and then actually accessing the ghost/boundary elements. Can I assume that they are set to zero for the solution vector, i.e. the u=0 on \del\Omega and I do not need to access them at all? > > Best, > Paul Gro?e-Bley ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Feb 27 17:23:18 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 27 Feb 2023 18:23:18 -0500 Subject: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions In-Reply-To: <1ed2f8-63fd3300-217-5beb2a00@100785341> References: <1ed2f8-63fd3300-217-5beb2a00@100785341> Message-ID: I have not seen explicitly including, or excluding, the Dirichlet boundary values in the system having a significant affect on the convergence so long as you SCALE the diagonal rows (of those Dirichlet points) by a value similar to the other entries along the diagonal. If they are scaled completely differently, that can screw up the convergence. For src/ksp/ksp/ex45.c I see that the appropriate scaling is used (note the scaling should come from a finite element view of the discretization even if the discretization is finite differences as is done in ex45.c) Are you willing to share the two codes so we can take a look with experienced eyes to try to figure out the difference? Barry > On Feb 27, 2023, at 5:48 PM, Paul Grosse-Bley wrote: > > Hi Barry, > > the reason why I wanted to change to ghost boundaries is that I was worrying about the effect of PCMGs coarsening on these boundary values. > > As mentioned before, I am trying to reproduce results from the hpgmg-cuda benchmark (a modified version of it, e.g. using 2nd order instead of 4th etc.). > I am trying to solve the Poisson equation -\nabla^2 u = 1 with u = 0 on the boundary with rtol=1e-9. While my MG solver implemented in hpgmg solves this in 40 V-cycles (I weakened it a lot by only doing smooths at the coarse level instead of CG). When I run the "same" MG solver built in PETSc on this problem, it starts out reducing the residual norm as fast or even faster for the first 20-30 iterations. But for the last order of magnitude in the residual norm it needs more than 300 V-cycles, i.e. it gets very slow. At this point I am pretty much out of ideas about what is the cause, especially since e.g. adding back cg at the coarsest level doesn't seem to change the number of iterations at all. Therefore I am suspecting the discretization to be the problem. HPGMG uses an even number of points per dimension (e.g. 256), while PCMG wants an odd number (e.g. 257). So I also tried adding another layer of boundary values for the discretization to effectively use only 254 points per dimension. This caused the solver to get even slightly worse. > > So can the explicit boundary values screw with the coarsening, especially when they are not finite? Because with the problem as stated in ex45 with finite (i.e. 
non-zero) boundary values, the MG solver takes only 18 V-cycles. > > Best, > Paul > > > > On Monday, February 27, 2023 18:17 CET, Barry Smith wrote: > >> >> Paul, >> >> DM_BOUNDARY_GHOSTED would result in the extra ghost locations in the local vectors (obtained with DMCreateLocalVector() but they will not appear in the global vectors obtained with DMCreateGlobalVector(); perhaps this is the issue? Since they do not appear in the global vector they will not appear in the linear system so there will be no diagonal entries for you to set since those rows/columns do not exist in the linear system. In other words, using DM_BOUNDARY_GHOSTED is a way to avoid needing to put the Dirichlet values explicitly into the system being solved; DM_BOUNDARY_GHOSTED is generally more helpful for nonlinear systems than linear systems. >> >> Barry >> >> > On Feb 27, 2023, at 12:08 PM, Paul Grosse-Bley wrote: >> > >> > Hi, >> > >> > I would like to modify src/ksp/ksp/tutorials/ex45.c to implement Dirichlet boundary conditions using DM_BOUNDARY_GHOSTED instead of using DM_BOUNDARY_NONE and explicitly implementing the boundary by adding diagnonal-only rows. >> > >> > My assumption was that with DM_BOUNDARY_GHOSTED all vectors from that DM have the extra memory for the ghost entries and that I can basically use DMDAGetGhostCorners instead of DMDAGetCorners to access the array gotten via DMDAVecGetArray. But when I access (gxs, gys, gzs) = (-1,-1,-1) I get a segmentation fault. When looking at the implementation of DMDAVecGetArray it looked to me as if accessing (-1, -1, -1) should work as DMDAVecGetArray passes the ghost corners to VecGetArray3d which then adds the right offsets. >> > >> > I could not find any example using DM_BOUNDARY_GHOSTED and then actually accessing the ghost/boundary elements. Can I assume that they are set to zero for the solution vector, i.e. the u=0 on \del\Omega and I do not need to access them at all? >> > >> > Best, >> > Paul Gro?e-Bley >> From paul.grosse-bley at ziti.uni-heidelberg.de Mon Feb 27 18:16:03 2023 From: paul.grosse-bley at ziti.uni-heidelberg.de (Paul Grosse-Bley) Date: Tue, 28 Feb 2023 01:16:03 +0100 Subject: [petsc-users] =?utf-8?q?How_to_use_DM=5FBOUNDARY=5FGHOSTED_for_D?= =?utf-8?q?irichlet_boundary_conditions?= In-Reply-To: Message-ID: <2e5cd6-63fd4800-59-7e4e1780@61825700> The scaling might be the problem, especially since I don't know what you mean by scaling it according to FE. For reproducing the issue with a smaller problem: Change the ComputeRHS function in ex45.c if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) { ? barray[k][j][i] = 0.0; } else { ? barray[k][j][i] = 1.0; } Change the dimensions to e.g. 33 (I scaled it down, so it goes quick without a GPU) instead of 7 and then run with -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits -mg_coarse_pc_type jacobi You will find that it takes 145 iterations instead of 25 for the original ex45 RHS. My hpgmg-cuda implementation (using 32^3) takes 41 iterations. To what do I have to change the diagonal entries of the matrix for the boundary according to FE? Right now the diagonal is completely constant. Paul On Tuesday, February 28, 2023 00:23 CET, Barry Smith wrote: ? 
I have not seen explicitly including, or excluding, the Dirichlet boundary values in the system having a significant affect on the convergence so long as you SCALE the diagonal rows (of those Dirichlet points) by a value similar to the other entries along the diagonal. If they are scaled completely differently, that can screw up the convergence. For src/ksp/ksp/ex45.c I see that the appropriate scaling is used (note the scaling should come from a finite element view of the discretization even if the discretization is finite differences as is done in ex45.c) Are you willing to share the two codes so we can take a look with experienced eyes to try to figure out the difference? Barry > On Feb 27, 2023, at 5:48 PM, Paul Grosse-Bley wrote: > > Hi Barry, > > the reason why I wanted to change to ghost boundaries is that I was worrying about the effect of PCMGs coarsening on these boundary values. > > As mentioned before, I am trying to reproduce results from the hpgmg-cuda benchmark (a modified version of it, e.g. using 2nd order instead of 4th etc.). > I am trying to solve the Poisson equation -\nabla^2 u = 1 with u = 0 on the boundary with rtol=1e-9. While my MG solver implemented in hpgmg solves this in 40 V-cycles (I weakened it a lot by only doing smooths at the coarse level instead of CG). When I run the "same" MG solver built in PETSc on this problem, it starts out reducing the residual norm as fast or even faster for the first 20-30 iterations. But for the last order of magnitude in the residual norm it needs more than 300 V-cycles, i.e. it gets very slow. At this point I am pretty much out of ideas about what is the cause, especially since e.g. adding back cg at the coarsest level doesn't seem to change the number of iterations at all. Therefore I am suspecting the discretization to be the problem. HPGMG uses an even number of points per dimension (e.g. 256), while PCMG wants an odd number (e.g. 257). So I also tried adding another layer of boundary values for the discretization to effectively use only 254 points per dimension. This caused the solver to get even slightly worse. > > So can the explicit boundary values screw with the coarsening, especially when they are not finite? Because with the problem as stated in ex45 with finite (i.e. non-zero) boundary values, the MG solver takes only 18 V-cycles. > > Best, > Paul > > > > On Monday, February 27, 2023 18:17 CET, Barry Smith wrote: > >> >> Paul, >> >> DM_BOUNDARY_GHOSTED would result in the extra ghost locations in the local vectors (obtained with DMCreateLocalVector() but they will not appear in the global vectors obtained with DMCreateGlobalVector(); perhaps this is the issue? Since they do not appear in the global vector they will not appear in the linear system so there will be no diagonal entries for you to set since those rows/columns do not exist in the linear system. In other words, using DM_BOUNDARY_GHOSTED is a way to avoid needing to put the Dirichlet values explicitly into the system being solved; DM_BOUNDARY_GHOSTED is generally more helpful for nonlinear systems than linear systems. >> >> Barry >> >> > On Feb 27, 2023, at 12:08 PM, Paul Grosse-Bley wrote: >> > >> > Hi, >> > >> > I would like to modify src/ksp/ksp/tutorials/ex45.c to implement Dirichlet boundary conditions using DM_BOUNDARY_GHOSTED instead of using DM_BOUNDARY_NONE and explicitly implementing the boundary by adding diagnonal-only rows. 
>> > >> > My assumption was that with DM_BOUNDARY_GHOSTED all vectors from that DM have the extra memory for the ghost entries and that I can basically use DMDAGetGhostCorners instead of DMDAGetCorners to access the array gotten via DMDAVecGetArray. But when I access (gxs, gys, gzs) = (-1,-1,-1) I get a segmentation fault. When looking at the implementation of DMDAVecGetArray it looked to me as if accessing (-1, -1, -1) should work as DMDAVecGetArray passes the ghost corners to VecGetArray3d which then adds the right offsets. >> > >> > I could not find any example using DM_BOUNDARY_GHOSTED and then actually accessing the ghost/boundary elements. Can I assume that they are set to zero for the solution vector, i.e. the u=0 on \del\Omega and I do not need to access them at all? >> > >> > Best, >> > Paul Gro?e-Bley >> ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Feb 28 11:43:15 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 28 Feb 2023 12:43:15 -0500 Subject: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions In-Reply-To: <2e5cd6-63fd4800-59-7e4e1780@61825700> References: <2e5cd6-63fd4800-59-7e4e1780@61825700> Message-ID: I am sorry, I cannot reproduce what you describe. I am using src/ksp/ksp/tutorials/ex45.c in the main branch (should be same as release for this purpose). No change to the code I get $ ./ex45 -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits -mg_coarse_pc_type jacobi -ksp_monitor_true_residual -ksp_view 0 KSP preconditioned resid norm 1.851257578045e+01 true resid norm 1.476491378857e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 3.720545622095e-01 true resid norm 5.171053311198e-02 ||r(i)||/||b|| 3.502257707188e-03 2 KSP preconditioned resid norm 1.339047557616e-02 true resid norm 1.866765310863e-03 ||r(i)||/||b|| 1.264325235890e-04 3 KSP preconditioned resid norm 4.833887599029e-04 true resid norm 6.867629264754e-05 ||r(i)||/||b|| 4.651316873974e-06 4 KSP preconditioned resid norm 1.748167886388e-05 true resid norm 3.398334857479e-06 ||r(i)||/||b|| 2.301628648933e-07 5 KSP preconditioned resid norm 6.570567424652e-07 true resid norm 4.304483984231e-07 ||r(i)||/||b|| 2.915346507180e-08 6 KSP preconditioned resid norm 4.013427896557e-08 true resid norm 7.502068698790e-08 ||r(i)||/||b|| 5.081010838410e-09 7 KSP preconditioned resid norm 5.934811016347e-09 true resid norm 1.333884145638e-08 ||r(i)||/||b|| 9.034147877457e-10 Linear solve converged due to CONVERGED_RTOL iterations 7 KSP Object: 1 MPI process type: richardson damping factor=1. maximum iterations=10000, nonzero initial guess tolerances: relative=1e-09, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 1 MPI process type: mg type is MULTIPLICATIVE, levels=3 cycles=v Cycles per PCApply=1 Not using Galerkin computed coarse grid matrices Coarse grid solver -- level 0 ------------------------------- KSP Object: (mg_coarse_) 1 MPI process type: richardson damping factor=1. maximum iterations=6, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (mg_coarse_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: seqaij rows=8, cols=8 total: nonzeros=32, allocated nonzeros=32 total number of mallocs used during MatSetValues calls=0 not using I-node routines Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 1 MPI process type: richardson damping factor=1. maximum iterations=6, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_1_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: seqaij rows=64, cols=64 total: nonzeros=352, allocated nonzeros=352 total number of mallocs used during MatSetValues calls=0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 1 MPI process type: richardson damping factor=1. maximum iterations=6, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_2_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: seqaij rows=343, cols=343 total: nonzeros=2107, allocated nonzeros=2107 total number of mallocs used during MatSetValues calls=0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 1 MPI process type: seqaij rows=343, cols=343 total: nonzeros=2107, allocated nonzeros=2107 total number of mallocs used during MatSetValues calls=0 not using I-node routines Residual norm 1.33388e-08 ~/Src/petsc/src/ksp/ksp/tutorials (main=) arch-main $ Now change code with if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) { barray[k][j][i] = 0; //2.0 * (HxHydHz + HxHzdHy + HyHzdHx); } else { barray[k][j][i] = 1; //Hx * Hy * Hz; } I do not understand where I am suppose to change the dimension to 33 so I ignore that statement. 
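As an aside on where that dimension lives: ex45.c hard-codes the default grid in its DMDACreate3d call, and because the DMDA is set up from the options database the size can also be changed on the command line without editing the code. A small sketch, assuming the stock ex45.c creation sequence (the 7s are the defaults being discussed here):

  DM da;
  /* the three grid sizes are the 7, 7, 7 arguments; change them to 33 for the runs discussed here */
  PetscCall(DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                         DMDA_STENCIL_STAR, 7, 7, 7, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                         1, 1, NULL, NULL, NULL, &da));
  PetscCall(DMSetFromOptions(da)); /* lets -da_grid_x/-da_grid_y/-da_grid_z override the 7s */
  PetscCall(DMSetUp(da));

Equivalently, the options -da_grid_x 33 -da_grid_y 33 -da_grid_z 33 can simply be appended to the command lines in this thread, leaving the source untouched.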
Same command line with change above gives $ ./ex45 -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits -mg_coarse_pc_type jacobi -ksp_monitor_true_residual -ksp_view 0 KSP preconditioned resid norm 7.292257119299e+01 true resid norm 1.118033988750e+01 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.534913491362e+00 true resid norm 3.528425353826e-01 ||r(i)||/||b|| 3.155919577875e-02 2 KSP preconditioned resid norm 9.145057509152e-02 true resid norm 1.279725352471e-02 ||r(i)||/||b|| 1.144621152262e-03 3 KSP preconditioned resid norm 3.302446009474e-03 true resid norm 5.122622088691e-04 ||r(i)||/||b|| 4.581812485342e-05 4 KSP preconditioned resid norm 1.204504429329e-04 true resid norm 4.370692051248e-05 ||r(i)||/||b|| 3.909265814124e-06 5 KSP preconditioned resid norm 5.339971695523e-06 true resid norm 7.229991776815e-06 ||r(i)||/||b|| 6.466701235889e-07 6 KSP preconditioned resid norm 5.856425044706e-07 true resid norm 1.282860114273e-06 ||r(i)||/||b|| 1.147424968455e-07 7 KSP preconditioned resid norm 1.007137752126e-07 true resid norm 2.283009757390e-07 ||r(i)||/||b|| 2.041986004328e-08 8 KSP preconditioned resid norm 1.790021892548e-08 true resid norm 4.063263596129e-08 ||r(i)||/||b|| 3.634293444578e-09 Linear solve converged due to CONVERGED_RTOL iterations 8 KSP Object: 1 MPI process type: richardson damping factor=1. maximum iterations=10000, nonzero initial guess tolerances: relative=1e-09, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 1 MPI process type: mg type is MULTIPLICATIVE, levels=3 cycles=v Cycles per PCApply=1 Not using Galerkin computed coarse grid matrices Coarse grid solver -- level 0 ------------------------------- KSP Object: (mg_coarse_) 1 MPI process type: richardson damping factor=1. maximum iterations=6, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (mg_coarse_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: seqaij rows=8, cols=8 total: nonzeros=32, allocated nonzeros=32 total number of mallocs used during MatSetValues calls=0 not using I-node routines Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 1 MPI process type: richardson damping factor=1. maximum iterations=6, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_1_) 1 MPI process type: jacobi type DIAGONAL linear system matrix = precond matrix: Mat Object: 1 MPI process type: seqaij rows=64, cols=64 total: nonzeros=352, allocated nonzeros=352 total number of mallocs used during MatSetValues calls=0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 1 MPI process type: richardson damping factor=1. maximum iterations=6, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
          left preconditioning
          using NONE norm type for convergence test
        PC Object: (mg_levels_2_) 1 MPI process
          type: jacobi
            type DIAGONAL
          linear system matrix = precond matrix:
          Mat Object: 1 MPI process
            type: seqaij
            rows=343, cols=343
            total: nonzeros=2107, allocated nonzeros=2107
            total number of mallocs used during MatSetValues calls=0
              not using I-node routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object: 1 MPI process
    type: seqaij
    rows=343, cols=343
    total: nonzeros=2107, allocated nonzeros=2107
    total number of mallocs used during MatSetValues calls=0
      not using I-node routines
Residual norm 4.06326e-08
~/Src/petsc/src/ksp/ksp/tutorials (main *=) arch-main
$

In neither case is it taking 25 iterations. What am I doing wrong?

Normally one expects only trivial changes in the convergence of multigrid methods when one changes values in the right hand side as with the run above.

Barry

> On Feb 27, 2023, at 7:16 PM, Paul Grosse-Bley wrote:
> 
> The scaling might be the problem, especially since I don't know what you mean by scaling it according to FE.
> 
> For reproducing the issue with a smaller problem:
> Change the ComputeRHS function in ex45.c
> 
> if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
>   barray[k][j][i] = 0.0;
> } else {
>   barray[k][j][i] = 1.0;
> }
> 
> Change the dimensions to e.g. 33 (I scaled it down, so it goes quick without a GPU) instead of 7 and then run with
> 
> -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits -mg_coarse_pc_type jacobi
> 
> You will find that it takes 145 iterations instead of 25 for the original ex45 RHS. My hpgmg-cuda implementation (using 32^3) takes 41 iterations.
> 
> To what do I have to change the diagonal entries of the matrix for the boundary according to FE? Right now the diagonal is completely constant.
> 
> Paul
From rz230 at uni-heidelberg.de  Tue Feb 28 13:05:24 2023
From: rz230 at uni-heidelberg.de (=?UTF-8?Q?Gro=c3=9fe-Bley=2c_Paul?=)
Date: Tue, 28 Feb 2023 20:05:24 +0100
Subject: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions
In-Reply-To: 
References: <2e5cd6-63fd4800-59-7e4e1780@61825700>
Message-ID: <3f20a04b-12fd-02d9-5189-8e390b1fa39b@uni-heidelberg.de>

Sorry, I should have made myself more clear. I changed the three 7 passed to DMDACreate3d to 33 to make the example a bit more realistic, as I also use "U-cycles", i.e. my coarsest level is still big enough to make use of some GPU parallelism. I should have just put that into the given command line argument string with -da_grid_x 33 -da_grid_y 33 -da_grid_z 33

On 2/28/23 18:43, Barry Smith wrote:
> I am sorry, I cannot reproduce what you describe. I am using src/ksp/ksp/tutorials/ex45.c in the main branch (should be same as release for this purpose).
> 
> In neither case is it taking 25 iterations. What am I doing wrong?
> 
> Normally one expects only trivial changes in the convergence of multigrid methods when one changes values in the right hand side as with the run above.
> 
> Barry

From bsmith at petsc.dev  Tue Feb 28 22:38:29 2023
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 28 Feb 2023 23:38:29 -0500
Subject: [petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions
In-Reply-To: <3f20a04b-12fd-02d9-5189-8e390b1fa39b@uni-heidelberg.de>
References: <2e5cd6-63fd4800-59-7e4e1780@61825700> <3f20a04b-12fd-02d9-5189-8e390b1fa39b@uni-heidelberg.de>
Message-ID: <257A736B-D47E-4D8C-94D5-CB2896B2A4E5@petsc.dev>

Ok, here is the situation. The command line options as given do not result in multigrid quality convergence in any of the runs; the error contraction factor is around .94 (meaning that for the modes that the multigrid algorithm does the worst on it only removes about 6 percent of them per iteration).
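As a rough sanity check on what a contraction factor of that size means, assuming the residual shrinks by an approximately constant factor rho per V-cycle: reaching a relative tolerance tol takes about n = ln(tol)/ln(rho) cycles. With rho = 0.94 and tol = 1e-9 that is roughly 335 cycles (about 37 cycles per order of magnitude), which lines up with the "more than 300 V-cycles" reported earlier in the thread; with rho = 0.51, the factor quoted for the damped runs below, it is roughly 31 cycles, in line with the 35 and 41 iterations reported there.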
But this is hidden by the initial right hand side for the linear system as written in ex45.c which has O(h) values on the boundary nodes and O(h^3) values on the interior nodes. The first iterations are largely working on the boundary residual and making great progress attacking that so that it looks like the one has a good error contraction factor. One then sees the error contraction factor start to get worse and worse for the later iterations. With the 0 on the boundary the iterations quickly get to the bad regime where the error contraction factor is near one. One can see this by using a -ksp_rtol 1.e-12 and having the MG code print the residual decrease for each iteration. Thought it appears the 0 boundary condition one converges much slower (since it requires many more iterations) if you factor out the huge advantage of the nonzero boundary condition case at the beginning (in terms of decreasing the residual) you see they both have an asymptotic error contraction factor of around .94 (which is horrible for multigrid). I now add -mg_levels_ksp_richardson_scale .9 -mg_coarse_ksp_richardson_scale .9 and rerun the two cases (nonzero and zero boundary right hand side) they take 35 and 41 iterations (much better) initial residual norm 14.6993 next residual norm 0.84167 0.0572591 next residual norm 0.0665392 0.00452668 next residual norm 0.0307273 0.00209039 next residual norm 0.0158949 0.00108134 next residual norm 0.00825189 0.000561378 next residual norm 0.00428474 0.000291492 next residual norm 0.00222482 0.000151355 next residual norm 0.00115522 7.85898e-05 next residual norm 0.000599836 4.0807e-05 next residual norm 0.000311459 2.11887e-05 next residual norm 0.000161722 1.1002e-05 next residual norm 8.39727e-05 5.71269e-06 next residual norm 4.3602e-05 2.96626e-06 next residual norm 2.26399e-05 1.5402e-06 next residual norm 1.17556e-05 7.99735e-07 next residual norm 6.10397e-06 4.15255e-07 next residual norm 3.16943e-06 2.15617e-07 next residual norm 1.64569e-06 1.11957e-07 next residual norm 8.54511e-07 5.81326e-08 next residual norm 4.43697e-07 3.01848e-08 next residual norm 2.30385e-07 1.56732e-08 next residual norm 1.19625e-07 8.13815e-09 next residual norm 6.21143e-08 4.22566e-09 next residual norm 3.22523e-08 2.19413e-09 next residual norm 1.67467e-08 1.13928e-09 next residual norm 8.69555e-09 5.91561e-10 next residual norm 4.51508e-09 3.07162e-10 next residual norm 2.34441e-09 1.59491e-10 next residual norm 1.21731e-09 8.28143e-11 next residual norm 6.32079e-10 4.30005e-11 next residual norm 3.28201e-10 2.23276e-11 next residual norm 1.70415e-10 1.15934e-11 next residual norm 8.84865e-11 6.01976e-12 next residual norm 4.59457e-11 3.1257e-12 next residual norm 2.38569e-11 1.62299e-12 next residual norm 1.23875e-11 8.42724e-13 Linear solve converged due to CONVERGED_RTOL iterations 35 Residual norm 1.23875e-11 initial residual norm 172.601 next residual norm 154.803 0.896887 next residual norm 66.9409 0.387837 next residual norm 34.4572 0.199636 next residual norm 17.8836 0.103612 next residual norm 9.28582 0.0537995 next residual norm 4.82161 0.027935 next residual norm 2.50358 0.014505 next residual norm 1.29996 0.0075316 next residual norm 0.674992 0.00391071 next residual norm 0.350483 0.0020306 next residual norm 0.181985 0.00105437 next residual norm 0.094494 0.000547472 next residual norm 0.0490651 0.000284269 next residual norm 0.0254766 0.000147604 next residual norm 0.0132285 7.6642e-05 next residual norm 0.00686876 3.97956e-05 next residual norm 0.00356654 2.06635e-05 
next residual norm 0.00185189 1.07293e-05 next residual norm 0.000961576 5.5711e-06 next residual norm 0.000499289 2.89274e-06 next residual norm 0.000259251 1.50203e-06 next residual norm 0.000134614 7.79914e-07 next residual norm 6.98969e-05 4.04963e-07 next residual norm 3.62933e-05 2.10273e-07 next residual norm 1.88449e-05 1.09182e-07 next residual norm 9.78505e-06 5.66919e-08 next residual norm 5.0808e-06 2.94367e-08 next residual norm 2.63815e-06 1.52847e-08 next residual norm 1.36984e-06 7.93645e-09 next residual norm 7.11275e-07 4.12093e-09 next residual norm 3.69322e-07 2.13975e-09 next residual norm 1.91767e-07 1.11105e-09 next residual norm 9.95733e-08 5.769e-10 next residual norm 5.17024e-08 2.99549e-10 next residual norm 2.6846e-08 1.55538e-10 next residual norm 1.39395e-08 8.07615e-11 next residual norm 7.23798e-09 4.19348e-11 next residual norm 3.75824e-09 2.17742e-11 next residual norm 1.95138e-09 1.13058e-11 next residual norm 1.01327e-09 5.87059e-12 next residual norm 5.26184e-10 3.04856e-12 next residual norm 2.73182e-10 1.58274e-12 next residual norm 1.41806e-10 8.21586e-13 Linear solve converged due to CONVERGED_RTOL iterations 42 Residual norm 1.41806e-10 Notice in the first run the residual norm still dives much more quickly for the first 2 iterations than the second run. This is because the first run has "lucky error" that gets wiped out easily from the big boundary term. After that you can see that the convergence for both is very similar with both having a reasonable error contraction factor of .51 I' ve attached the modified src/ksp/pc/impls/mg/mg.c that prints the residuals along the way. ? One can also play games with the scaling factor used for the boundary nodes; I did a quick parameter sweep and found using 8 (instead of 2) in front of the diagonal matrix entry and the right hand side for the boundary nodes resulting in even better convergence; about half the number of iterations than the 35 and 41 above. So I do stand corrected; it is possible that using non-zero Dirichlet boundary conditions (compared to zero) can affect the "convergence" of the multigrid method (especially without a Krylov accelerator) in the NON-asymptotic regime when comparing the residual norms and using relative residual norm decrease as the convergence criteria. If you wish to compare, for example, ex45.c with a code that does not incorporate the Dirichlet boundary nodes in the linear system you can just use 0 boundary conditions for both codes. Barry > On Feb 28, 2023, at 2:05 PM, Gro?e-Bley, Paul wrote: > > Sorry, I should have made myself more clear. I changed the three 7 passed to DMDACreate3d to 33 to make the example a bit more realistic, as I also use "U-cycles", i.e. my coarsest level is still big enough to make use of some GPU parallelism. I should have just put that into the given command line argument string with -da_grid_x 33 -da_grid_y 33 -da_grid_z 33 > > On 2/28/23 18:43, Barry Smith wrote: >> >> I am sorry, I cannot reproduce what you describe. I am using src/ksp/ksp/tutorials/ex45.c in the main branch (should be same as release for this purpose). 
>> >> No change to the code I get >> >> $ ./ex45 -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits -mg_coarse_pc_type jacobi -ksp_monitor_true_residual -ksp_view >> 0 KSP preconditioned resid norm 1.851257578045e+01 true resid norm 1.476491378857e+01 ||r(i)||/||b|| 1.000000000000e+00 >> 1 KSP preconditioned resid norm 3.720545622095e-01 true resid norm 5.171053311198e-02 ||r(i)||/||b|| 3.502257707188e-03 >> 2 KSP preconditioned resid norm 1.339047557616e-02 true resid norm 1.866765310863e-03 ||r(i)||/||b|| 1.264325235890e-04 >> 3 KSP preconditioned resid norm 4.833887599029e-04 true resid norm 6.867629264754e-05 ||r(i)||/||b|| 4.651316873974e-06 >> 4 KSP preconditioned resid norm 1.748167886388e-05 true resid norm 3.398334857479e-06 ||r(i)||/||b|| 2.301628648933e-07 >> 5 KSP preconditioned resid norm 6.570567424652e-07 true resid norm 4.304483984231e-07 ||r(i)||/||b|| 2.915346507180e-08 >> 6 KSP preconditioned resid norm 4.013427896557e-08 true resid norm 7.502068698790e-08 ||r(i)||/||b|| 5.081010838410e-09 >> 7 KSP preconditioned resid norm 5.934811016347e-09 true resid norm 1.333884145638e-08 ||r(i)||/||b|| 9.034147877457e-10 >> Linear solve converged due to CONVERGED_RTOL iterations 7 >> KSP Object: 1 MPI process >> type: richardson >> damping factor=1. >> maximum iterations=10000, nonzero initial guess >> tolerances: relative=1e-09, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: 1 MPI process >> type: mg >> type is MULTIPLICATIVE, levels=3 cycles=v >> Cycles per PCApply=1 >> Not using Galerkin computed coarse grid matrices >> Coarse grid solver -- level 0 ------------------------------- >> KSP Object: (mg_coarse_) 1 MPI process >> type: richardson >> damping factor=1. >> maximum iterations=6, nonzero initial guess >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: (mg_coarse_) 1 MPI process >> type: jacobi >> type DIAGONAL >> linear system matrix = precond matrix: >> Mat Object: 1 MPI process >> type: seqaij >> rows=8, cols=8 >> total: nonzeros=32, allocated nonzeros=32 >> total number of mallocs used during MatSetValues calls=0 >> not using I-node routines >> Down solver (pre-smoother) on level 1 ------------------------------- >> KSP Object: (mg_levels_1_) 1 MPI process >> type: richardson >> damping factor=1. >> maximum iterations=6, nonzero initial guess >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_levels_1_) 1 MPI process >> type: jacobi >> type DIAGONAL >> linear system matrix = precond matrix: >> Mat Object: 1 MPI process >> type: seqaij >> rows=64, cols=64 >> total: nonzeros=352, allocated nonzeros=352 >> total number of mallocs used during MatSetValues calls=0 >> not using I-node routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 2 ------------------------------- >> KSP Object: (mg_levels_2_) 1 MPI process >> type: richardson >> damping factor=1. >> maximum iterations=6, nonzero initial guess >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_levels_2_) 1 MPI process >> type: jacobi >> type DIAGONAL >> linear system matrix = precond matrix: >> Mat Object: 1 MPI process >> type: seqaij >> rows=343, cols=343 >> total: nonzeros=2107, allocated nonzeros=2107 >> total number of mallocs used during MatSetValues calls=0 >> not using I-node routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: 1 MPI process >> type: seqaij >> rows=343, cols=343 >> total: nonzeros=2107, allocated nonzeros=2107 >> total number of mallocs used during MatSetValues calls=0 >> not using I-node routines >> Residual norm 1.33388e-08 >> ~/Src/petsc/src/ksp/ksp/tutorials (main=) arch-main >> $ >> >> >> Now change code with >> >> if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) { >> barray[k][j][i] = 0; //2.0 * (HxHydHz + HxHzdHy + HyHzdHx); >> } else { >> barray[k][j][i] = 1; //Hx * Hy * Hz; >> } >> >> I do not understand where I am suppose to change the dimension to 33 so I ignore that statement. Same command line with change above gives >> >> $ ./ex45 -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits -mg_coarse_pc_type jacobi -ksp_monitor_true_residual -ksp_view >> 0 KSP preconditioned resid norm 7.292257119299e+01 true resid norm 1.118033988750e+01 ||r(i)||/||b|| 1.000000000000e+00 >> 1 KSP preconditioned resid norm 2.534913491362e+00 true resid norm 3.528425353826e-01 ||r(i)||/||b|| 3.155919577875e-02 >> 2 KSP preconditioned resid norm 9.145057509152e-02 true resid norm 1.279725352471e-02 ||r(i)||/||b|| 1.144621152262e-03 >> 3 KSP preconditioned resid norm 3.302446009474e-03 true resid norm 5.122622088691e-04 ||r(i)||/||b|| 4.581812485342e-05 >> 4 KSP preconditioned resid norm 1.204504429329e-04 true resid norm 4.370692051248e-05 ||r(i)||/||b|| 3.909265814124e-06 >> 5 KSP preconditioned resid norm 5.339971695523e-06 true resid norm 7.229991776815e-06 ||r(i)||/||b|| 6.466701235889e-07 >> 6 KSP preconditioned resid norm 5.856425044706e-07 true resid norm 1.282860114273e-06 ||r(i)||/||b|| 1.147424968455e-07 >> 7 KSP preconditioned resid norm 1.007137752126e-07 true resid norm 2.283009757390e-07 ||r(i)||/||b|| 2.041986004328e-08 >> 8 KSP preconditioned resid norm 1.790021892548e-08 true resid norm 4.063263596129e-08 ||r(i)||/||b|| 3.634293444578e-09 >> Linear solve converged due to CONVERGED_RTOL iterations 8 >> KSP Object: 1 MPI process >> type: richardson >> damping factor=1. >> maximum iterations=10000, nonzero initial guess >> tolerances: relative=1e-09, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: 1 MPI process >> type: mg >> type is MULTIPLICATIVE, levels=3 cycles=v >> Cycles per PCApply=1 >> Not using Galerkin computed coarse grid matrices >> Coarse grid solver -- level 0 ------------------------------- >> KSP Object: (mg_coarse_) 1 MPI process >> type: richardson >> damping factor=1. >> maximum iterations=6, nonzero initial guess >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: (mg_coarse_) 1 MPI process >> type: jacobi >> type DIAGONAL >> linear system matrix = precond matrix: >> Mat Object: 1 MPI process >> type: seqaij >> rows=8, cols=8 >> total: nonzeros=32, allocated nonzeros=32 >> total number of mallocs used during MatSetValues calls=0 >> not using I-node routines >> Down solver (pre-smoother) on level 1 ------------------------------- >> KSP Object: (mg_levels_1_) 1 MPI process >> type: richardson >> damping factor=1. >> maximum iterations=6, nonzero initial guess >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_levels_1_) 1 MPI process >> type: jacobi >> type DIAGONAL >> linear system matrix = precond matrix: >> Mat Object: 1 MPI process >> type: seqaij >> rows=64, cols=64 >> total: nonzeros=352, allocated nonzeros=352 >> total number of mallocs used during MatSetValues calls=0 >> not using I-node routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 2 ------------------------------- >> KSP Object: (mg_levels_2_) 1 MPI process >> type: richardson >> damping factor=1. >> maximum iterations=6, nonzero initial guess >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_levels_2_) 1 MPI process >> type: jacobi >> type DIAGONAL >> linear system matrix = precond matrix: >> Mat Object: 1 MPI process >> type: seqaij >> rows=343, cols=343 >> total: nonzeros=2107, allocated nonzeros=2107 >> total number of mallocs used during MatSetValues calls=0 >> not using I-node routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: 1 MPI process >> type: seqaij >> rows=343, cols=343 >> total: nonzeros=2107, allocated nonzeros=2107 >> total number of mallocs used during MatSetValues calls=0 >> not using I-node routines >> Residual norm 4.06326e-08 >> ~/Src/petsc/src/ksp/ksp/tutorials (main *=) arch-main >> $ >> >> In neither case is it taking 25 iterations. What am I doing wrong? >> >> Normally one expects only trivial changes in the convergence of multigrid methods when one changes values in the right hand side as with the run above. >> >> Barry >> >> >> >>> On Feb 27, 2023, at 7:16 PM, Paul Grosse-Bley wrote: >>> >>> The scaling might be the problem, especially since I don't know what you mean by scaling it according to FE. >>> >>> For reproducing the issue with a smaller problem: >>> Change the ComputeRHS function in ex45.c >>> >>> if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) { >>> barray[k][j][i] = 0.0; >>> } else { >>> barray[k][j][i] = 1.0; >>> } >>> >>> Change the dimensions to e.g. 33 (I scaled it down, so it goes quick without a GPU) instead of 7 and then run with >>> >>> -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits -mg_coarse_pc_type jacobi >>> >>> You will find that it takes 145 iterations instead of 25 for the original ex45 RHS. My hpgmg-cuda implementation (using 32^3) takes 41 iterations. 
>>> >>> To what do I have to change the diagonal entries of the matrix for the boundary according to FE? Right now the diagonal is completely constant. >>> >>> Paul >>> >>> On Tuesday, February 28, 2023 00:23 CET, Barry Smith wrote: >>> >>>> >>>> >>>> I have not seen explicitly including, or excluding, the Dirichlet boundary values in the system having a significant affect on the convergence so long as you SCALE the diagonal rows (of those Dirichlet points) by a value similar to the other entries along the diagonal. If they are scaled completely differently, that can screw up the convergence. For src/ksp/ksp/ex45.c I see that the appropriate scaling is used (note the scaling should come from a finite element view of the discretization even if the discretization is finite differences as is done in ex45.c) >>>> >>>> Are you willing to share the two codes so we can take a look with experienced eyes to try to figure out the difference? >>>> >>>> Barry >>>> >>>> >>>> >>>> >>>> > On Feb 27, 2023, at 5:48 PM, Paul Grosse-Bley wrote: >>>> > >>>> > Hi Barry, >>>> > >>>> > the reason why I wanted to change to ghost boundaries is that I was worrying about the effect of PCMGs coarsening on these boundary values. >>>> > >>>> > As mentioned before, I am trying to reproduce results from the hpgmg-cuda benchmark (a modified version of it, e.g. using 2nd order instead of 4th etc.). >>>> > I am trying to solve the Poisson equation -\nabla^2 u = 1 with u = 0 on the boundary with rtol=1e-9. While my MG solver implemented in hpgmg solves this in 40 V-cycles (I weakened it a lot by only doing smooths at the coarse level instead of CG). When I run the "same" MG solver built in PETSc on this problem, it starts out reducing the residual norm as fast or even faster for the first 20-30 iterations. But for the last order of magnitude in the residual norm it needs more than 300 V-cycles, i.e. it gets very slow. At this point I am pretty much out of ideas about what is the cause, especially since e.g. adding back cg at the coarsest level doesn't seem to change the number of iterations at all. Therefore I am suspecting the discretization to be the problem. HPGMG uses an even number of points per dimension (e.g. 256), while PCMG wants an odd number (e.g. 257). So I also tried adding another layer of boundary values for the discretization to effectively use only 254 points per dimension. This caused the solver to get even slightly worse. >>>> > >>>> > So can the explicit boundary values screw with the coarsening, especially when they are not finite? Because with the problem as stated in ex45 with finite (i.e. non-zero) boundary values, the MG solver takes only 18 V-cycles. >>>> > >>>> > Best, >>>> > Paul >>>> > >>>> > >>>> > >>>> > On Monday, February 27, 2023 18:17 CET, Barry Smith wrote: >>>> > >>>> >> >>>> >> Paul, >>>> >> >>>> >> DM_BOUNDARY_GHOSTED would result in the extra ghost locations in the local vectors (obtained with DMCreateLocalVector() but they will not appear in the global vectors obtained with DMCreateGlobalVector(); perhaps this is the issue? Since they do not appear in the global vector they will not appear in the linear system so there will be no diagonal entries for you to set since those rows/columns do not exist in the linear system. In other words, using DM_BOUNDARY_GHOSTED is a way to avoid needing to put the Dirichlet values explicitly into the system being solved; DM_BOUNDARY_GHOSTED is generally more helpful for nonlinear systems than linear systems. 
>>>> >> 
>>>> >> Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mg.c
Type: application/octet-stream
Size: 80299 bytes
Desc: not available
URL: 
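Coming back to the question that opened this thread: with DM_BOUNDARY_GHOSTED the extra layer exists only in local vectors, so the boundary values never enter the global system; they have to be filled into the local vector (e.g. with zeros for u = 0 on the boundary) before a stencil sweep, and indexing with the ghost corners only works on a local vector. A minimal self-contained sketch of that access pattern, assuming a scalar DMDA and a 33^3 grid as used above (illustrative code, not taken from ex45.c):

#include <petscdmda.h>

int main(int argc, char **argv)
{
  DM             da;
  Vec            uglobal, ulocal;
  PetscScalar ***u;
  PetscInt       i, j, k, mx, my, mz, gxs, gys, gzs, gxm, gym, gzm;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  /* one ghost layer on each side; it lives only in local vectors */
  PetscCall(DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED,
                         DMDA_STENCIL_STAR, 33, 33, 33, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                         1, 1, NULL, NULL, NULL, &da));
  PetscCall(DMSetFromOptions(da));
  PetscCall(DMSetUp(da));
  PetscCall(DMDAGetInfo(da, NULL, &mx, &my, &mz, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL));

  PetscCall(DMCreateGlobalVector(da, &uglobal)); /* no ghost entries here: this is what the KSP sees */
  PetscCall(DMCreateLocalVector(da, &ulocal));   /* ghost entries exist here */
  PetscCall(DMGlobalToLocalBegin(da, uglobal, INSERT_VALUES, ulocal));
  PetscCall(DMGlobalToLocalEnd(da, uglobal, INSERT_VALUES, ulocal));

  /* Accessing (-1,-1,-1) is only valid on the local vector; on a global vector the
     array covers DMDAGetCorners only, which is one way to get the segmentation
     fault described at the start of the thread. */
  PetscCall(DMDAVecGetArray(da, ulocal, &u));
  PetscCall(DMDAGetGhostCorners(da, &gxs, &gys, &gzs, &gxm, &gym, &gzm));
  for (k = gzs; k < gzs + gzm; k++)
    for (j = gys; j < gys + gym; j++)
      for (i = gxs; i < gxs + gxm; i++)
        if (i < 0 || j < 0 || k < 0 || i > mx - 1 || j > my - 1 || k > mz - 1) u[k][j][i] = 0.0; /* Dirichlet u = 0 */
  PetscCall(DMDAVecRestoreArray(da, ulocal, &u));

  PetscCall(VecDestroy(&ulocal));
  PetscCall(VecDestroy(&uglobal));
  PetscCall(DMDestroy(&da));
  PetscCall(PetscFinalize());
  return 0;
}

The index test picks out only the physical boundary layer: interior inter-process ghost points have valid global indices in [0, mx-1] and are filled by DMGlobalToLocal, while the DM_BOUNDARY_GHOSTED entries are left for the user to set.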