From Pierre.Seize at onera.fr Thu Oct 1 03:18:22 2020 From: Pierre.Seize at onera.fr (Pierre Seize) Date: Thu, 1 Oct 2020 10:18:22 +0200 Subject: [petsc-users] Memory violation in PetscFVLeastSquaresPseudoInverseSVD_Static In-Reply-To: <875z7vp0th.fsf@jedbrown.org> References: <875z7vp0th.fsf@jedbrown.org> Message-ID: <98d007d2-4721-c19f-9a20-3da62e6d602d@onera.fr>
Sure I'll try, I was thinking of using ls->B as the A matrix for dgelss, ls->work as work and ls->Binv as B. The result is then stored in ls->Binv, but in column-major format. Right now, the column-major result is transposed in PetscFVLeastSquaresPseudoInverseSVD_Static, and the row-major result is copied in the output in PetscFVComputeGradient_LeastSquares. I think it's because PetscFVLeastSquaresPseudoInverse_Static gives the result in row-major format. Would it be alright if I changed PetscFVLeastSquaresPseudoInverseSVD_Static so that the result would still be in column-major format? I could include the result recopy in the if statement for example. Moreover, this would be to keep the compatibility with PetscFVLeastSquaresPseudoInverse_Static, but right now it is manually disabled (with useSVD = PETSC_TRUE), so I am worrying for nothing? Pierre
On 30/09/20 20:38, Jed Brown wrote: > Pierre Seize writes: > >> Hi, >> >> In PetscFVLeastSquaresPseudoInverseSVD_Static, there is >> Brhs = work; >> maxmn = PetscMax(m,n); >> for (j=0; j<maxmn; j++) { >> for (i=0; i<maxmn; i++) Brhs[i + j*maxmn] = 1.0*(i == j); >> } >> where in the calling function, PetscFVComputeGradient_LeastSquares, we >> set the arguments m <= numFaces, n <= dim and work <= ls->work. The size >> of the work array is computed in PetscFVLeastSquaresSetMaxFaces_LS as: >> ls->maxFaces = maxFaces; >> m = ls->maxFaces; >> n = dim; >> nrhs = ls->maxFaces; >> minwork = 3*PetscMin(m,n) + PetscMax(2*PetscMin(m,n), >> PetscMax(PetscMax(m,n), nrhs)); /* required by LAPACK */ > It's totally buggy because this formula is for the argument to dgelss, but the array is being used for a different purpose (to place Brhs). > > WORK > > WORK is DOUBLE PRECISION array, dimension (MAX(1,LWORK)) > On exit, if INFO = 0, WORK(1) returns the optimal LWORK. > > LWORK > > LWORK is INTEGER > The dimension of the array WORK. LWORK >= 1, and also: > LWORK >= 3*min(M,N) + max( 2*min(M,N), max(M,N), NRHS ) > For good performance, LWORK should generally be larger. > > If LWORK = -1, then a workspace query is assumed; the routine > only calculates the optimal size of the WORK array, returns > this value as the first entry of the WORK array, and no error > message related to LWORK is issued by XERBLA. > > There should be a separate allocation for Brhs and the work argument should be passed through to dgelss. > > The current code passes > > tmpwork = Ainv; > > along to dgelss, but we don't know that it's the right size either. > > > Would you be willing to submit a merge request with your best attempt at fixing this? I can help review and we'll get it into the 3.14.1 release. > >> ls->workSize = 5*minwork; /* We can afford to be extra generous */ >> >> In my example, the used size (maxmn * maxmn) is 81, and the actual size >> (ls->workSize) is 75, and therefore valgrind complains. >> Is it because I am missing something, or is it a bug? >> >> Thanks >> >> Pierre Seize
From t.appel17 at imperial.ac.uk Thu Oct 1 03:30:38 2020 From: t.appel17 at imperial.ac.uk (Appel, Thibaut) Date: Thu, 1 Oct 2020 08:30:38 +0000 Subject: [petsc-users] reset release branch In-Reply-To: References: Message-ID: Is 'master'
still considered stable? Thibaut > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 30 Sep 2020 09:14:41 -0500 (CDT) > From: Satish Balay > To: petsc-dev at mcs.anl.gov > Cc: petsc-users at mcs.anl.gov > Subject: [petsc-users] reset release branch > Message-ID: > Content-Type: text/plain; charset=US-ASCII > > All, > > I had to force fix the release branch due to a bad merge. > > If you've pulled on the release branch after the bad merge (before this fix) - and now have the commit 25cac2be9df307cc6f0df502d8399122c3a2b6a3 in it - i.e check with: > > git branch --contains 25cac2be9df307cc6f0df502d8399122c3a2b6a3 > > please do: > > git checkout master > git branch -D release > git fetch -p > git checkout release > > [Note: As the petsc-3.14 release announcement e-mail indicated - we switched from using 'maint' branch to 'release' branch or release fixes] > > Satish From knepley at gmail.com Thu Oct 1 06:37:16 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 1 Oct 2020 07:37:16 -0400 Subject: [petsc-users] reset release branch In-Reply-To: References: Message-ID: On Thu, Oct 1, 2020 at 4:30 AM Appel, Thibaut wrote: > Is ?master? still considered stable? > Yes. Note however that we are going to migrate that branch to the name 'main' after this release. Thanks, Matt > Thibaut > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Wed, 30 Sep 2020 09:14:41 -0500 (CDT) > > From: Satish Balay > > To: petsc-dev at mcs.anl.gov > > Cc: petsc-users at mcs.anl.gov > > Subject: [petsc-users] reset release branch > > Message-ID: > > Content-Type: text/plain; charset=US-ASCII > > > > All, > > > > I had to force fix the release branch due to a bad merge. > > > > If you've pulled on the release branch after the bad merge (before this > fix) - and now have the commit 25cac2be9df307cc6f0df502d8399122c3a2b6a3 in > it - i.e check with: > > > > git branch --contains 25cac2be9df307cc6f0df502d8399122c3a2b6a3 > > > > please do: > > > > git checkout master > > git branch -D release > > git fetch -p > > git checkout release > > > > [Note: As the petsc-3.14 release announcement e-mail indicated - we > switched from using 'maint' branch to 'release' branch or release fixes] > > > > Satish > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Oct 1 06:43:34 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 1 Oct 2020 07:43:34 -0400 Subject: [petsc-users] Memory violation in PetscFVLeastSquaresPseudoInverseSVD_Static In-Reply-To: <98d007d2-4721-c19f-9a20-3da62e6d602d@onera.fr> References: <875z7vp0th.fsf@jedbrown.org> <98d007d2-4721-c19f-9a20-3da62e6d602d@onera.fr> Message-ID: On Thu, Oct 1, 2020 at 4:18 AM Pierre Seize wrote: > Sure I'll try, > > I was thinking of using ls->B as the A matrix for dgelss, ls->work as > work and ls->Binv as B. The result is then stored in ls->Binv, but in > column-major format. > > Right now, the column-major result is transposed in > PetscFVLeastSquaresPseudoInverseSVD_Static, and the row-major result is > copied in the output in PetscFVComputeGradient_LeastSquares. I think > it's because PetscFVLeastSquaresPseudoInverse_Static gives the result in > row-major format. 
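For reference, a minimal sketch of the allocation pattern Jed suggests for the workspace bug discussed in this thread could look like the following. This is not the actual patch: A, m and n are assumed to come from the surrounding routine, real scalars are assumed (the complex gelss variant takes an extra rwork argument), and all other names are only illustrative.

  PetscErrorCode ierr;
  PetscBLASInt   M, N, NRHS, lda, ldb, lwork, irank, info;
  PetscScalar    *Brhs, *work;
  PetscReal      *sing, rcond = -1.0;
  PetscInt       maxmn = PetscMax(m, n), minmn = PetscMin(m, n);

  ierr = PetscBLASIntCast(m, &M);CHKERRQ(ierr);
  ierr = PetscBLASIntCast(n, &N);CHKERRQ(ierr);
  ierr = PetscBLASIntCast(maxmn, &NRHS);CHKERRQ(ierr);   /* identity right-hand side, as in the current code */
  ierr = PetscBLASIntCast(maxmn, &ldb);CHKERRQ(ierr);
  lda  = M;
  ierr = PetscBLASIntCast(3*minmn + PetscMax(2*minmn, maxmn), &lwork);CHKERRQ(ierr); /* LAPACK minimum for gelss with NRHS = maxmn */

  /* separate storage for the right-hand sides, the LAPACK workspace and the singular values */
  ierr = PetscMalloc3(maxmn*maxmn, &Brhs, lwork, &work, minmn, &sing);CHKERRQ(ierr);
  /* ... fill A (column-major, m x n) and set Brhs to the maxmn x maxmn identity ... */
  LAPACKgelss_(&M, &N, &NRHS, A, &lda, Brhs, &ldb, sing, &rcond, &irank, work, &lwork, &info);
  if (info) SETERRQ1(PETSC_COMM_SELF, PETSC_ERR_LIB, "xGELSS error %d", (int)info);
  /* Brhs now holds the pseudo-inverse in column-major layout; copy it where it is needed */
  ierr = PetscFree3(Brhs, work, sing);CHKERRQ(ierr);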
> > Would it be alright if I changed > PetscFVLeastSquaresPseudoInverseSVD_Static so that the result would > still be in column-major format ? I could include the result recopy in > the if statement for example. > > Moreover, this would be to keep the compatibility with > PetscFVLeastSquaresPseudoInverse_Static, but right now it is manually > disabled (with useSVD = PETSC_TRUE), so I am worrying for nothing ? > As long as the layout is documented in the function manpage, I think that change is fine. Nothing else uses the code except the reconstruction right now. Thanks, Matt > Pierre > > > On 30/09/20 20:38, Jed Brown wrote: > > Pierre Seize writes: > > > >> Hi, > >> > >> In PetscFVLeastSquaresPseudoInverseSVD_Static, there is > >> Brhs = work; > >> maxmn = PetscMax(m,n); > >> for (j=0; j >> for (i=0; i >> } > >> where on the calling function, PetscFVComputeGradient_LeastSquares, we > >> set the arguments m <= numFaces, n <= dim and work <= ls->work. The size > >> of the work array is computed in PetscFVLeastSquaresSetMaxFaces_LS as: > >> ls->maxFaces = maxFaces; > >> m = ls->maxFaces; > >> n = dim; > >> nrhs = ls->maxFaces; > >> minwork = 3*PetscMin(m,n) + PetscMax(2*PetscMin(m,n), > >> PetscMax(PetscMax(m,n), nrhs)); /* required by LAPACK */ > > It's totally buggy because this formula is for the argument to dgelss, > but the array is being used for a different purpose (to place Brhs). > > > > WORK > > > > WORK is DOUBLE PRECISION array, dimension > (MAX(1,LWORK)) > > On exit, if INFO = 0, WORK(1) returns the optimal > LWORK. > > > > LWORK > > > > LWORK is INTEGER > > The dimension of the array WORK. LWORK >= 1, and > also: > > LWORK >= 3*min(M,N) + max( 2*min(M,N), max(M,N), > NRHS ) > > For good performance, LWORK should generally be > larger. > > > > If LWORK = -1, then a workspace query is assumed; > the routine > > only calculates the optimal size of the WORK > array, returns > > this value as the first entry of the WORK array, > and no error > > message related to LWORK is issued by XERBLA. > > > > There should be a separate allocation for Brhs and the work argument > should be passed through to dgelss. > > > > The current code passes > > > > tmpwork = Ainv; > > > > along to dgelss, but we don't know that it's the right size either. > > > > > > Would you be willing to submit a merge request with your best attempt at > fixing this. I can help review and we'll get it into the 3.14.1 release? > > > >> ls->workSize = 5*minwork; /* We can afford to be extra generous */ > >> > >> In my example, the used size (maxmn * maxmn) is 81, and the actual size > >> (ls->workSize) is 75, and therefore valgrind complains. > >> Is it because I am missing something, or is it a bug ? > >> > >> Thanks > >> > >> Pierre Seize > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Oct 1 10:02:34 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 1 Oct 2020 10:02:34 -0500 (CDT) Subject: [petsc-users] reset release branch In-Reply-To: References: Message-ID: On Thu, 1 Oct 2020, Matthew Knepley wrote: > On Thu, Oct 1, 2020 at 4:30 AM Appel, Thibaut > wrote: > > > Is ?master? still considered stable? > > > > Yes. Note however that we are going to migrate that branch to the name > 'main' after this release. 
I'm not sure 'stable' is the appropriate description here. Ok - checking 'man gitworkflows' >>>>>>> ? maint tracks the commits that should go into the next "maintenance release", i.e., update of the last released stable version; ? master tracks the commits that should go into the next release; ? next is intended as a testing branch for topics being tested for stability for master. <<<<<< Ok - stable-release and stable-development? BTW: the name 'main' is still under discussion. One issue is - its too close to the old 'maint' name. [Currently inclined to preserve the old maint* branches for some time - in case its part of workflow of prior releases of applications that can't be changed. This is one reason why bitbucket repo is still active]. Maybe once we switch over - we will not have sticky fingers with maint.. Alternative is 'develop' [but this I believe might suggest a different workflow than what we use: its the above - without next, with maint renamed as release, master renamed as ????] Satish From bsmith at petsc.dev Thu Oct 1 10:06:45 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 1 Oct 2020 10:06:45 -0500 Subject: [petsc-users] reset release branch In-Reply-To: References: Message-ID: <4FE7AD86-0D5D-4CB5-8743-CAEDF351B9D9@petsc.dev> Thibaut, master has not changed in any way, it's usage is the same as before the release. Only the maint branch has been renamed to release. Barry > On Oct 1, 2020, at 3:30 AM, Appel, Thibaut wrote: > > Is ?master? still considered stable? > > Thibaut > >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Wed, 30 Sep 2020 09:14:41 -0500 (CDT) >> From: Satish Balay >> To: petsc-dev at mcs.anl.gov >> Cc: petsc-users at mcs.anl.gov >> Subject: [petsc-users] reset release branch >> Message-ID: >> Content-Type: text/plain; charset=US-ASCII >> >> All, >> >> I had to force fix the release branch due to a bad merge. >> >> If you've pulled on the release branch after the bad merge (before this fix) - and now have the commit 25cac2be9df307cc6f0df502d8399122c3a2b6a3 in it - i.e check with: >> >> git branch --contains 25cac2be9df307cc6f0df502d8399122c3a2b6a3 >> >> please do: >> >> git checkout master >> git branch -D release >> git fetch -p >> git checkout release >> >> [Note: As the petsc-3.14 release announcement e-mail indicated - we switched from using 'maint' branch to 'release' branch or release fixes] >> >> Satish > From olivier.jamond at cea.fr Thu Oct 1 12:31:23 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Thu, 1 Oct 2020 19:31:23 +0200 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices Message-ID: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> Dear all, I am working on a finite-elements/finite-volumes code, whose distributed solver is based on petsc. For FE, it relies on Lagrange multipliers for the imposition of various boundary conditions or interactions (simple dirichlet, contact, ...). This results in saddle point problems: [S Ct][x]=[f] [C 0 ][y] [g] As discussed in this mailing list ("Saddle point problem with nested matrix and a relatively small number of Lagrange multipliers"), the fieldsplit/PC_COMPOSITE_SCHUR approach involves (2 + 'number of iterations of the KSP for the Schur complement') KSPSolve(S, Sp). 
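For context, the fieldsplit/PC_COMPOSITE_SCHUR approach referred to above is typically wired up along these lines. This is only an illustrative sketch, not code from the original message: K stands for the assembled saddle-point matrix, isu/isl for index sets of the primal dofs and the Lagrange multipliers, and b/x for the assembled right-hand side and solution vectors.

  PetscErrorCode ierr;
  KSP            ksp;
  PC             pc;

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, K, K);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "u", isu);CHKERRQ(ierr);   /* primal block S */
  ierr = PCFieldSplitSetIS(pc, "l", isl);CHKERRQ(ierr);   /* Lagrange multipliers */
  ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);CHKERRQ(ierr);
  ierr = PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_FULL);CHKERRQ(ierr);
  /* each inner Schur-complement iteration applies S^{-1} once; with the full
     factorization this gives the (2 + Schur iterations) solves with S mentioned above */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);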
I would like to try the formula given by Ainsworth in [1] to solve this problem: x = (Sp)^(-1) * fp y = Rt * (f - S*x) where: Sp= Ct*C?+ Qt*S*Q Q = I - P P = R * C R =?Ct * (C*Ct)^(-1) My input matrices (S and C) are MPIAIJ matrices. I create a shell matrix for Sp (because it involves?(C*Ct)^(-1) so I think it may be a bad idea to compute it explicitly...) with the MatMult?operator to use it in a KSPSolve. The C matrix and g vector are scaled so that the condition number of Sp is similar to the one of S. It works, but my main problem is that because Sp is a shell matrix, as far as I understand, I deprive myself of all the petsc preconditioners... I tried to use S as a preconditioning?matrix, but it's not good: With a GAMG?preconditioner, my iteration number is about 4 times higher than?in a "debug" version where I compute Sp explicitly as a MPIAIJ matrix and use it as preconditioning?matrix. Is there a way to use the petsc preconditioners for?shell matrices or at least to define a shell preconditioner that internally calls the petsc preconditioners? In the end I would like to have something like GAMG(Ct*C?+ Qt*S*Q) as a preconditioner?(here Q is a shell matrix), or something like?Qt*GAMG(S)*Q (which from matlab experimentation could be a good?preconditioner). Many thanks, Olivier [1]: Ainsworth, M. (2001). Essential boundary conditions and multi-point constraints in finite element analysis.?Computer Methods in Applied Mechanics and Engineering,?190(48), 6323-6339. From jed at jedbrown.org Thu Oct 1 13:47:10 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 01 Oct 2020 12:47:10 -0600 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> Message-ID: <87mu15u6kx.fsf@jedbrown.org> Olivier Jamond writes: > Dear all, > > I am working on a finite-elements/finite-volumes code, whose distributed > solver is based on petsc. For FE, it relies on Lagrange multipliers for > the imposition of various boundary conditions or interactions (simple > dirichlet, contact, ...). This results in saddle point problems: > > [S Ct][x]=[f] > [C 0 ][y] [g] > > As discussed in this mailing list ("Saddle point problem with nested > matrix and a relatively small number of Lagrange multipliers"), the > fieldsplit/PC_COMPOSITE_SCHUR approach involves (2 + 'number of > iterations of the KSP for the Schur complement') KSPSolve(S, Sp). I > would like to try the formula given by Ainsworth in [1] to solve this > problem: > > x = (Sp)^(-1) * fp > y = Rt * (f - S*x) > > where: > Sp= Ct*C?+ Qt*S*Q I just want to observe here that Ct*C lives in the big space and is low rank. It's kinda like what you would get from an augmented Lagrangian approach. The second term involves these commutators that destroy sparsity in general, but the context of the paper (as I interpreted it in a quick skim) is such that C*Ct consists of small decoupled blocks associated with each MPC. The suggestion is that these can either be computed explicitly (possibly at the element level) or cleaned up in a small number of Krylov iterations. > Q = I - P > P = R * C > R =?Ct * (C*Ct)^(-1) > > My input matrices (S and C) are MPIAIJ matrices. I create a shell matrix > for Sp (because it involves?(C*Ct)^(-1) so I think it may be a bad idea > to compute it explicitly...) with the MatMult?operator to use it in a > KSPSolve. 
The C matrix and g vector are scaled so that the condition > number of Sp is similar to the one of S. > > It works, but my main problem is that because Sp is a shell matrix, as > far as I understand, I deprive myself of all the petsc > preconditioners... I tried to use S as a preconditioning?matrix, but > it's not good: With a GAMG?preconditioner, my iteration number is about > 4 times higher than?in a "debug" version where I compute Sp explicitly > as a MPIAIJ matrix and use it as preconditioning?matrix. Are your coupling constraints nonlocal, such that C*Ct is not block diagonal? > Is there a way to use the petsc preconditioners for?shell matrices or at > least to define a shell preconditioner that internally calls the petsc > preconditioners? > > In the end I would like to have something like GAMG(Ct*C?+ Qt*S*Q) as a > preconditioner?(here Q is a shell matrix), or something > like?Qt*GAMG(S)*Q (which from matlab experimentation could be a > good?preconditioner). > > Many thanks, > Olivier > > [1]: Ainsworth, M. (2001). Essential boundary conditions and multi-point > constraints in finite element analysis.?Computer Methods in Applied > Mechanics and Engineering,?190(48), 6323-6339. From olivier.jamond at cea.fr Fri Oct 2 06:50:46 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Fri, 2 Oct 2020 13:50:46 +0200 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <87mu15u6kx.fsf@jedbrown.org> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> Message-ID: <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> Dear Jed, The code I am working on is quite generic and at the solve step, the matrix C can be 'whatever' (but is supposed to be full rank). But in practice, in 99% of the cases, C contain MPCs that refers to boundary conditions applied to subsets of the mesh boundary. These MPCs can couple several dofs, and a given dofs can be involved in several MPCs. For example, one could impose that the average of the solution in the x-direction is null on a part of the boundary, and that this part of the boundary is in contact with another part of the boundary. So yes, CCt is block diagonal, where each block is a set of MPCs that share dofs, and CtC is also block diagonal, where each block is a set of dofs that share MPCs. For the vast majority of cases, these blocks involve dofs/MPCs attached to a subset of the boundary, so they are small with respect to the total number of dofs (and their size grows slower than the total number of dofs when the mesh is refined). I am not sure to understand what you mean by compute the MPCs explicitly: do you mean eliminating them? For very simple dirichlet conditions I see how to do that, but in a more generic case I don't see (but there may be some techniques I don't know about!). I don't understand also what you mean by cleaning them in a small number of krylov iterations? Many thanks, Olivier On 01/10/2020 20:47, Jed Brown wrote: > Olivier Jamond writes: > >> Dear all, >> >> I am working on a finite-elements/finite-volumes code, whose distributed >> solver is based on petsc. For FE, it relies on Lagrange multipliers for >> the imposition of various boundary conditions or interactions (simple >> dirichlet, contact, ...). 
This results in saddle point problems: >> >> [S Ct][x]=[f] >> [C 0 ][y] [g] >> >> As discussed in this mailing list ("Saddle point problem with nested >> matrix and a relatively small number of Lagrange multipliers"), the >> fieldsplit/PC_COMPOSITE_SCHUR approach involves (2 + 'number of >> iterations of the KSP for the Schur complement') KSPSolve(S, Sp). I >> would like to try the formula given by Ainsworth in [1] to solve this >> problem: >> >> x = (Sp)^(-1) * fp >> y = Rt * (f - S*x) >> >> where: >> Sp= Ct*C?+ Qt*S*Q > I just want to observe here that Ct*C lives in the big space and is low rank. It's kinda like what you would get from an augmented Lagrangian approach. > > The second term involves these commutators that destroy sparsity in general, but the context of the paper (as I interpreted it in a quick skim) is such that C*Ct consists of small decoupled blocks associated with each MPC. The suggestion is that these can either be computed explicitly (possibly at the element level) or cleaned up in a small number of Krylov iterations. > >> Q = I - P >> P = R * C >> R =?Ct * (C*Ct)^(-1) >> >> My input matrices (S and C) are MPIAIJ matrices. I create a shell matrix >> for Sp (because it involves?(C*Ct)^(-1) so I think it may be a bad idea >> to compute it explicitly...) with the MatMult?operator to use it in a >> KSPSolve. The C matrix and g vector are scaled so that the condition >> number of Sp is similar to the one of S. >> >> It works, but my main problem is that because Sp is a shell matrix, as >> far as I understand, I deprive myself of all the petsc >> preconditioners... I tried to use S as a preconditioning?matrix, but >> it's not good: With a GAMG?preconditioner, my iteration number is about >> 4 times higher than?in a "debug" version where I compute Sp explicitly >> as a MPIAIJ matrix and use it as preconditioning?matrix. > Are your coupling constraints nonlocal, such that C*Ct is not block diagonal? > >> Is there a way to use the petsc preconditioners for?shell matrices or at >> least to define a shell preconditioner that internally calls the petsc >> preconditioners? >> >> In the end I would like to have something like GAMG(Ct*C?+ Qt*S*Q) as a >> preconditioner?(here Q is a shell matrix), or something >> like?Qt*GAMG(S)*Q (which from matlab experimentation could be a >> good?preconditioner). >> >> Many thanks, >> Olivier >> >> [1]: Ainsworth, M. (2001). Essential boundary conditions and multi-point >> constraints in finite element analysis.?Computer Methods in Applied >> Mechanics and Engineering,?190(48), 6323-6339. From bsmith at petsc.dev Fri Oct 2 17:23:57 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 2 Oct 2020 17:23:57 -0500 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> Message-ID: <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> > On Oct 2, 2020, at 6:50 AM, Olivier Jamond wrote: > > Dear Jed, > > The code I am working on is quite generic and at the solve step, the matrix C can be 'whatever' (but is supposed to be full rank). But in practice, in 99% of the cases, C contain MPCs that refers to boundary conditions applied to subsets of the mesh boundary. These MPCs can couple several dofs, and a given dofs can be involved in several MPCs. 
For example, one could impose that the average of the solution in the x-direction is null on a part of the boundary, and that this part of the boundary is in contact with another part of the boundary. > > So yes, CCt is block diagonal, where each block is a set of MPCs that share dofs, and CtC is also block diagonal, where each block is a set of dofs that share MPCs. For the vast majority of cases, these blocks involve dofs/MPCs attached to a subset of the boundary, so they are small with respect to the total number of dofs (and their size grows slower than the total number of dofs when the mesh is refined). > > I am not sure to understand what you mean by compute the MPCs explicitly: do you mean eliminating them? For very simple dirichlet conditions I see how to do that, but in a more generic case I don't see (but there may be some techniques I don't know about!). > > I don't understand also what you mean by cleaning them in a small number of krylov iterations? I think what Jed is saying is that you should just actually build your preconditioner for your Ct*C + Qt*S*Q operator with S. Because Ct is tall and skinny the eigenstructure of Ct*C + Qt*S*Q is just the eigenstructure of S with a low rank "modification" and Krylov methods (GMRES) are good at solving problems where the eigenstructure of the preconditioner is only a small rank modification of the eigenstructure of the operator you are supply to GMRES. In the best situation each new iteration of GMRES corrects one more of the "rogue" eigen directions. I would first use a direct solver with S just to test how well it works as a preconditioner and then switch to GAMG or whatever should work efficiently for solving your particular S matrix. I'd be interested in hearing how well the Ainsworth Formula works, it is something that might be worth adding to PCFIELDSPLIT. Barry > > Many thanks, > Olivier > > On 01/10/2020 20:47, Jed Brown wrote: >> Olivier Jamond writes: >> >>> Dear all, >>> >>> I am working on a finite-elements/finite-volumes code, whose distributed >>> solver is based on petsc. For FE, it relies on Lagrange multipliers for >>> the imposition of various boundary conditions or interactions (simple >>> dirichlet, contact, ...). This results in saddle point problems: >>> >>> [S Ct][x]=[f] >>> [C 0 ][y] [g] >>> >>> As discussed in this mailing list ("Saddle point problem with nested >>> matrix and a relatively small number of Lagrange multipliers"), the >>> fieldsplit/PC_COMPOSITE_SCHUR approach involves (2 + 'number of >>> iterations of the KSP for the Schur complement') KSPSolve(S, Sp). I >>> would like to try the formula given by Ainsworth in [1] to solve this >>> problem: >>> >>> x = (Sp)^(-1) * fp >>> y = Rt * (f - S*x) >>> >>> where: >>> Sp= Ct*C + Qt*S*Q >> I just want to observe here that Ct*C lives in the big space and is low rank. It's kinda like what you would get from an augmented Lagrangian approach. >> >> The second term involves these commutators that destroy sparsity in general, but the context of the paper (as I interpreted it in a quick skim) is such that C*Ct consists of small decoupled blocks associated with each MPC. The suggestion is that these can either be computed explicitly (possibly at the element level) or cleaned up in a small number of Krylov iterations. >> >>> Q = I - P >>> P = R * C >>> R = Ct * (C*Ct)^(-1) >>> >>> My input matrices (S and C) are MPIAIJ matrices. I create a shell matrix >>> for Sp (because it involves (C*Ct)^(-1) so I think it may be a bad idea >>> to compute it explicitly...) 
with the MatMult operator to use it in a >>> KSPSolve. The C matrix and g vector are scaled so that the condition >>> number of Sp is similar to the one of S. >>> >>> It works, but my main problem is that because Sp is a shell matrix, as >>> far as I understand, I deprive myself of all the petsc >>> preconditioners... I tried to use S as a preconditioning matrix, but >>> it's not good: With a GAMG preconditioner, my iteration number is about >>> 4 times higher than in a "debug" version where I compute Sp explicitly >>> as a MPIAIJ matrix and use it as preconditioning matrix. >> Are your coupling constraints nonlocal, such that C*Ct is not block diagonal? >> >>> Is there a way to use the petsc preconditioners for shell matrices or at >>> least to define a shell preconditioner that internally calls the petsc >>> preconditioners? >>> >>> In the end I would like to have something like GAMG(Ct*C + Qt*S*Q) as a >>> preconditioner (here Q is a shell matrix), or something >>> like Qt*GAMG(S)*Q (which from matlab experimentation could be a >>> good preconditioner). >>> >>> Many thanks, >>> Olivier >>> >>> [1]: Ainsworth, M. (2001). Essential boundary conditions and multi-point >>> constraints in finite element analysis. Computer Methods in Applied >>> Mechanics and Engineering, 190(48), 6323-6339. From ashish.patel at onscale.com Fri Oct 2 18:15:57 2020 From: ashish.patel at onscale.com (Ashish Patel) Date: Fri, 2 Oct 2020 16:15:57 -0700 Subject: [petsc-users] DMPlexMatSetClosure for non connected points in DM Message-ID: Dear PETSc users, I am trying to assemble a matrix for a finite element problem where the degree of freedom (dof) on a surface is constrained via a reference node which also exists in the DM but is not connected with any other point in the mesh. To apply the constraint I want to be able to set matrix values in the rows belonging to dofs of reference nodes and columns belonging to dofs of surface nodes and vice versa. But since the two points are not connected topologically I cannot just use DMPlexMatSetClosure to do that. I am currently trying to use DMPlexAddConeSize on all the constrained surface points followed by a call to DMPlexInsertCone wherein I add the reference node to the cone of surface point before setting up the PetscSection. Is this the right approach? I am currently getting following message New nonzero at (8952,28311) caused a malloc Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check I could set the suggested option to get rid of the error but was wondering if I am missing something. Thanks Ashish -------------- next part -------------- An HTML attachment was scrubbed... URL: From Zane.Jakobs at colorado.edu Sat Oct 3 10:11:21 2020 From: Zane.Jakobs at colorado.edu (Zane Charles Jakobs) Date: Sat, 3 Oct 2020 08:11:21 -0700 Subject: [petsc-users] Debug build fails Message-ID: Hi PETSc devs, I just pulled the latest version of PETSc on master, and while my optimized build works fine, my debug build fails with the message make[1]: *** No rule to make target 'src/sys/logging/examples/makefile'. Stop. make: *** [GNUmakefile:17: src/sys/logging/examples/makefile] Error 2 Doing ls src/sys/logging/examples shows a file named `index.html` and a directory named `tutorials`, but no makefile. 
My configure line is ./configure PETSC_ARCH=arch-linux-c-debug --with-cc=clang --with-cxx=clang++ COPTFLAGS="-O3 -march=native -mtune=native -fPIC" CXXOPTFLAGS="-O3 -march=native -mtune=native -fPIC" FOPTFLAGS="-O3 -march=native -mtune=native -fPIC" --with-avx2=1 --download-mpich --download-hypre --download-scalapack --download-mumps --with-debugging=yes --with-blaslapack-dir=/opt/intel/mkl --download-zlib --download-libpng --download-giflib --download-libjpeg --download-slepc --download-eigen To reiterate, doing the exact same configure, but changing '-with-debugging=yes' to '-with-debugging=no' (and changing the PETSC_ARCH name to 'arch-linux-c-debug') and then building the non-debugging version of PETSc works as normal. Any ideas what could be going on? Thanks! -Zane Jakobs -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Oct 3 10:27:51 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 3 Oct 2020 11:27:51 -0400 Subject: [petsc-users] Debug build fails In-Reply-To: References: Message-ID: On Sat, Oct 3, 2020 at 11:11 AM Zane Charles Jakobs < Zane.Jakobs at colorado.edu> wrote: > Hi PETSc devs, > > I just pulled the latest version of PETSc on master, and while my > optimized build works fine, my debug build fails with the message > > make[1]: *** No rule to make target 'src/sys/logging/examples/makefile'. > Stop. > make: *** [GNUmakefile:17: src/sys/logging/examples/makefile] Error 2 > > Doing > ls src/sys/logging/examples > Remove this directory. It is not in the repository, and its presence is confusing the automatic detection for the build. Thanks, Matt > shows a file named `index.html` and a directory named `tutorials`, but no > makefile. My configure line is > > ./configure PETSC_ARCH=arch-linux-c-debug --with-cc=clang > --with-cxx=clang++ COPTFLAGS="-O3 -march=native -mtune=native -fPIC" > CXXOPTFLAGS="-O3 -march=native -mtune=native -fPIC" FOPTFLAGS="-O3 > -march=native -mtune=native -fPIC" --with-avx2=1 --download-mpich > --download-hypre --download-scalapack --download-mumps --with-debugging=yes > --with-blaslapack-dir=/opt/intel/mkl --download-zlib --download-libpng > --download-giflib --download-libjpeg --download-slepc --download-eigen > > To reiterate, doing the exact same configure, but changing > '-with-debugging=yes' to '-with-debugging=no' (and changing the PETSC_ARCH > name to 'arch-linux-c-debug') and then building the non-debugging version > of PETSc works as normal. Any ideas what could be going on? > > Thanks! > > -Zane Jakobs > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sat Oct 3 10:34:53 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 3 Oct 2020 10:34:53 -0500 (CDT) Subject: [petsc-users] Debug build fails In-Reply-To: References: Message-ID: Can use 'git status' to figure out what files are not in the repo. And a 'git clean -f -d -x' will delete everything except for the files in the repo [so use with care]. 
Satish On Sat, 3 Oct 2020, Matthew Knepley wrote: > On Sat, Oct 3, 2020 at 11:11 AM Zane Charles Jakobs < > Zane.Jakobs at colorado.edu> wrote: > > > Hi PETSc devs, > > > > I just pulled the latest version of PETSc on master, and while my > > optimized build works fine, my debug build fails with the message > > > > make[1]: *** No rule to make target 'src/sys/logging/examples/makefile'. > > Stop. > > make: *** [GNUmakefile:17: src/sys/logging/examples/makefile] Error 2 > > > > Doing > > ls src/sys/logging/examples > > > > Remove this directory. It is not in the repository, and its presence is > confusing the automatic detection for the build. > > Thanks, > > Matt > > > > shows a file named `index.html` and a directory named `tutorials`, but no > > makefile. My configure line is > > > > ./configure PETSC_ARCH=arch-linux-c-debug --with-cc=clang > > --with-cxx=clang++ COPTFLAGS="-O3 -march=native -mtune=native -fPIC" > > CXXOPTFLAGS="-O3 -march=native -mtune=native -fPIC" FOPTFLAGS="-O3 > > -march=native -mtune=native -fPIC" --with-avx2=1 --download-mpich > > --download-hypre --download-scalapack --download-mumps --with-debugging=yes > > --with-blaslapack-dir=/opt/intel/mkl --download-zlib --download-libpng > > --download-giflib --download-libjpeg --download-slepc --download-eigen > > > > To reiterate, doing the exact same configure, but changing > > '-with-debugging=yes' to '-with-debugging=no' (and changing the PETSC_ARCH > > name to 'arch-linux-c-debug') and then building the non-debugging version > > of PETSc works as normal. Any ideas what could be going on? > > > > Thanks! > > > > -Zane Jakobs > > > > > From Zane.Jakobs at colorado.edu Sat Oct 3 10:42:26 2020 From: Zane.Jakobs at colorado.edu (Zane Charles Jakobs) Date: Sat, 3 Oct 2020 08:42:26 -0700 Subject: [petsc-users] Debug build fails In-Reply-To: References: Message-ID: Thanks, Matt and Satish! Everything seems like it's working now. On Sat, Oct 3, 2020 at 8:34 AM Satish Balay wrote: > Can use 'git status' to figure out what files are not in the repo. > > And a 'git clean -f -d -x' will delete everything except for the files in > the repo [so use with care]. > > Satish > > On Sat, 3 Oct 2020, Matthew Knepley wrote: > > > On Sat, Oct 3, 2020 at 11:11 AM Zane Charles Jakobs < > > Zane.Jakobs at colorado.edu> wrote: > > > > > Hi PETSc devs, > > > > > > I just pulled the latest version of PETSc on master, and while my > > > optimized build works fine, my debug build fails with the message > > > > > > make[1]: *** No rule to make target > 'src/sys/logging/examples/makefile'. > > > Stop. > > > make: *** [GNUmakefile:17: src/sys/logging/examples/makefile] Error 2 > > > > > > Doing > > > ls src/sys/logging/examples > > > > > > > Remove this directory. It is not in the repository, and its presence is > > confusing the automatic detection for the build. > > > > Thanks, > > > > Matt > > > > > > > shows a file named `index.html` and a directory named `tutorials`, but > no > > > makefile. 
My configure line is > > > > > > ./configure PETSC_ARCH=arch-linux-c-debug --with-cc=clang > > > --with-cxx=clang++ COPTFLAGS="-O3 -march=native -mtune=native -fPIC" > > > CXXOPTFLAGS="-O3 -march=native -mtune=native -fPIC" FOPTFLAGS="-O3 > > > -march=native -mtune=native -fPIC" --with-avx2=1 --download-mpich > > > --download-hypre --download-scalapack --download-mumps > --with-debugging=yes > > > --with-blaslapack-dir=/opt/intel/mkl --download-zlib --download-libpng > > > --download-giflib --download-libjpeg --download-slepc --download-eigen > > > > > > To reiterate, doing the exact same configure, but changing > > > '-with-debugging=yes' to '-with-debugging=no' (and changing the > PETSC_ARCH > > > name to 'arch-linux-c-debug') and then building the non-debugging > version > > > of PETSc works as normal. Any ideas what could be going on? > > > > > > Thanks! > > > > > > -Zane Jakobs > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swarnava89 at gmail.com Sat Oct 3 19:36:08 2020 From: swarnava89 at gmail.com (Swarnava Ghosh) Date: Sat, 3 Oct 2020 20:36:08 -0400 Subject: [petsc-users] Visualizing a 3D parallel DMPLEX mesh Message-ID: Hi Petsc users, I have a 3D distributed DMPLEX mesh. I would like to visualize the mesh. Specifically, I want to see domain ownership of every MPI rank, i.e. each rank with a different color. . Would you please suggest the best way to do this? Sincerely, SG -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Oct 3 21:19:09 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 3 Oct 2020 22:19:09 -0400 Subject: [petsc-users] Visualizing a 3D parallel DMPLEX mesh In-Reply-To: References: Message-ID: On Sat, Oct 3, 2020 at 8:36 PM Swarnava Ghosh wrote: > Hi Petsc users, > > I have a 3D distributed DMPLEX mesh. I would like to visualize the mesh. > Specifically, I want to see domain ownership of every MPI rank, i.e. each > rank with a different color. . Would you please suggest the best way to do > this? > I do it this way. I use DMViewFromOptions(dm, NULL, "-dm_view") in my code. Then I run it ./my_prog -dm_view hdf5:mesh.h5 -dm_partition_view ${PETSC_DIR}/lib/petsc/bin/petsc_gen_xdmf.py mesh.h5 which creates mesh.h5 and mesh.xmf which can be loaded in Paraview. There is a "rank" field there that you can visualize over the mesh. Thanks, Matt > Sincerely, > SG > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bastian.loehrer at tu-dresden.de Mon Oct 5 12:26:42 2020 From: bastian.loehrer at tu-dresden.de (=?UTF-8?Q?Bastian_L=c3=b6hrer?=) Date: Mon, 5 Oct 2020 19:26:42 +0200 Subject: [petsc-users] DMView / print out ownership ranges In-Reply-To: References: <31f4d18a-a6d3-95ff-26d5-fa2421d89d60@tu-dresden.de> Message-ID: <335dfeb8-d36b-b909-3074-642856593e5d@tu-dresden.de> Hello, This was in fact our own bug and not an error in PETSc. A misplaced call to DMSetUp was the culprit. I'm sorry to have bothered you with this. We are going to update to the latest release, too. Thanks, Bastian On 13.08.20 14:53, Matthew Knepley wrote: > On Thu, Aug 13, 2020 at 8:49 AM Bastian L?hrer > > > wrote: > > Dear PETSc people, > > in PETSc 3.3 > > ??? 
call DMView( dm, PETSC_VIEWER_STDOUT_WORLD, ierr) > > printed out the ownership ranges like so: > > Processor [0] M 32 N 34 P 32 m 1 n 2 p 2 w 1 s 1 > X range of indices: 0 32, Y range of indices: 0 17, Z range of > indices: 0 16 > Processor [1] M 32 N 34 P 32 m 1 n 2 p 2 w 1 s 1 > X range of indices: 0 32, Y range of indices: 17 34, Z range > of indices: 0 16 > Processor [2] M 32 N 34 P 32 m 1 n 2 p 2 w 1 s 1 > X range of indices: 0 32, Y range of indices: 0 17, Z range of > indices: 16 32 > Processor [3] M 32 N 34 P 32 m 1 n 2 p 2 w 1 s 1 > X range of indices: 0 32, Y range of indices: 17 34, Z range > of indices: 16 32 > > In PETSc 3.8.4 (and later?) the same function call only prints out: > > DM Object: 4 MPI processes > ? type: da > > Does the feature to print out the ownership ranges still exist? > I am unable to find it. > > Certainly the latest release prints what you expect: > > knepley/feature-plex-stokes-tutorial > $:/PETSc3/petsc/petsc-dev/src/snes/tutorials$ make ex5 > > /PETSc3/petsc/apple/bin/mpicc -Wl,-multiply_defined,suppress > -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs > -Wl,-search_paths_first -Wl,-no_compact_unwind-Wall -Wwrite-strings > -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector > -fno-stack-check -Qunused-arguments -fvisibility=hidden -g3 -Wall > -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas > -fstack-protector -fno-stack-check -Qunused-arguments > -fvisibility=hidden -g3 -I/PETSc3/petsc/petsc-dev/include > -I/PETSc3/petsc/petsc-dev/arch-master-debug/include -I/opt/X11/include > -I/PETSc3/petsc/apple/include > -I/PETSc3/petsc/petsc-dev/arch-master-debug/include/eigen3ex5.c-Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-debug/lib > -L/PETSc3/petsc/petsc-dev/arch-master-debug/lib > -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-debug/lib > -L/PETSc3/petsc/petsc-dev/arch-master-debug/lib > -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib > -Wl,-rpath,/PETSc3/petsc/apple/lib -L/PETSc3/petsc/apple/lib > -Wl,-rpath,/usr/local/lib/gcc/x86_64-apple-darwin19/9.2.0 > -L/usr/local/lib/gcc/x86_64-apple-darwin19/9.2.0 > -Wl,-rpath,/usr/local/lib -L/usr/local/lib -lpetsc -lcmumps -ldmumps > -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu > -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig > -lsuperlu_dist -lml -lfftw3_mpi -lfftw3 -lp4est -lsc -llapack -lblas > -legadslite -ltriangle -lX11 -lexodus -lnetcdf -lpnetcdf > -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lchaco -lparmetis > -lmetis -lz -lctetgen -lc++ -ldl -lmpifort -lmpi -lpmpi -lgfortran > -lquadmath -lm -lc++ -ldl -o ex5 > > knepley/feature-plex-stokes-tutorial > $:/PETSc3/petsc/petsc-dev/src/snes/tutorials$ ./ex5 -dm_view > > DM Object: 1 MPI processes > > type: da > > Processor [0] M 4 N 4 m 1 n 1 w 1 s 1 > > X range of indices: 0 4, Y range of indices: 0 4 > > DM Object: 1 MPI processes > > type: da > > Processor [0] M 4 N 4 m 1 n 1 w 2 s 1 > > X range of indices: 0 4, Y range of indices: 0 4 > > knepley/feature-plex-stokes-tutorial > $:/PETSc3/petsc/petsc-dev/src/snes/tutorials$ $MPIEXEC -np 4 ./ex5 > -dm_view > > DM Object: 4 MPI processes > > type: da > > Processor [0] M 4 N 4 m 2 n 2 w 1 s 1 > > X range of indices: 0 2, Y range of indices: 0 2 > > Processor [1] M 4 N 4 m 2 n 2 w 1 s 1 > > X range of indices: 2 4, Y range of indices: 0 2 > > Processor [2] M 4 N 4 m 2 n 2 w 1 s 1 > > X range of indices: 0 2, Y range of indices: 2 4 > > Processor [3] M 4 N 4 m 2 n 2 w 1 s 1 > > X range of indices: 2 4, Y range of indices: 2 4 > > DM Object: 
4 MPI processes > > type: da > > Processor [0] M 4 N 4 m 2 n 2 w 2 s 1 > > X range of indices: 0 2, Y range of indices: 0 2 > > Processor [1] M 4 N 4 m 2 n 2 w 2 s 1 > > X range of indices: 2 4, Y range of indices: 0 2 > > Processor [2] M 4 N 4 m 2 n 2 w 2 s 1 > > X range of indices: 0 2, Y range of indices: 2 4 > > Processor [3] M 4 N 4 m 2 n 2 w 2 s 1 > > X range of indices: 2 4, Y range of indices: 2 4 > > We can try and go back to debug 3.8.4, but that is a long time ago. > Can you use the latest release? > > ? Thanks, > > ? ? Matt > > Best, > Bastian > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.jamond at cea.fr Tue Oct 6 06:57:22 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Tue, 6 Oct 2020 13:57:22 +0200 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> Message-ID: On 03/10/2020 00:23, Barry Smith wrote: > I think what Jed is saying is that you should just actually build your preconditioner for your Ct*C + Qt*S*Q operator with S. Because Ct is tall and skinny the eigenstructure of Ct*C + Qt*S*Q is just the eigenstructure of S with a low rank "modification" and Krylov methods (GMRES) are good at solving problems where the eigenstructure of the preconditioner is only a small rank modification of the eigenstructure of the operator you are supply to GMRES. In the best situation each new iteration of GMRES corrects one more of the "rogue" eigen directions. I would first use a direct solver with S just to test how well it works as a preconditioner and then switch to GAMG or whatever should work efficiently for solving your particular S matrix. > > I'd be interested in hearing how well the Ainsworth Formula works, it is something that might be worth adding to PCFIELDSPLIT. > > > Barry Hi Barry, Thanks for these clarifications. To give some context, the test I am working on is a traction on an elastoplastic cube in large strain on which I apply 2% of strain at the first loading increment. The cube has 14739 dofs, and the number of rows of the C matrix is 867. In this simple case, the C matrix just refers to simple dirichlet conditions. Then Q is diagonal with 1. on dofs without dirichlet on 0. for dofs with dirichlets. Q'*S*Q is like S with zeros on lines/columns referring to dofs with dirichlet, and?then C'*C just re-add non null value on the diagonal for the dofs with dirichlet. In the end, I feel that in this case the ainsworth method just do exactly the same as row/column elimination that can be done with MatZeroRowsColumns and the x and b optional vectors provided. On this test, with '-ksp_rtol 1.e-9' and '-ksp_type gmres', using S as a preconditionning matrix and a direct solver gives 65 iterations of the gmres for my first newton iteration (where S is SPD) and between 170 and 290 for the next ones (S is still symmetric but has negative eigenvalues). 
If I use '-pc_type gamg', the number of iterations of the gmres for the first (SPD) newton iteration is (14 with Sp / 23 with S), and for the next ones (not SPD) it is (~45 with Sp / ~180 with S). In this case with only simple dirichlets, I think I would like that the PCApply does something like: (I-Q)*jacobi(Ct*C)*(I-Q) + Q*precond(S)*Q. BUt I am not sure how to do that (I am quite newbie with petsc)... With a PCShell and PCShellSetApply? In the end, if we found something that works well with the ainsworth formula, it would be nice to have it natively with PCFIELDSPLIT! Many thanks, Olivier From bsmith at petsc.dev Tue Oct 6 10:51:37 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 6 Oct 2020 10:51:37 -0500 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> Message-ID: <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> > On Oct 6, 2020, at 6:57 AM, Olivier Jamond wrote: > > > On 03/10/2020 00:23, Barry Smith wrote: >> I think what Jed is saying is that you should just actually build your preconditioner for your Ct*C + Qt*S*Q operator with S. Because Ct is tall and skinny the eigenstructure of Ct*C + Qt*S*Q is just the eigenstructure of S with a low rank "modification" and Krylov methods (GMRES) are good at solving problems where the eigenstructure of the preconditioner is only a small rank modification of the eigenstructure of the operator you are supply to GMRES. In the best situation each new iteration of GMRES corrects one more of the "rogue" eigen directions. I would first use a direct solver with S just to test how well it works as a preconditioner and then switch to GAMG or whatever should work efficiently for solving your particular S matrix. >> >> I'd be interested in hearing how well the Ainsworth Formula works, it is something that might be worth adding to PCFIELDSPLIT. >> >> >> Barry > > Hi Barry, > > Thanks for these clarifications. > > To give some context, the test I am working on is a traction on an elastoplastic cube in large strain on which I apply 2% of strain at the first loading increment. The cube has 14739 dofs, and the number of rows of the C matrix is 867. > > In this simple case, the C matrix just refers to simple dirichlet conditions. Then Q is diagonal with 1. on dofs without dirichlet on 0. for dofs with dirichlets. Q'*S*Q is like S with zeros on lines/columns referring to dofs with dirichlet, and then C'*C just re-add non null value on the diagonal for the dofs with dirichlet. In the end, I feel that in this case the ainsworth method just do exactly the same as row/column elimination that can be done with MatZeroRowsColumns and the x and b optional vectors provided. > > On this test, with '-ksp_rtol 1.e-9' and '-ksp_type gmres', using S as a preconditionning matrix and a direct solver gives 65 iterations of the gmres for my first newton iteration (where S is SPD) and between 170 and 290 for the next ones (S is still symmetric but has negative eigenvalues). If I use '-pc_type gamg', the number of iterations of the gmres for the first (SPD) newton iteration is (14 with Sp / 23 with S), and for the next ones (not SPD) it is (~45 with Sp / ~180 with S). Given the structure of C it seems you should just explicitly construct Sp and use GAMG (or other preconditioners, even a direct solver) directly on Sp. 
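One possible shape of that explicit assembly with PETSc matrix products is sketched below; the step-by-step recipe is spelled out in the next paragraph. S and C are the existing MPIAIJ matrices, and InvertBlockDiagonal is a stand-in for the user code, mentioned below, that inverts the small diagonal blocks of C*Ct.

  PetscErrorCode ierr;
  Mat            Ct, D, iD, Q, QtSQ, Sp;

  ierr = MatTranspose(C, MAT_INITIAL_MATRIX, &Ct);CHKERRQ(ierr);
  ierr = MatMatMult(C, Ct, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &D);CHKERRQ(ierr);          /* D = C*Ct (block diagonal) */
  ierr = InvertBlockDiagonal(D, &iD);CHKERRQ(ierr);                                       /* user code: iD = (C*Ct)^(-1) */
  ierr = MatPtAP(iD, C, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &Q);CHKERRQ(ierr);             /* Q = Ct*(C*Ct)^(-1)*C */
  ierr = MatScale(Q, -1.0);CHKERRQ(ierr);
  ierr = MatShift(Q, 1.0);CHKERRQ(ierr);                                                  /* Q = I - Ct*(C*Ct)^(-1)*C */
  ierr = MatPtAP(S, Q, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &QtSQ);CHKERRQ(ierr);           /* Qt*S*Q (Q is symmetric) */
  ierr = MatTransposeMatMult(C, C, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &Sp);CHKERRQ(ierr); /* Sp = Ct*C */
  ierr = MatAXPY(Sp, 1.0, QtSQ, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);                 /* Sp = Ct*C + Qt*S*Q */
  /* Sp is now an ordinary MPIAIJ matrix that can be handed to KSPSetOperators and GAMG */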
Trying to avoid explicitly forming Sp will give you a much slower performing solving for what benefit? If C was just some generic monster than forming Sp might be unrealistic but in your case CCt is is block diagonal with tiny blocks which means (C*Ct)^(-1) is block diagonal with tiny blocks (the blocks are the inverses of the blocks of (C*Ct)). Sp = Ct*C + Qt * S * Q = Ct*C + [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] [Ct * (C*Ct)^(-1)*C] will again be block diagonal with slightly larger blocks. You can do D = (C*Ct) with MatMatMult() then write custom code that zips through the diagonal blocks of D inverting all of them to get iD then use MatPtAP applied to C and iD to get Ct * (C*Ct)^(-1)*C then MatShift() to include the I then MatPtAP or MatRAR to get [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] then finally MatAXPY() to get Sp. The complexity of each of the Mat operations is very low because of the absurdly simple structure of C and its descendants. You might even be able to just use MUMPS to give you the explicit inv(C*Ct) without writing custom code to get iD. Perhaps I am missing something? Note you can also prototype this process in Matlab very quickly to find any glitches. Hopefully you will find [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] has only a slightly "bulked out" sparsity of S. > > In this case with only simple dirichlets, I think I would like that the PCApply does something like: (I-Q)*jacobi(Ct*C)*(I-Q) + Q*precond(S)*Q. BUt I am not sure how to do that (I am quite newbie with petsc)... With a PCShell and PCShellSetApply? I don't think there is any reason to think that using I-Q)*jacobi(Ct*C)*(I-Q) + Q*precond(S)*Q. would be a particularly good preconditioner for Sp. That is much better than using a preconditioner built from S. But you can use PCCOMPOSITE and PCSHELL with KSP inside for the two jacobi(Ct*C) and precond(S). Barry > > In the end, if we found something that works well with the ainsworth formula, it would be nice to have it natively with PCFIELDSPLIT! > > Many thanks, > Olivier > From ashish.patel at onscale.com Tue Oct 6 14:31:08 2020 From: ashish.patel at onscale.com (Ashish Patel) Date: Tue, 6 Oct 2020 12:31:08 -0700 Subject: [petsc-users] DMPlexMatSetClosure for non connected points in DM In-Reply-To: References: Message-ID: Upon some more testing, the idea of adding the disconnected vertex point to the cone of the surface point didn't pan out. It was effecting the closure of other points in the mesh and also for mpi simulations the disconnected vertex was thrown away from the distributed mesh. I instead switched to using MatSetValues to achieve what I wanted along with supplementing the dof of the reference node to a point in the continuous mesh. There is a performance price to pay since I am adding new non zeros after the DM matrix creation. Based on this post which had a similar problem statement https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2017-January/031318.html It seems that preallocating the matrix ourselves instead of using the DM is a possible solution for avoiding this performance issue. Thanks Ashish On Fri, Oct 2, 2020 at 4:15 PM Ashish Patel wrote: > Dear PETSc users, > > I am trying to assemble a matrix for a finite element problem where the > degree of freedom (dof) on a surface is constrained via a reference node > which also exists in the DM but is not connected with any other point in > the mesh. 
To apply the constraint I want to be able to set matrix values in > the rows belonging to dofs of reference nodes and columns belonging to dofs > of surface nodes and vice versa. But since the two points are not connected > topologically I cannot just use DMPlexMatSetClosure to do that. I am > currently trying to use DMPlexAddConeSize on all the constrained surface > points followed by a call to DMPlexInsertCone wherein I add the reference > node to the cone of surface point before setting up the PetscSection. Is > this the right approach? I am currently getting following message > > New nonzero at (8952,28311) caused a malloc > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn > off this check > > I could set the suggested option to get rid of the error but was wondering > if I am missing something. > > Thanks > Ashish > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Oct 6 14:46:57 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 6 Oct 2020 15:46:57 -0400 Subject: [petsc-users] DMPlexMatSetClosure for non connected points in DM In-Reply-To: References: Message-ID: On Fri, Oct 2, 2020 at 7:16 PM Ashish Patel wrote: > Dear PETSc users, > > I am trying to assemble a matrix for a finite element problem where the > degree of freedom (dof) on a surface is constrained via a reference node > which also exists in the DM but is not connected with any other point in > the mesh. To apply the constraint I want to be able to set matrix values in > the rows belonging to dofs of reference nodes and columns belonging to dofs > of surface nodes and vice versa. But since the two points are not connected > topologically I cannot just use DMPlexMatSetClosure to do that. I am > currently trying to use DMPlexAddConeSize on all the constrained surface > points followed by a call to DMPlexInsertCone wherein I add the reference > node to the cone of surface point before setting up the PetscSection. Is > this the right approach? > No. You are changing the topology, which is not what you want I think. I think you just want to associate extra dof with the face. You can do this by just altering the PetscSection you use. Do you create the Section yourself now? Thanks, Matt > I am currently getting following message > > New nonzero at (8952,28311) caused a malloc > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn > off this check > > I could set the suggested option to get rid of the error but was wondering > if I am missing something. > > Thanks > Ashish > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashish.patel at onscale.com Tue Oct 6 15:08:44 2020 From: ashish.patel at onscale.com (Ashish Patel) Date: Tue, 6 Oct 2020 13:08:44 -0700 Subject: [petsc-users] DMPlexMatSetClosure for non connected points in DM In-Reply-To: References: Message-ID: Hi Matt, Yes I do create the section myself. There are many faces which are constrained by a single reference point. So even if I create an extra dof at one of the faces I would have to get access to rows/columns of other distant faces which does not exist in the adjacency relationship. 
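A rough sketch of the hand-preallocation route mentioned earlier in the thread is given below. All names are placeholders for application data (not from the original messages): nlocal is the local number of dofs, dnnz/onnz are per-row counts combining the usual FEM stencil with the extra reference-node couplings, and refRows/surfCols/coupling describe the constraint entries.

  PetscErrorCode ierr;
  Mat            A;

  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
  ierr = MatSetType(A, MATMPIAIJ);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A, 0, dnnz, 0, onnz);CHKERRQ(ierr);
  /* element blocks assembled through the DM as usual */
  ierr = DMPlexMatSetClosure(dm, section, globalSection, A, cell, elemMat, ADD_VALUES);CHKERRQ(ierr);
  /* reference-node / surface-dof couplings added directly, in both row and column blocks */
  ierr = MatSetValues(A, nRefDof, refRows, nSurfDof, surfCols, coupling,  ADD_VALUES);CHKERRQ(ierr);
  ierr = MatSetValues(A, nSurfDof, surfCols, nRefDof, refRows, couplingT, ADD_VALUES);CHKERRQ(ierr);
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);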
Thanks Ashish On Tue, Oct 6, 2020 at 12:47 PM Matthew Knepley wrote: > On Fri, Oct 2, 2020 at 7:16 PM Ashish Patel > wrote: > >> Dear PETSc users, >> >> I am trying to assemble a matrix for a finite element problem where the >> degree of freedom (dof) on a surface is constrained via a reference node >> which also exists in the DM but is not connected with any other point in >> the mesh. To apply the constraint I want to be able to set matrix values in >> the rows belonging to dofs of reference nodes and columns belonging to dofs >> of surface nodes and vice versa. But since the two points are not connected >> topologically I cannot just use DMPlexMatSetClosure to do that. I am >> currently trying to use DMPlexAddConeSize on all the constrained surface >> points followed by a call to DMPlexInsertCone wherein I add the reference >> node to the cone of surface point before setting up the PetscSection. Is >> this the right approach? >> > > No. You are changing the topology, which is not what you want I think. I > think you just want to associate extra dof with the face. You can > do this by just altering the PetscSection you use. Do you create the > Section yourself now? > > Thanks, > > Matt > > >> I am currently getting following message >> >> New nonzero at (8952,28311) caused a malloc >> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn >> off this check >> >> I could set the suggested option to get rid of the error but was >> wondering if I am missing something. >> >> Thanks >> Ashish >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Oct 6 15:59:11 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 6 Oct 2020 16:59:11 -0400 Subject: [petsc-users] DMPlexMatSetClosure for non connected points in DM In-Reply-To: References: Message-ID: On Tue, Oct 6, 2020 at 4:08 PM Ashish Patel wrote: > Hi Matt, > > Yes I do create the section myself. There are many faces which are > constrained by a single reference point. So even if I create an extra dof > at one of the faces I would have to get access to rows/columns of other > distant faces which does not exist in the adjacency relationship. > Okay, so these dogs do not obey the FEM sparsity pattern. Is it inherent to the method, or do you just do this for convenience? It might be easier to constrain one face at a time, but I would need to understand more. If you cannot do that, you would also have to override the sparsity pattern. I can easily give you a hook to do this if you want. Thanks, Matt > Thanks > Ashish > > On Tue, Oct 6, 2020 at 12:47 PM Matthew Knepley wrote: > >> On Fri, Oct 2, 2020 at 7:16 PM Ashish Patel >> wrote: >> >>> Dear PETSc users, >>> >>> I am trying to assemble a matrix for a finite element problem where the >>> degree of freedom (dof) on a surface is constrained via a reference node >>> which also exists in the DM but is not connected with any other point in >>> the mesh. To apply the constraint I want to be able to set matrix values in >>> the rows belonging to dofs of reference nodes and columns belonging to dofs >>> of surface nodes and vice versa. But since the two points are not connected >>> topologically I cannot just use DMPlexMatSetClosure to do that. 
I am >>> currently trying to use DMPlexAddConeSize on all the constrained surface >>> points followed by a call to DMPlexInsertCone wherein I add the reference >>> node to the cone of surface point before setting up the PetscSection. Is >>> this the right approach? >>> >> >> No. You are changing the topology, which is not what you want I think. I >> think you just want to associate extra dof with the face. You can >> do this by just altering the PetscSection you use. Do you create the >> Section yourself now? >> >> Thanks, >> >> Matt >> >> >>> I am currently getting following message >>> >>> New nonzero at (8952,28311) caused a malloc >>> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn >>> off this check >>> >>> I could set the suggested option to get rid of the error but was >>> wondering if I am missing something. >>> >>> Thanks >>> Ashish >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashish.patel at onscale.com Tue Oct 6 16:28:27 2020 From: ashish.patel at onscale.com (Ashish Patel) Date: Tue, 6 Oct 2020 14:28:27 -0700 Subject: [petsc-users] DMPlexMatSetClosure for non connected points in DM In-Reply-To: References: Message-ID: Its inherent, physical problem for this particular case is a rigid body (whose dof is represented using a reference point) attached to a deformable body along an interface (hence many faces). The sparsity pattern for a deformable body follows the traditional FEM sparsity pattern. So I need to supplement that sparsity pattern to account for the constraint off diagonal terms. I am doing it currently after DMCreateMatrix via call to MatSetValues but that's of course not ideal. I also tried doing it by first calling DMSetMatrixPreallocateOnly but that caused problems later on as probably the number of new values exceeded the preallocated limit for that row. If you have some additional pointers that would be really helpful. Thanks Ashish On Tue, Oct 6, 2020 at 1:59 PM Matthew Knepley wrote: > On Tue, Oct 6, 2020 at 4:08 PM Ashish Patel > wrote: > >> Hi Matt, >> >> Yes I do create the section myself. There are many faces which are >> constrained by a single reference point. So even if I create an extra dof >> at one of the faces I would have to get access to rows/columns of other >> distant faces which does not exist in the adjacency relationship. >> > > Okay, so these dogs do not obey the FEM sparsity pattern. Is it inherent > to the method, or do you just do this for convenience? It might > be easier to constrain one face at a time, but I would need to understand > more. If you cannot do that, you would also have to override > the sparsity pattern. I can easily give you a hook to do this if you want. 
> > Thanks, > > Matt > > >> Thanks >> Ashish >> >> On Tue, Oct 6, 2020 at 12:47 PM Matthew Knepley >> wrote: >> >>> On Fri, Oct 2, 2020 at 7:16 PM Ashish Patel >>> wrote: >>> >>>> Dear PETSc users, >>>> >>>> I am trying to assemble a matrix for a finite element problem where the >>>> degree of freedom (dof) on a surface is constrained via a reference node >>>> which also exists in the DM but is not connected with any other point in >>>> the mesh. To apply the constraint I want to be able to set matrix values in >>>> the rows belonging to dofs of reference nodes and columns belonging to dofs >>>> of surface nodes and vice versa. But since the two points are not connected >>>> topologically I cannot just use DMPlexMatSetClosure to do that. I am >>>> currently trying to use DMPlexAddConeSize on all the constrained surface >>>> points followed by a call to DMPlexInsertCone wherein I add the reference >>>> node to the cone of surface point before setting up the PetscSection. Is >>>> this the right approach? >>>> >>> >>> No. You are changing the topology, which is not what you want I think. I >>> think you just want to associate extra dof with the face. You can >>> do this by just altering the PetscSection you use. Do you create the >>> Section yourself now? >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> I am currently getting following message >>>> >>>> New nonzero at (8952,28311) caused a malloc >>>> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to >>>> turn off this check >>>> >>>> I could set the suggested option to get rid of the error but was >>>> wondering if I am missing something. >>>> >>>> Thanks >>>> Ashish >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Tue Oct 6 16:59:39 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 6 Oct 2020 14:59:39 -0700 Subject: [petsc-users] compiling PETSc using c++ compiler Message-ID: Dear PETSc dev team, When I compile PETSc using --with-cc=gcc --with-cxx=g++ --with-clanguage=cxx, I got following error: ../../../petsc/src/sys/objects/pinit.c: In function ?PetscInitialize?: ../../../petsc/src/sys/objects/pinit.c:913:21: error: expected declaration specifiers or ?...? before numeric constant 913 | PetscComplex ic(0.0,1.0); | ^~~ ../../../petsc/src/sys/objects/pinit.c:913:25: error: expected declaration specifiers or ?...? before numeric constant 913 | PetscComplex ic(0.0,1.0); | ^~~ ../../../petsc/src/sys/objects/pinit.c:914:15: error: ?ic? undeclared (first use in this function) 914 | PETSC_i = ic; | ^~ ../../../petsc/src/sys/objects/pinit.c:914:15: note: each undeclared identifier is reported only once for each function it appears in Thanks, Sam -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay at mcs.anl.gov Tue Oct 6 17:23:12 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 6 Oct 2020 17:23:12 -0500 (CDT) Subject: [petsc-users] compiling PETSc using c++ compiler In-Reply-To: References: Message-ID: Can you send the complete logs for this build [configure.log, make.log]? Satish On Tue, 6 Oct 2020, Sam Guo wrote: > Dear PETSc dev team, > When I compile PETSc using > --with-cc=gcc --with-cxx=g++ --with-clanguage=cxx, > I got following error: > ../../../petsc/src/sys/objects/pinit.c: In function ?PetscInitialize?: > ../../../petsc/src/sys/objects/pinit.c:913:21: error: expected declaration > specifiers or ?...? before numeric constant > 913 | PetscComplex ic(0.0,1.0); > | ^~~ > ../../../petsc/src/sys/objects/pinit.c:913:25: error: expected declaration > specifiers or ?...? before numeric constant > 913 | PetscComplex ic(0.0,1.0); > | ^~~ > ../../../petsc/src/sys/objects/pinit.c:914:15: error: ?ic? undeclared > (first use in this function) > 914 | PETSC_i = ic; > | ^~ > ../../../petsc/src/sys/objects/pinit.c:914:15: note: each undeclared > identifier is reported only once for each function it appears in > > Thanks, > Sam > From balay at mcs.anl.gov Tue Oct 6 17:35:21 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 6 Oct 2020 17:35:21 -0500 (CDT) Subject: [petsc-users] compiling PETSc using c++ compiler In-Reply-To: References: Message-ID: And BTW: --with-clanguage=cxx is not needed for using PETSc from c++. It primarily exists for debugging purposes [or some corner cases where C build of PETSc does not work from c++ code] Satish On Tue, 6 Oct 2020, Satish Balay via petsc-users wrote: > Can you send the complete logs for this build [configure.log, make.log]? > > Satish > > On Tue, 6 Oct 2020, Sam Guo wrote: > > > Dear PETSc dev team, > > When I compile PETSc using > > --with-cc=gcc --with-cxx=g++ --with-clanguage=cxx, > > I got following error: > > ../../../petsc/src/sys/objects/pinit.c: In function ?PetscInitialize?: > > ../../../petsc/src/sys/objects/pinit.c:913:21: error: expected declaration > > specifiers or ?...? before numeric constant > > 913 | PetscComplex ic(0.0,1.0); > > | ^~~ > > ../../../petsc/src/sys/objects/pinit.c:913:25: error: expected declaration > > specifiers or ?...? before numeric constant > > 913 | PetscComplex ic(0.0,1.0); > > | ^~~ > > ../../../petsc/src/sys/objects/pinit.c:914:15: error: ?ic? undeclared > > (first use in this function) > > 914 | PETSC_i = ic; > > | ^~ > > ../../../petsc/src/sys/objects/pinit.c:914:15: note: each undeclared > > identifier is reported only once for each function it appears in > > > > Thanks, > > Sam > > > From sam.guo at cd-adapco.com Tue Oct 6 18:44:51 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 6 Oct 2020 16:44:51 -0700 Subject: [petsc-users] compiling PETSc using c++ compiler In-Reply-To: References: Message-ID: Hi Satish, I am using our internal makefile to make PETSc. When I use PETSc makefile, it works. Hence it must be our compiling flags responsible for the error. I'll look into it. I want to experiment c++ compiler to see if I can compile real and complex versions to different symbols (I've created another thread for this topic.) but it seems c++ compiler does not help and I still get same symbols. Thanks, Sam On Tue, Oct 6, 2020 at 3:35 PM Satish Balay wrote: > And BTW: --with-clanguage=cxx is not needed for using PETSc from > c++. 
It primarily exists for debugging purposes [or some corner cases > where C build of PETSc does not work from c++ code] > > Satish > > On Tue, 6 Oct 2020, Satish Balay via petsc-users wrote: > > > Can you send the complete logs for this build [configure.log, make.log]? > > > > Satish > > > > On Tue, 6 Oct 2020, Sam Guo wrote: > > > > > Dear PETSc dev team, > > > When I compile PETSc using > > > --with-cc=gcc --with-cxx=g++ --with-clanguage=cxx, > > > I got following error: > > > ../../../petsc/src/sys/objects/pinit.c: In function ?PetscInitialize?: > > > ../../../petsc/src/sys/objects/pinit.c:913:21: error: expected > declaration > > > specifiers or ?...? before numeric constant > > > 913 | PetscComplex ic(0.0,1.0); > > > | ^~~ > > > ../../../petsc/src/sys/objects/pinit.c:913:25: error: expected > declaration > > > specifiers or ?...? before numeric constant > > > 913 | PetscComplex ic(0.0,1.0); > > > | ^~~ > > > ../../../petsc/src/sys/objects/pinit.c:914:15: error: ?ic? undeclared > > > (first use in this function) > > > 914 | PETSC_i = ic; > > > | ^~ > > > ../../../petsc/src/sys/objects/pinit.c:914:15: note: each undeclared > > > identifier is reported only once for each function it appears in > > > > > > Thanks, > > > Sam > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Oct 6 23:49:10 2020 From: jed at jedbrown.org (Jed Brown) Date: Tue, 06 Oct 2020 22:49:10 -0600 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> Message-ID: <87ft6qmyih.fsf@jedbrown.org> Barry Smith writes: >> On Oct 6, 2020, at 6:57 AM, Olivier Jamond wrote: >> >> >> On 03/10/2020 00:23, Barry Smith wrote: >>> I think what Jed is saying is that you should just actually build your preconditioner for your Ct*C + Qt*S*Q operator with S. Because Ct is tall and skinny the eigenstructure of Ct*C + Qt*S*Q is just the eigenstructure of S with a low rank "modification" and Krylov methods (GMRES) are good at solving problems where the eigenstructure of the preconditioner is only a small rank modification of the eigenstructure of the operator you are supply to GMRES. In the best situation each new iteration of GMRES corrects one more of the "rogue" eigen directions. I would first use a direct solver with S just to test how well it works as a preconditioner and then switch to GAMG or whatever should work efficiently for solving your particular S matrix. >>> >>> I'd be interested in hearing how well the Ainsworth Formula works, it is something that might be worth adding to PCFIELDSPLIT. >>> >>> >>> Barry >> >> Hi Barry, >> >> Thanks for these clarifications. >> >> To give some context, the test I am working on is a traction on an elastoplastic cube in large strain on which I apply 2% of strain at the first loading increment. The cube has 14739 dofs, and the number of rows of the C matrix is 867. >> >> In this simple case, the C matrix just refers to simple dirichlet conditions. Then Q is diagonal with 1. on dofs without dirichlet on 0. for dofs with dirichlets. 
Q'*S*Q is like S with zeros on lines/columns referring to dofs with dirichlet, and then C'*C just re-add non null value on the diagonal for the dofs with dirichlet. In the end, I feel that in this case the ainsworth method just do exactly the same as row/column elimination that can be done with MatZeroRowsColumns and the x and b optional vectors provided. >> >> On this test, with '-ksp_rtol 1.e-9' and '-ksp_type gmres', using S as a preconditionning matrix and a direct solver gives 65 iterations of the gmres for my first newton iteration (where S is SPD) and between 170 and 290 for the next ones (S is still symmetric but has negative eigenvalues). If I use '-pc_type gamg', the number of iterations of the gmres for the first (SPD) newton iteration is (14 with Sp / 23 with S), and for the next ones (not SPD) it is (~45 with Sp / ~180 with S). > > Given the structure of C it seems you should just explicitly > construct Sp and use GAMG (or other preconditioners, even a direct > solver) directly on Sp. Trying to avoid explicitly forming Sp will > give you a much slower performing solving for what benefit? Note that S is singular if it's a pure Neumann stiffness matrix (rigid body modes are in the null space). I'm with Barry that you should just form Sp, which is much more solver friendly. If your constraint matrix C has dense rows (e.g., integral conditions on the boundary), then use FieldSplit for the saddle point problem. From dalcinl at gmail.com Wed Oct 7 04:26:25 2020 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 7 Oct 2020 12:26:25 +0300 Subject: [petsc-users] compiling PETSc using c++ compiler In-Reply-To: References: Message-ID: On Wed, 7 Oct 2020 at 02:45, Sam Guo wrote: > I want to experiment c++ compiler to see if I can compile real and > complex versions to different symbols (I've created another thread for this > topic.) but it seems c++ compiler does not help and I still get same > symbols. > > Indeed, that is not going to make it. Or you have to change the definition of PETSC_EXTERN, such that it does not use `extern "C"`. Even with that, you may have trouble with EXTERN_C_BEGIN/END macros. -- Lisandro Dalcin ============ Senior Research Scientist Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Oct 7 08:11:52 2020 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 7 Oct 2020 09:11:52 -0400 Subject: [petsc-users] [petsc-maint] Poisson with Neumann BC on a stretched grid In-Reply-To: References: <982e1f8e-8b74-8b69-c7ab-fa7eeec650a6@uclouvain.be> Message-ID: On Wed, Oct 7, 2020 at 8:27 AM Victoria Hamtiaux < victoria.hamtiaux at uclouvain.be> wrote: > Thanks for all the answers, > > > How can I do the "semi-coarsening"? I don't really know how those > preconditionners work so I don't how how to change them or so.. > > > You have to write custom code to do semi-coarsening. PETSc does not provide that and you would not want to do it yourself, most likely. > I have a question because you both seem to say that my matrix is supposed > to be symmetric which is not the case. \ > You said "my matrix is symmetric." Then you said " I suspect that by stretching the grid, my matrix is not symmetric anymore and that it might cause a problem." We are saying that by stretchin the grid the matrix is still symmetric even if the grid has lost a symmetry. 
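A quick way to check this numerically is to compare the operator with its transpose; a minimal sketch (A stands for the assembled Poisson matrix, and the comparison norm is arbitrary):

  Mat       At;
  PetscReal nrm, anrm;
  MatTranspose(A, MAT_INITIAL_MATRIX, &At);
  MatAXPY(At, -1.0, A, DIFFERENT_NONZERO_PATTERN);  /* At <- A^T - A */
  MatNorm(At, NORM_FROBENIUS, &nrm);
  MatNorm(A, NORM_FROBENIUS, &anrm);
  PetscPrintf(PETSC_COMM_WORLD, "||A^T - A||_F / ||A||_F = %g\n", (double)(nrm/anrm));
  MatDestroy(&At);

In serial, MatIsSymmetric(A, tol, &flg) gives the same information directly.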
I don't know of a mechanism for stretching the grid to make the matrix asymmetric. So we are suggesting that you verify your suspicion that the matrix is symmetric. And in fact, I don't get how it can be symmetric. Because you will have > something close to symmetric. For example when you are at the center of > your domain it will be symmetric, but when your at a point at the > boundaries I don't get how you can be symmetric, you won't have something > at the left and the right of your main diagonal... (I don't know if my > explanations are understandable) > You can make a discretization that is not symmetric because of boundary conditions but I assume that is not the case because you said your matrix is symmetric. > Best regards, > > > Victoria > > > > On 7/10/20 14:20, Mark Adams wrote: > > GMG (geometric MG) is stronger as Matt said, but it is affected by > stretched grids in a similar way. A way to fix this in GMG is > semi-coarsening, which AMG _can_ do automatically. > > On Wed, Oct 7, 2020 at 8:02 AM Matthew Knepley wrote: > >> On Wed, Oct 7, 2020 at 7:07 AM Victoria Hamtiaux < >> victoria.hamtiaux at uclouvain.be> wrote: >> >>> Hello Matt, >>> >>> >>> I just checked the symmetry of my matrix and it is not symmetric. But it >>> is not symmetric either when I use a uniform grid. >>> >>> The domain is 3D and I'm using finite differences, so I guess it is >>> normal that at multiple places (when I deal with points near the >>> boundaries), the matrix is not symmetric. >>> >>> So I was wrong, the problem doesn't come from the fact that the matrix >>> is not symmetric. I don't know where it comes from, but when I switch from >>> uniform to stretched grid, the solver stops working properly. Could it be >>> from the preconditionner of the solver that I use? >>> >>> Do you have any other idea ? >>> >> I would consider using GMG. As Mark says, AMG is very fiddly with >> stretched grids. For Poisson, GMG works great and you seem to have regular >> grids. >> >> Thanks, >> >> Matt >> >>> Thanks for your help, >>> >>> >>> Victoria >>> >>> >>> On 7/10/20 12:48, Matthew Knepley wrote: >>> >>> On Wed, Oct 7, 2020 at 6:40 AM Victoria Hamtiaux < >>> victoria.hamtiaux at uclouvain.be> wrote: >>> >>>> Dear all, >>>> >>>> >>>> After the discretization of a poisson equation with purely Neumann (or >>>> periodic) boundary conditions, I get a matrix which is singular. >>>> >>>> >>>> The way I am handling this is by using a NullSpace with the following >>>> code : >>>> >>>> MatNullSpace nullspace; >>>> MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, 0, &nullspace); >>>> MatSetNullSpace(p_solverp->A, nullspace); >>>> MatSetTransposeNullSpace(p_solverp->A, nullspace); >>>> MatNullSpaceDestroy(&nullspace); >>>> >>>> >>>> Note that I am using the hypre preconditionner BOOMERANG and the >>>> default >>>> solver GMRES. 
>>>> >>>> >>>> KSPCreate(PETSC_COMM_WORLD,&p_solverp->ksp); >>>> KSPSetOperators(p_solverp->ksp, p_solverp->A, p_solverp->A); >>>> PC prec; >>>> KSPGetPC(p_solverp->ksp, &prec); >>>> PCSetType(prec,PCHYPRE);//PCHYPRE seems the best >>>> PCHYPRESetType(prec,"boomeramg"); //boomeramg is the best >>>> KSPSetInitialGuessNonzero(p_solverp->ksp,PETSC_TRUE); >>>> KSPSetFromOptions(p_solverp->ksp); >>>> KSPSetTolerances(p_solverp->ksp, 1.e-10, 1e-10, PETSC_DEFAULT, >>>> PETSC_DEFAULT); >>>> KSPSetReusePreconditioner(p_solverp->ksp,PETSC_TRUE); >>>> KSPSetUseFischerGuess(p_solverp->ksp,1,5); >>>> KSPGMRESSetPreAllocateVectors(p_solverp->ksp); >>>> KSPSetUp(p_solverp->ksp); >>>> >>>> >>>> >>>> And this works fine when my grid is uniform, so that my matrix is >>>> symmetric. >>>> >>>> >>>> But when I stretch the grid near the boundary (my grid is then >>>> non-uniform), it doesn't work properly anymore. I suspect that by >>>> stretching the grid, my matrix is not symmetric anymore and that it >>>> might cause a problem. >>>> >>> >>> Symmetry is a property of the operator, so you should be symmetric on >>> your >>> stretched grid. If not, I think you have the discretization wrong. You >>> can check >>> symmetry using >>> >>> >>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatIsSymmetric.html >>> >>> >>> Also, if you suspect your discretization, you should probably do an MMS >>> test to >>> verify that you discretization converges at the correct rate. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> I tried fixing the solution at an arbitrary point, but sometimes doing >>>> this, I get errors near that fixed point. I 've seen on the petsc-users >>>> forum that you usually don't recommend to fix a point, but I don't >>>> really know how to proceed differently. >>>> >>>> >>>> What would you recommend to solve this problem? >>>> >>>> >>>> Thanks for your help, >>>> >>>> >>>> Best regards, >>>> >>>> >>>> Victoria >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Oct 7 10:59:44 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 7 Oct 2020 10:59:44 -0500 Subject: [petsc-users] [petsc-maint] Poisson with Neumann BC on a stretched grid In-Reply-To: References: <982e1f8e-8b74-8b69-c7ab-fa7eeec650a6@uclouvain.be> Message-ID: > On Oct 7, 2020, at 8:11 AM, Mark Adams wrote: > > > On Wed, Oct 7, 2020 at 8:27 AM Victoria Hamtiaux > wrote: > Thanks for all the answers, > > > > How can I do the "semi-coarsening"? I don't really know how those preconditionners work so I don't how how to change them or so.. > > > > You have to write custom code to do semi-coarsening. PETSc does not provide that and you would not want to do it yourself, most likely. We do not provide it directly but if you are using PCMG and DMDA it is relatively straight-forward. You create a coarse DM and then refine it but each refinement you only do in the directions you want set each time with DMDASetRefinementFactor(). 
Once you have the collections of refined DM's you provide them to PCMG. Barry > > > I have a question because you both seem to say that my matrix is supposed to be symmetric which is not the case. \ > > You said "my matrix is symmetric." > > Then you said " I suspect that by stretching the grid, my matrix is not symmetric anymore and that it might cause a problem." > > We are saying that by stretchin the grid the matrix is still symmetric even if the grid has lost a symmetry. I don't know of a mechanism for stretching the grid to make the matrix asymmetric. So we are suggesting that you verify your suspicion that the matrix is symmetric. > > And in fact, I don't get how it can be symmetric. Because you will have something close to symmetric. For example when you are at the center of your domain it will be symmetric, but when your at a point at the boundaries I don't get how you can be symmetric, you won't have something at the left and the right of your main diagonal... (I don't know if my explanations are understandable) > > You can make a discretization that is not symmetric because of boundary conditions but I assume that is not the case because you said your matrix is symmetric. > > Best regards, > > > > Victoria > > > > > > On 7/10/20 14:20, Mark Adams wrote: >> GMG (geometric MG) is stronger as Matt said, but it is affected by stretched grids in a similar way. A way to fix this in GMG is semi-coarsening, which AMG _can_ do automatically. >> >> On Wed, Oct 7, 2020 at 8:02 AM Matthew Knepley > wrote: >> On Wed, Oct 7, 2020 at 7:07 AM Victoria Hamtiaux > wrote: >> Hello Matt, >> >> >> >> I just checked the symmetry of my matrix and it is not symmetric. But it is not symmetric either when I use a uniform grid. >> >> The domain is 3D and I'm using finite differences, so I guess it is normal that at multiple places (when I deal with points near the boundaries), the matrix is not symmetric. >> >> So I was wrong, the problem doesn't come from the fact that the matrix is not symmetric. I don't know where it comes from, but when I switch from uniform to stretched grid, the solver stops working properly. Could it be from the preconditionner of the solver that I use? >> >> Do you have any other idea ? >> >> I would consider using GMG. As Mark says, AMG is very fiddly with stretched grids. For Poisson, GMG works great and you seem to have regular grids. >> >> Thanks, >> >> Matt >> Thanks for your help, >> >> >> >> Victoria >> >> >> >> On 7/10/20 12:48, Matthew Knepley wrote: >>> On Wed, Oct 7, 2020 at 6:40 AM Victoria Hamtiaux > wrote: >>> Dear all, >>> >>> >>> After the discretization of a poisson equation with purely Neumann (or >>> periodic) boundary conditions, I get a matrix which is singular. >>> >>> >>> The way I am handling this is by using a NullSpace with the following >>> code : >>> >>> MatNullSpace nullspace; >>> MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, 0, &nullspace); >>> MatSetNullSpace(p_solverp->A, nullspace); >>> MatSetTransposeNullSpace(p_solverp->A, nullspace); >>> MatNullSpaceDestroy(&nullspace); >>> >>> >>> Note that I am using the hypre preconditionner BOOMERANG and the default >>> solver GMRES. 
>>> >>> >>> KSPCreate(PETSC_COMM_WORLD,&p_solverp->ksp); >>> KSPSetOperators(p_solverp->ksp, p_solverp->A, p_solverp->A); >>> PC prec; >>> KSPGetPC(p_solverp->ksp, &prec); >>> PCSetType(prec,PCHYPRE);//PCHYPRE seems the best >>> PCHYPRESetType(prec,"boomeramg"); //boomeramg is the best >>> KSPSetInitialGuessNonzero(p_solverp->ksp,PETSC_TRUE); >>> KSPSetFromOptions(p_solverp->ksp); >>> KSPSetTolerances(p_solverp->ksp, 1.e-10, 1e-10, PETSC_DEFAULT, >>> PETSC_DEFAULT); >>> KSPSetReusePreconditioner(p_solverp->ksp,PETSC_TRUE); >>> KSPSetUseFischerGuess(p_solverp->ksp,1,5); >>> KSPGMRESSetPreAllocateVectors(p_solverp->ksp); >>> KSPSetUp(p_solverp->ksp); >>> >>> >>> >>> And this works fine when my grid is uniform, so that my matrix is >>> symmetric. >>> >>> >>> But when I stretch the grid near the boundary (my grid is then >>> non-uniform), it doesn't work properly anymore. I suspect that by >>> stretching the grid, my matrix is not symmetric anymore and that it >>> might cause a problem. >>> >>> Symmetry is a property of the operator, so you should be symmetric on your >>> stretched grid. If not, I think you have the discretization wrong. You can check >>> symmetry using >>> >>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatIsSymmetric.html >>> >>> Also, if you suspect your discretization, you should probably do an MMS test to >>> verify that you discretization converges at the correct rate. >>> >>> Thanks, >>> >>> Matt >>> >>> I tried fixing the solution at an arbitrary point, but sometimes doing >>> this, I get errors near that fixed point. I 've seen on the petsc-users >>> forum that you usually don't recommend to fix a point, but I don't >>> really know how to proceed differently. >>> >>> >>> What would you recommend to solve this problem? >>> >>> >>> Thanks for your help, >>> >>> >>> Best regards, >>> >>> >>> Victoria >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Wed Oct 7 13:41:18 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Wed, 7 Oct 2020 11:41:18 -0700 Subject: [petsc-users] Regarding FormFunction in the SNES class Message-ID: Hello, I have a few questions regarding FormFunction when using the SNES solvers. I am using Fortran90. 1) I went through the example (ex1f.F90) provided in the documentation that uses Newton method to solve a two-variable system. In the subroutine FormFunction, the first argument is an input vector (x). However in the code, no attributes are specified saying that it is an input argument for the subroutine (i.e. intent attribute is not specified). Is this automatically taken care of or should I be defining the intent attribute in my code ? Also, should I use the "allocatable" attribute when defining the vector x? Please comment similarly on the output vector f as well. 2) Should the ctx argument of the subroutine FormFunction be defined as "PETSC_NULL_INTEGER"? Please let me know if you need any further information. Thank you. Best Regards, Pranay. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Oct 7 14:19:05 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 7 Oct 2020 14:19:05 -0500 Subject: [petsc-users] Regarding FormFunction in the SNES class In-Reply-To: References: Message-ID: <9CBDA8FE-3399-43C9-88F4-23D66981E736@petsc.dev> > On Oct 7, 2020, at 1:41 PM, baikadi pranay wrote: > > Hello, > I have a few questions regarding FormFunction when using the SNES solvers. I am using Fortran90. > > 1) I went through the example (ex1f.F90) provided in the documentation that uses Newton method to solve a two-variable system. In the subroutine FormFunction, the first argument is an input vector (x). However in the code, no attributes are specified saying that it is an input argument for the subroutine (i.e. intent attribute is not specified). Is this automatically taken care of or should I be defining the intent attribute in my code ? We don't currently provide attributes for our Fortran stubs, so it is best if you do not mark them in your subroutines. Yes the x is input only and the f is output only. > Also, should I use the "allocatable" attribute when defining the vector x? I am pretty sure no. > Please comment similarly on the output vector f as well. > 2) Should the ctx argument of the subroutine FormFunction be defined as "PETSC_NULL_INTEGER"? The context is how you convey additional information into FormFunction(). Should you choose to not use it then in your function you can declare it as a integer and simply not use it. If you are calling your FormFunction() from Fortran then just pass a meaningless integer as that argument. PETSC_NULL_INTEGER is for call PETSc functions that take integer array arguments that you are not supplying. Barry > > Please let me know if you need any further information. > > Thank you. > Best Regards, > Pranay. From knepley at gmail.com Wed Oct 7 17:32:02 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 7 Oct 2020 18:32:02 -0400 Subject: [petsc-users] Regarding FormFunction in the SNES class In-Reply-To: <9CBDA8FE-3399-43C9-88F4-23D66981E736@petsc.dev> References: <9CBDA8FE-3399-43C9-88F4-23D66981E736@petsc.dev> Message-ID: On Wed, Oct 7, 2020 at 4:26 PM Barry Smith wrote: > > > > On Oct 7, 2020, at 1:41 PM, baikadi pranay > wrote: > > > > Hello, > > I have a few questions regarding FormFunction when using the SNES > solvers. I am using Fortran90. > > > > 1) I went through the example (ex1f.F90) provided in the documentation > that uses Newton method to solve a two-variable system. In the subroutine > FormFunction, the first argument is an input vector (x). However in the > code, no attributes are specified saying that it is an input argument for > the subroutine (i.e. intent attribute is not specified). Is this > automatically taken care of or should I be defining the intent attribute in > my code ? > > We don't currently provide attributes for our Fortran stubs, so it is > best if you do not mark them in your subroutines. > > Yes the x is input only and the f is output only. > Are you certain? We do not change the f pointer, you change the data hiding inside. Matt > > Also, should I use the "allocatable" attribute when defining the vector > x? > > I am pretty sure no. > > > Please comment similarly on the output vector f as well. > > 2) Should the ctx argument of the subroutine FormFunction be defined as > "PETSC_NULL_INTEGER"? > > The context is how you convey additional information into > FormFunction(). 
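In C the same idea looks like the sketch below (the AppCtx contents and names are made up for illustration); the Fortran interface carries the same ctx slot, which you can simply ignore when you have nothing to pass:

  typedef struct { PetscReal alpha; } AppCtx;      /* whatever data FormFunction needs */

  PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ptr)
  {
    AppCtx *user = (AppCtx*)ptr;
    /* ... evaluate f(x) using user->alpha ... */
    return 0;
  }

  /* in the calling code */
  AppCtx user;
  user.alpha = 2.0;
  SNESSetFunction(snes, r, FormFunction, &user);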
Should you choose to not use it then in your function you > can declare it as a integer and simply not use it. If you are calling your > FormFunction() from Fortran then just pass a meaningless integer as that > argument. PETSC_NULL_INTEGER is for call PETSc functions that take integer > array arguments that you are not supplying. > > Barry > > > > > > > Please let me know if you need any further information. > > > > Thank you. > > Best Regards, > > Pranay. > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Oct 7 20:33:47 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 7 Oct 2020 20:33:47 -0500 Subject: [petsc-users] Regarding FormFunction in the SNES class In-Reply-To: References: <9CBDA8FE-3399-43C9-88F4-23D66981E736@petsc.dev> Message-ID: <7936C907-C274-4DE9-8A1B-2EF1F57C124F@petsc.dev> > On Oct 7, 2020, at 5:02 PM, baikadi pranay wrote: > > Thank you Barry for the response. Just to make sure I understood correctly, I do not need to define any attributes to the vectors and it is automatically taken care of (both the intent as well as the allocatable attributes). Am I correct? > For the second question, my subroutine should look like this: > FormFunction(snes,x,f,3,INTEGER_VARIABLE,ierr) > Is this correct? > For example src/snes/tutorials/ex1f.F90 subroutine FormFunction(snes,x,f,dummy,ierr) use petscsnes implicit none SNES snes Vec x,f PetscErrorCode ierr integer dummy(*) > Thank you in advance. > Best Regards, > Pranay. > > ? > > On Wed, Oct 7, 2020 at 1:26 PM Barry Smith > wrote: > > > > On Oct 7, 2020, at 1:41 PM, baikadi pranay > wrote: > > > > Hello, > > I have a few questions regarding FormFunction when using the SNES solvers. I am using Fortran90. > > > > 1) I went through the example (ex1f.F90) provided in the documentation that uses Newton method to solve a two-variable system. In the subroutine FormFunction, the first argument is an input vector (x). However in the code, no attributes are specified saying that it is an input argument for the subroutine (i.e. intent attribute is not specified). Is this automatically taken care of or should I be defining the intent attribute in my code ? > > We don't currently provide attributes for our Fortran stubs, so it is best if you do not mark them in your subroutines. > > Yes the x is input only and the f is output only. > > > Also, should I use the "allocatable" attribute when defining the vector x? > > I am pretty sure no. > > > Please comment similarly on the output vector f as well. > > 2) Should the ctx argument of the subroutine FormFunction be defined as "PETSC_NULL_INTEGER"? > > The context is how you convey additional information into FormFunction(). Should you choose to not use it then in your function you can declare it as a integer and simply not use it. If you are calling your FormFunction() from Fortran then just pass a meaningless integer as that argument. PETSC_NULL_INTEGER is for call PETSc functions that take integer array arguments that you are not supplying. > > Barry > > > > > > > Please let me know if you need any further information. > > > > Thank you. > > Best Regards, > > Pranay. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Y.Juntao at hotmail.com Thu Oct 8 02:43:07 2020 From: Y.Juntao at hotmail.com (Yang Juntao) Date: Thu, 8 Oct 2020 07:43:07 +0000 Subject: [petsc-users] Convergence Error Debugging with KSP solvers in SNES Message-ID: Hello, I?m working on a nonlinear solver with SNES with handcoded jacobian and function. Each linear solver is solved with KSP solver. But sometimes I got issues with ksp solver convergence. I tried with finite difference approximated jacobian, but get the same error. From the iterations, the convergence seems ok at the beginning but suddenly diverged in the last iteration. Hope anyone with experience on ksp solvers could direct me to a direction I can debug the problem. iter = 0, SNES Function norm 2.94934e-06 iteration 0 KSP Residual norm 1.094600281831e-06 iteration 1 KSP Residual norm 1.264284474186e-08 iteration 2 KSP Residual norm 6.593269221816e-09 iteration 3 KSP Residual norm 1.689570779457e-09 iteration 4 KSP Residual norm 1.040661505932e-09 iteration 5 KSP Residual norm 5.422761817348e-10 iteration 6 KSP Residual norm 2.492867371369e-10 iteration 7 KSP Residual norm 8.261522376775e-11 iteration 8 KSP Residual norm 4.246401544245e-11 iteration 9 KSP Residual norm 2.514366787388e-11 iteration 10 KSP Residual norm 1.982940267051e-11 iteration 11 KSP Residual norm 1.586470414676e-11 iteration 12 KSP Residual norm 9.866392216207e-12 iteration 13 KSP Residual norm 4.951342176999e-12 iteration 14 KSP Residual norm 2.418292660318e-12 iteration 15 KSP Residual norm 1.747418526086e-12 iteration 16 KSP Residual norm 1.094150535809e-12 iteration 17 KSP Residual norm 4.464287492066e-13 iteration 18 KSP Residual norm 3.530090494462e-13 iteration 19 KSP Residual norm 2.825698091454e-13 iteration 20 KSP Residual norm 1.950568425807e-13 iteration 21 KSP Residual norm 1.227898091813e-13 iteration 22 KSP Residual norm 5.411106347374e-14 iteration 23 KSP Residual norm 4.511115848564e-14 iteration 24 KSP Residual norm 4.063546606691e-14 iteration 25 KSP Residual norm 3.677694771949e-14 iteration 26 KSP Residual norm 3.459244943466e-14 iteration 27 KSP Residual norm 3.263954971093e-14 iteration 28 KSP Residual norm 3.087344619079e-14 iteration 29 KSP Residual norm 2.809426925625e-14 iteration 30 KSP Residual norm 4.366149884754e-01 Linear solve did not converge due to DIVERGED_DTOL iterations 30 SNES Object: 1 MPI processes type: newtonls SNES has not been set up so information may be incomplete maximum iterations=50, maximum function evaluations=10000 tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 total number of linear solver iterations=0 total number of function evaluations=0 norm schedule ALWAYS SNESLineSearch Object: 1 MPI processes type: bt interpolation: cubic alpha=1.000000e-04 maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=40 KSP Object: 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-08, absolute=1e-50, divergence=10000. 
left preconditioning using DEFAULT norm type for convergence test PC Object: 1 MPI processes type: fieldsplit PC has not been set up so information may be incomplete FieldSplit with Schur preconditioner, factorization FULL Preconditioner for the Schur complement formed from S itself Split info: KSP solver for A00 block not yet available KSP solver for S = A11 - A10 inv(A00) A01 not yet available linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=659, cols=659 total: nonzeros=659, allocated nonzeros=7908 total number of mallocs used during MatSetValues calls=0 not using I-node routines Regards Juntao -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Oct 8 05:25:19 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 8 Oct 2020 06:25:19 -0400 Subject: [petsc-users] Convergence Error Debugging with KSP solvers in SNES In-Reply-To: References: Message-ID: On Thu, Oct 8, 2020 at 3:43 AM Yang Juntao wrote: > Hello, > > > > I?m working on a nonlinear solver with SNES with handcoded jacobian and > function. Each linear solver is solved with KSP solver. > > But sometimes I got issues with ksp solver convergence. I tried with > finite difference approximated jacobian, but get the same error. > > > > From the iterations, the convergence seems ok at the beginning but > suddenly diverged in the last iteration. > > Hope anyone with experience on ksp solvers could direct me to a direction > I can debug the problem. > KSP Object: 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement The GMRES restarted at iteration 30. You can increase the subspace size using -ksp_gmres_restart 100 Thanks, Matt > iter = 0, SNES Function norm 2.94934e-06 > > iteration 0 KSP Residual norm 1.094600281831e-06 > > iteration 1 KSP Residual norm 1.264284474186e-08 > > iteration 2 KSP Residual norm 6.593269221816e-09 > > iteration 3 KSP Residual norm 1.689570779457e-09 > > iteration 4 KSP Residual norm 1.040661505932e-09 > > iteration 5 KSP Residual norm 5.422761817348e-10 > > iteration 6 KSP Residual norm 2.492867371369e-10 > > iteration 7 KSP Residual norm 8.261522376775e-11 > > iteration 8 KSP Residual norm 4.246401544245e-11 > > iteration 9 KSP Residual norm 2.514366787388e-11 > > iteration 10 KSP Residual norm 1.982940267051e-11 > > iteration 11 KSP Residual norm 1.586470414676e-11 > > iteration 12 KSP Residual norm 9.866392216207e-12 > > iteration 13 KSP Residual norm 4.951342176999e-12 > > iteration 14 KSP Residual norm 2.418292660318e-12 > > iteration 15 KSP Residual norm 1.747418526086e-12 > > iteration 16 KSP Residual norm 1.094150535809e-12 > > iteration 17 KSP Residual norm 4.464287492066e-13 > > iteration 18 KSP Residual norm 3.530090494462e-13 > > iteration 19 KSP Residual norm 2.825698091454e-13 > > iteration 20 KSP Residual norm 1.950568425807e-13 > > iteration 21 KSP Residual norm 1.227898091813e-13 > > iteration 22 KSP Residual norm 5.411106347374e-14 > > iteration 23 KSP Residual norm 4.511115848564e-14 > > iteration 24 KSP Residual norm 4.063546606691e-14 > > iteration 25 KSP Residual norm 3.677694771949e-14 > > iteration 26 KSP Residual norm 3.459244943466e-14 > > iteration 27 KSP Residual norm 3.263954971093e-14 > > iteration 28 KSP Residual norm 3.087344619079e-14 > > iteration 29 KSP Residual norm 2.809426925625e-14 > > iteration 30 KSP Residual norm 4.366149884754e-01 > > Linear solve did not converge due to DIVERGED_DTOL 
iterations 30 > > > > > > SNES Object: 1 MPI processes > > type: newtonls > > SNES has not been set up so information may be incomplete > > maximum iterations=50, maximum function evaluations=10000 > > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > > total number of linear solver iterations=0 > > total number of function evaluations=0 > > norm schedule ALWAYS > > SNESLineSearch Object: 1 MPI processes > > type: bt > > interpolation: cubic > > alpha=1.000000e-04 > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, > lambda=1.000000e-08 > > maximum iterations=40 > > KSP Object: 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > > left preconditioning > > using DEFAULT norm type for convergence test > > PC Object: 1 MPI processes > > type: fieldsplit > > PC has not been set up so information may be incomplete > > FieldSplit with Schur preconditioner, factorization FULL > > Preconditioner for the Schur complement formed from S itself > > Split info: > > KSP solver for A00 block > > not yet available > > KSP solver for S = A11 - A10 inv(A00) A01 > > not yet available > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=659, cols=659 > > total: nonzeros=659, allocated nonzeros=7908 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > > > Regards > > Juntao > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.jamond at cea.fr Thu Oct 8 08:27:54 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Thu, 8 Oct 2020 15:27:54 +0200 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> Message-ID: <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> > Given the structure of C it seems you should just explicitly construct Sp and use GAMG (or other preconditioners, even a direct solver) directly on Sp. Trying to avoid explicitly forming Sp will give you a much slower performing solving for what benefit? If C was just some generic monster than forming Sp might be unrealistic but in your case CCt is is block diagonal with tiny blocks which means (C*Ct)^(-1) is block diagonal with tiny blocks (the blocks are the inverses of the blocks of (C*Ct)). > > Sp = Ct*C + Qt * S * Q = Ct*C + [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] > > [Ct * (C*Ct)^(-1)*C] will again be block diagonal with slightly larger blocks. 
> > You can do D = (C*Ct) with MatMatMult() then write custom code that zips through the diagonal blocks of D inverting all of them to get iD then use MatPtAP applied to C and iD to get Ct * (C*Ct)^(-1)*C then MatShift() to include the I then MatPtAP or MatRAR to get [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] then finally MatAXPY() to get Sp. The complexity of each of the Mat operations is very low because of the absurdly simple structure of C and its descendants. You might even be able to just use MUMPS to give you the explicit inv(C*Ct) without writing custom code to get iD. At this time, I didn't manage to compute iD=inv(C*Ct) without using dense matrices, what may be a shame because all matrices are sparse . Is it possible? And I get no idea of how to write code to manually zip through the diagonal blocks of D to invert them... Thanks for helping! From bsmith at petsc.dev Thu Oct 8 13:17:46 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 8 Oct 2020 13:17:46 -0500 Subject: [petsc-users] Convergence Error Debugging with KSP solvers in SNES In-Reply-To: References: Message-ID: <17D7396D-BA2D-4E7A-859A-82F605E6D12D@petsc.dev> When you get a huge change at restart this means something is seriously wrong with either the linear operator or the linear preconditioner. How are you doing the matrix vector product? Note both the operator and preconditioner must be linear operators for GMRES. FGMRES allows the preconditioner to be nonlinear. You can try -ksp_type fgmres -ksp_monitor_true_residual Barry > On Oct 8, 2020, at 2:43 AM, Yang Juntao wrote: > > Hello, > > I?m working on a nonlinear solver with SNES with handcoded jacobian and function. Each linear solver is solved with KSP solver. > But sometimes I got issues with ksp solver convergence. I tried with finite difference approximated jacobian, but get the same error. > > From the iterations, the convergence seems ok at the beginning but suddenly diverged in the last iteration. > Hope anyone with experience on ksp solvers could direct me to a direction I can debug the problem. 
> > iter = 0, SNES Function norm 2.94934e-06 > iteration 0 KSP Residual norm 1.094600281831e-06 > iteration 1 KSP Residual norm 1.264284474186e-08 > iteration 2 KSP Residual norm 6.593269221816e-09 > iteration 3 KSP Residual norm 1.689570779457e-09 > iteration 4 KSP Residual norm 1.040661505932e-09 > iteration 5 KSP Residual norm 5.422761817348e-10 > iteration 6 KSP Residual norm 2.492867371369e-10 > iteration 7 KSP Residual norm 8.261522376775e-11 > iteration 8 KSP Residual norm 4.246401544245e-11 > iteration 9 KSP Residual norm 2.514366787388e-11 > iteration 10 KSP Residual norm 1.982940267051e-11 > iteration 11 KSP Residual norm 1.586470414676e-11 > iteration 12 KSP Residual norm 9.866392216207e-12 > iteration 13 KSP Residual norm 4.951342176999e-12 > iteration 14 KSP Residual norm 2.418292660318e-12 > iteration 15 KSP Residual norm 1.747418526086e-12 > iteration 16 KSP Residual norm 1.094150535809e-12 > iteration 17 KSP Residual norm 4.464287492066e-13 > iteration 18 KSP Residual norm 3.530090494462e-13 > iteration 19 KSP Residual norm 2.825698091454e-13 > iteration 20 KSP Residual norm 1.950568425807e-13 > iteration 21 KSP Residual norm 1.227898091813e-13 > iteration 22 KSP Residual norm 5.411106347374e-14 > iteration 23 KSP Residual norm 4.511115848564e-14 > iteration 24 KSP Residual norm 4.063546606691e-14 > iteration 25 KSP Residual norm 3.677694771949e-14 > iteration 26 KSP Residual norm 3.459244943466e-14 > iteration 27 KSP Residual norm 3.263954971093e-14 > iteration 28 KSP Residual norm 3.087344619079e-14 > iteration 29 KSP Residual norm 2.809426925625e-14 > iteration 30 KSP Residual norm 4.366149884754e-01 > Linear solve did not converge due to DIVERGED_DTOL iterations 30 > > > SNES Object: 1 MPI processes > type: newtonls > SNES has not been set up so information may be incomplete > maximum iterations=50, maximum function evaluations=10000 > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > total number of linear solver iterations=0 > total number of function evaluations=0 > norm schedule ALWAYS > SNESLineSearch Object: 1 MPI processes > type: bt > interpolation: cubic > alpha=1.000000e-04 > maxstep=1.000000e+08, minlambda=1.000000e-12 > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > maximum iterations=40 > KSP Object: 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > left preconditioning > using DEFAULT norm type for convergence test > PC Object: 1 MPI processes > type: fieldsplit > PC has not been set up so information may be incomplete > FieldSplit with Schur preconditioner, factorization FULL > Preconditioner for the Schur complement formed from S itself > Split info: > KSP solver for A00 block > not yet available > KSP solver for S = A11 - A10 inv(A00) A01 > not yet available > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=659, cols=659 > total: nonzeros=659, allocated nonzeros=7908 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > > Regards > Juntao -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Thu Oct 8 15:07:12 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 08 Oct 2020 14:07:12 -0600 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> Message-ID: <87tuv48osv.fsf@jedbrown.org> Olivier Jamond writes: >> Given the structure of C it seems you should just explicitly construct Sp and use GAMG (or other preconditioners, even a direct solver) directly on Sp. Trying to avoid explicitly forming Sp will give you a much slower performing solving for what benefit? If C was just some generic monster than forming Sp might be unrealistic but in your case CCt is is block diagonal with tiny blocks which means (C*Ct)^(-1) is block diagonal with tiny blocks (the blocks are the inverses of the blocks of (C*Ct)). >> >> Sp = Ct*C + Qt * S * Q = Ct*C + [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] >> >> [Ct * (C*Ct)^(-1)*C] will again be block diagonal with slightly larger blocks. >> >> You can do D = (C*Ct) with MatMatMult() then write custom code that zips through the diagonal blocks of D inverting all of them to get iD then use MatPtAP applied to C and iD to get Ct * (C*Ct)^(-1)*C then MatShift() to include the I then MatPtAP or MatRAR to get [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] then finally MatAXPY() to get Sp. The complexity of each of the Mat operations is very low because of the absurdly simple structure of C and its descendants. You might even be able to just use MUMPS to give you the explicit inv(C*Ct) without writing custom code to get iD. > > At this time, I didn't manage to compute iD=inv(C*Ct) without using > dense matrices, what may be a shame because all matrices are sparse . Is > it possible? > > And I get no idea of how to write code to manually zip through the > diagonal blocks of D to invert them... You could use MatInvertVariableBlockDiagonal(), which should perhaps return a Mat instead of a raw array. If you have constant block sizes, MatInvertBlockDiagonalMat will return a Mat. From bsmith at petsc.dev Thu Oct 8 15:59:51 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 8 Oct 2020 15:59:51 -0500 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <87tuv48osv.fsf@jedbrown.org> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> <87tuv48osv.fsf@jedbrown.org> Message-ID: <2B8B302F-D823-4160-B674-B3DAE78E6363@petsc.dev> Olivier I am working on extending the routines now and hopefully push a branch you can try fairly soon. Barry > On Oct 8, 2020, at 3:07 PM, Jed Brown wrote: > > Olivier Jamond writes: > >>> Given the structure of C it seems you should just explicitly construct Sp and use GAMG (or other preconditioners, even a direct solver) directly on Sp. Trying to avoid explicitly forming Sp will give you a much slower performing solving for what benefit? 
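For what it is worth, the explicit construction being discussed can be written with standard Mat operations; a minimal sketch (variable names are placeholders; C and S are the assembled constraint and stiffness matrices, and iD is assumed to already hold inv(C*Ct), obtained e.g. with MatInvertBlockDiagonalMat() for constant block sizes as suggested above, or with custom code over the diagonal blocks of C*Ct):

  Mat CtC, Q, QtSQ, Sp;
  MatTransposeMatMult(C, C, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &CtC); /* Ct*C        */
  MatPtAP(iD, C, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &Q);              /* Ct*iD*C     */
  MatScale(Q, -1.0);
  MatShift(Q, 1.0);                                                   /* Q = I - Ct*iD*C */
  MatPtAP(S, Q, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &QtSQ);            /* Qt*S*Q (Q symmetric) */
  MatDuplicate(QtSQ, MAT_COPY_VALUES, &Sp);
  MatAXPY(Sp, 1.0, CtC, DIFFERENT_NONZERO_PATTERN);                   /* Sp = Ct*C + Qt*S*Q */

The resulting Sp is an ordinary assembled matrix that can be handed to KSPSetOperators() and preconditioned with GAMG or a direct solver.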
If C was just some generic monster than forming Sp might be unrealistic but in your case CCt is is block diagonal with tiny blocks which means (C*Ct)^(-1) is block diagonal with tiny blocks (the blocks are the inverses of the blocks of (C*Ct)). >>> >>> Sp = Ct*C + Qt * S * Q = Ct*C + [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] >>> >>> [Ct * (C*Ct)^(-1)*C] will again be block diagonal with slightly larger blocks. >>> >>> You can do D = (C*Ct) with MatMatMult() then write custom code that zips through the diagonal blocks of D inverting all of them to get iD then use MatPtAP applied to C and iD to get Ct * (C*Ct)^(-1)*C then MatShift() to include the I then MatPtAP or MatRAR to get [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] then finally MatAXPY() to get Sp. The complexity of each of the Mat operations is very low because of the absurdly simple structure of C and its descendants. You might even be able to just use MUMPS to give you the explicit inv(C*Ct) without writing custom code to get iD. >> >> At this time, I didn't manage to compute iD=inv(C*Ct) without using >> dense matrices, what may be a shame because all matrices are sparse . Is >> it possible? >> >> And I get no idea of how to write code to manually zip through the >> diagonal blocks of D to invert them... > > You could use MatInvertVariableBlockDiagonal(), which should perhaps return a Mat instead of a raw array. > > If you have constant block sizes, MatInvertBlockDiagonalMat will return a Mat. From bsmith at petsc.dev Thu Oct 8 20:50:00 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 8 Oct 2020 20:50:00 -0500 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <2B8B302F-D823-4160-B674-B3DAE78E6363@petsc.dev> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> <87tuv48osv.fsf@jedbrown.org> <2B8B302F-D823-4160-B674-B3DAE78E6363@petsc.dev> Message-ID: <218E7696-2A50-42A3-8CF2-D58FCC17B855@petsc.dev> Olivier, The branch barry/2020-10-08/invert-block-diagonal-aij contains an example src/mat/tests/ex178.c that shows how to compute inv(CC'). It works for SeqAIJ matrices. Please let us know if it works for you and then I will implement the parallel version. Barry > On Oct 8, 2020, at 3:59 PM, Barry Smith wrote: > > > Olivier > > I am working on extending the routines now and hopefully push a branch you can try fairly soon. > > Barry > > >> On Oct 8, 2020, at 3:07 PM, Jed Brown wrote: >> >> Olivier Jamond writes: >> >>>> Given the structure of C it seems you should just explicitly construct Sp and use GAMG (or other preconditioners, even a direct solver) directly on Sp. Trying to avoid explicitly forming Sp will give you a much slower performing solving for what benefit? If C was just some generic monster than forming Sp might be unrealistic but in your case CCt is is block diagonal with tiny blocks which means (C*Ct)^(-1) is block diagonal with tiny blocks (the blocks are the inverses of the blocks of (C*Ct)). >>>> >>>> Sp = Ct*C + Qt * S * Q = Ct*C + [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] >>>> >>>> [Ct * (C*Ct)^(-1)*C] will again be block diagonal with slightly larger blocks. 
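(For reference, a rough sketch of how the construction above could be strung together from existing Mat routines; it assumes the diagonal blocks of C*Ct all share a constant size bs so that MatInvertBlockDiagonalMat applies, the helper name is made up, and error handling is abbreviated, so treat it as a sketch rather than a tested implementation.)

#include <petscmat.h>

/* Sketch: assemble Sp = Ct*C + Qt*S*Q explicitly, with Q = I - Ct*inv(C*Ct)*C.
   C is m x n (constraints by dofs), S is n x n, and C*Ct is assumed block
   diagonal with constant block size bs (an assumption of this sketch). */
static PetscErrorCode BuildSpExplicit(Mat S, Mat C, PetscInt bs, Mat *Sp)
{
  Mat            D, iD, Q, CtC;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatMatTransposeMult(C, C, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &D);CHKERRQ(ierr);   /* D = C*Ct */
  ierr = MatSetBlockSize(D, bs);CHKERRQ(ierr);                                             /* declare the diagonal block size */
  ierr = MatDuplicate(D, MAT_DO_NOT_COPY_VALUES, &iD);CHKERRQ(ierr);
  ierr = MatInvertBlockDiagonalMat(D, iD);CHKERRQ(ierr);                                   /* iD = inv(C*Ct), block by block */
  ierr = MatPtAP(iD, C, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &Q);CHKERRQ(ierr);              /* Q = Ct*inv(C*Ct)*C */
  ierr = MatScale(Q, -1.0);CHKERRQ(ierr);
  ierr = MatShift(Q, 1.0);CHKERRQ(ierr);                                                   /* Q = I - Ct*inv(C*Ct)*C */
  ierr = MatPtAP(S, Q, MAT_INITIAL_MATRIX, PETSC_DEFAULT, Sp);CHKERRQ(ierr);               /* Sp = Qt*S*Q */
  ierr = MatTransposeMatMult(C, C, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &CtC);CHKERRQ(ierr); /* CtC = Ct*C */
  ierr = MatAXPY(*Sp, 1.0, CtC, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);                  /* Sp += Ct*C */
  ierr = MatDestroy(&D);CHKERRQ(ierr);
  ierr = MatDestroy(&iD);CHKERRQ(ierr);
  ierr = MatDestroy(&Q);CHKERRQ(ierr);
  ierr = MatDestroy(&CtC);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The assembled Sp can then be handed to KSPSetOperators() and preconditioned with GAMG or even factored directly, as suggested earlier in this thread.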
>>>> >>>> You can do D = (C*Ct) with MatMatMult() then write custom code that zips through the diagonal blocks of D inverting all of them to get iD then use MatPtAP applied to C and iD to get Ct * (C*Ct)^(-1)*C then MatShift() to include the I then MatPtAP or MatRAR to get [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] then finally MatAXPY() to get Sp. The complexity of each of the Mat operations is very low because of the absurdly simple structure of C and its descendants. You might even be able to just use MUMPS to give you the explicit inv(C*Ct) without writing custom code to get iD. >>> >>> At this time, I didn't manage to compute iD=inv(C*Ct) without using >>> dense matrices, what may be a shame because all matrices are sparse . Is >>> it possible? >>> >>> And I get no idea of how to write code to manually zip through the >>> diagonal blocks of D to invert them... >> >> You could use MatInvertVariableBlockDiagonal(), which should perhaps return a Mat instead of a raw array. >> >> If you have constant block sizes, MatInvertBlockDiagonalMat will return a Mat. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Oct 8 21:31:12 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 8 Oct 2020 21:31:12 -0500 Subject: [petsc-users] [petsc-maint] Poisson with Neumann BC on a stretched grid In-Reply-To: <260f02c6-64e1-e12e-82e7-f4ea7a155ca1@uclouvain.be> References: <982e1f8e-8b74-8b69-c7ab-fa7eeec650a6@uclouvain.be> <260f02c6-64e1-e12e-82e7-f4ea7a155ca1@uclouvain.be> Message-ID: <35425AFC-F019-4704-939C-7AD05B358DE4@petsc.dev> > On Oct 8, 2020, at 2:50 AM, Victoria Hamtiaux wrote: > > I'm sorry but I'm a bit confused. > > > > First, the fact that the matrix is not symmetric is ok? > > > > Secondly, I guess that coding the semi-coarsening would be the "best" solution, but isn't there any "easier" solution to solve that linear system (Poisson equation with pure Neumann BC on a stretched grid (and in parallel))? > > > > Also, is it normal that using the PCLU preconditionner (and solving on 1 processor) with a stretched grid, the solution of the linear solver is bad. Is it possible that the PCLU preconditionner also has problems with stretched grids? (Because again, with a uniform grid, it works fine). > No this is not normal. I would expect PCLU to be very robust for stretched grids. With finite differences I think a stretched grid results in a non symmetric matrix because the differencing (to the right and left for example) uses a different h. I think this is normal, with traditional finite elements I think it will remain symmetric. Also with finite differencing the order of the accuracy of the discretization falls because the terms in the Taylor series that cancel with a non-stretched grid do not cancel with a stretched grid. Is it possible that something is wrong with generation of the matrix with the stretched grid? You could try a convergence study with a slightly stretched grid using PCLU to see if it seems to be working correctly. Only when the numerics are right and you want to run a large problem where PCLU is too slow you would switch to multgrid with semi-coarsening. I think it is pretty easy for a structured grid and we can show you how and maybe get a nice example for PETSc out of it. 
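(A minimal sketch of what such a setup could look like for a DMDA-based Poisson solve, assuming the grid is stretched only in z so that refinement, and hence the coarsening seen by multigrid, is restricted to x and y; the function name and the factor of 2 are illustrative, and the level operators still have to be assembled by the caller.)

#include <petscksp.h>
#include <petscdmda.h>

/* Sketch: geometric multigrid with semi-coarsening on a DMDA hierarchy.
   dms[] is a caller-provided array of length nlevels; dms[0] must already hold
   the coarse DMDA.  Each refinement acts only in x and y, leaving the
   stretched z direction untouched. */
static PetscErrorCode SetupSemiCoarsenedMG(KSP ksp, PetscInt nlevels, DM dms[])
{
  PC             pc;
  Mat            interp;
  PetscInt       l;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  for (l = 1; l < nlevels; l++) {
    ierr = DMDASetRefinementFactor(dms[l-1], 2, 2, 1);CHKERRQ(ierr);  /* refine in x and y only */
    ierr = DMRefine(dms[l-1], PetscObjectComm((PetscObject)dms[l-1]), &dms[l]);CHKERRQ(ierr);
  }
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCMG);CHKERRQ(ierr);
  ierr = PCMGSetLevels(pc, nlevels, NULL);CHKERRQ(ierr);
  for (l = 1; l < nlevels; l++) {
    /* interpolation between consecutive levels comes straight from the DMDA pair */
    ierr = DMCreateInterpolation(dms[l-1], dms[l], &interp, NULL);CHKERRQ(ierr);
    ierr = PCMGSetInterpolation(pc, l, interp);CHKERRQ(ierr);
    ierr = MatDestroy(&interp);CHKERRQ(ierr);
  }
  /* The Poisson operator on each dms[l] still has to be assembled by the caller and
     attached with PCMGGetSmoother(pc, l, &lksp); KSPSetOperators(lksp, A, A);
     the finest-level operator also goes to KSPSetOperators(ksp, ...). */
  PetscFunctionReturn(0);
}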
Barry > > Sorry for asking so much questions, > > > > Thanks again for your help, > > > > Best regards, > > > > Victoria > > > > > > On 7/10/20 17:59, Barry Smith wrote: >> >> >>> On Oct 7, 2020, at 8:11 AM, Mark Adams > wrote: >>> >>> >>> On Wed, Oct 7, 2020 at 8:27 AM Victoria Hamtiaux > wrote: >>> Thanks for all the answers, >>> >>> >>> >>> How can I do the "semi-coarsening"? I don't really know how those preconditionners work so I don't how how to change them or so.. >>> >>> >>> >>> You have to write custom code to do semi-coarsening. PETSc does not provide that and you would not want to do it yourself, most likely. >> >> We do not provide it directly but if you are using PCMG and DMDA it is relatively straight-forward. You create a coarse DM and then refine it but each refinement you only do in the directions you want set each time with DMDASetRefinementFactor(). Once you have the collections of refined DM's you provide them to PCMG. >> >> Barry >> >>> >>> >>> I have a question because you both seem to say that my matrix is supposed to be symmetric which is not the case. \ >>> >>> You said "my matrix is symmetric." >>> >>> Then you said " I suspect that by stretching the grid, my matrix is not symmetric anymore and that it might cause a problem." >>> >>> We are saying that by stretchin the grid the matrix is still symmetric even if the grid has lost a symmetry. I don't know of a mechanism for stretching the grid to make the matrix asymmetric. So we are suggesting that you verify your suspicion that the matrix is symmetric. >>> >>> And in fact, I don't get how it can be symmetric. Because you will have something close to symmetric. For example when you are at the center of your domain it will be symmetric, but when your at a point at the boundaries I don't get how you can be symmetric, you won't have something at the left and the right of your main diagonal... (I don't know if my explanations are understandable) >>> >>> You can make a discretization that is not symmetric because of boundary conditions but I assume that is not the case because you said your matrix is symmetric. >>> >>> Best regards, >>> >>> >>> >>> Victoria >>> >>> >>> >>> >>> >>> On 7/10/20 14:20, Mark Adams wrote: >>>> GMG (geometric MG) is stronger as Matt said, but it is affected by stretched grids in a similar way. A way to fix this in GMG is semi-coarsening, which AMG _can_ do automatically. >>>> >>>> On Wed, Oct 7, 2020 at 8:02 AM Matthew Knepley > wrote: >>>> On Wed, Oct 7, 2020 at 7:07 AM Victoria Hamtiaux > wrote: >>>> Hello Matt, >>>> >>>> >>>> >>>> I just checked the symmetry of my matrix and it is not symmetric. But it is not symmetric either when I use a uniform grid. >>>> >>>> The domain is 3D and I'm using finite differences, so I guess it is normal that at multiple places (when I deal with points near the boundaries), the matrix is not symmetric. >>>> >>>> So I was wrong, the problem doesn't come from the fact that the matrix is not symmetric. I don't know where it comes from, but when I switch from uniform to stretched grid, the solver stops working properly. Could it be from the preconditionner of the solver that I use? >>>> >>>> Do you have any other idea ? >>>> >>>> I would consider using GMG. As Mark says, AMG is very fiddly with stretched grids. For Poisson, GMG works great and you seem to have regular grids. 
>>>> >>>> Thanks, >>>> >>>> Matt >>>> Thanks for your help, >>>> >>>> >>>> >>>> Victoria >>>> >>>> >>>> >>>> On 7/10/20 12:48, Matthew Knepley wrote: >>>>> On Wed, Oct 7, 2020 at 6:40 AM Victoria Hamtiaux > wrote: >>>>> Dear all, >>>>> >>>>> >>>>> After the discretization of a poisson equation with purely Neumann (or >>>>> periodic) boundary conditions, I get a matrix which is singular. >>>>> >>>>> >>>>> The way I am handling this is by using a NullSpace with the following >>>>> code : >>>>> >>>>> MatNullSpace nullspace; >>>>> MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, 0, &nullspace); >>>>> MatSetNullSpace(p_solverp->A, nullspace); >>>>> MatSetTransposeNullSpace(p_solverp->A, nullspace); >>>>> MatNullSpaceDestroy(&nullspace); >>>>> >>>>> >>>>> Note that I am using the hypre preconditionner BOOMERANG and the default >>>>> solver GMRES. >>>>> >>>>> >>>>> KSPCreate(PETSC_COMM_WORLD,&p_solverp->ksp); >>>>> KSPSetOperators(p_solverp->ksp, p_solverp->A, p_solverp->A); >>>>> PC prec; >>>>> KSPGetPC(p_solverp->ksp, &prec); >>>>> PCSetType(prec,PCHYPRE);//PCHYPRE seems the best >>>>> PCHYPRESetType(prec,"boomeramg"); //boomeramg is the best >>>>> KSPSetInitialGuessNonzero(p_solverp->ksp,PETSC_TRUE); >>>>> KSPSetFromOptions(p_solverp->ksp); >>>>> KSPSetTolerances(p_solverp->ksp, 1.e-10, 1e-10, PETSC_DEFAULT, >>>>> PETSC_DEFAULT); >>>>> KSPSetReusePreconditioner(p_solverp->ksp,PETSC_TRUE); >>>>> KSPSetUseFischerGuess(p_solverp->ksp,1,5); >>>>> KSPGMRESSetPreAllocateVectors(p_solverp->ksp); >>>>> KSPSetUp(p_solverp->ksp); >>>>> >>>>> >>>>> >>>>> And this works fine when my grid is uniform, so that my matrix is >>>>> symmetric. >>>>> >>>>> >>>>> But when I stretch the grid near the boundary (my grid is then >>>>> non-uniform), it doesn't work properly anymore. I suspect that by >>>>> stretching the grid, my matrix is not symmetric anymore and that it >>>>> might cause a problem. >>>>> >>>>> Symmetry is a property of the operator, so you should be symmetric on your >>>>> stretched grid. If not, I think you have the discretization wrong. You can check >>>>> symmetry using >>>>> >>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatIsSymmetric.html >>>>> >>>>> Also, if you suspect your discretization, you should probably do an MMS test to >>>>> verify that you discretization converges at the correct rate. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> I tried fixing the solution at an arbitrary point, but sometimes doing >>>>> this, I get errors near that fixed point. I 've seen on the petsc-users >>>>> forum that you usually don't recommend to fix a point, but I don't >>>>> really know how to proceed differently. >>>>> >>>>> >>>>> What would you recommend to solve this problem? >>>>> >>>>> >>>>> Thanks for your help, >>>>> >>>>> >>>>> Best regards, >>>>> >>>>> >>>>> Victoria >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From y.juntao at hotmail.com Fri Oct 9 00:53:28 2020 From: y.juntao at hotmail.com (Karl Yang) Date: Fri, 9 Oct 2020 13:53:28 +0800 Subject: [petsc-users] Convergence Error Debugging with KSP solvers in SNES In-Reply-To: <17D7396D-BA2D-4E7A-859A-82F605E6D12D@petsc.dev> References: <17D7396D-BA2D-4E7A-859A-82F605E6D12D@petsc.dev> Message-ID: Hi, Barry, Thanks for your reply. Yes, I should have used fgmres. But after switching to fgmres I'm still facing the same convergence issue. Seems like the reason is due to DIVERGED_PC_FAILED. But I simply used FD jacobian, and fieldsplitPC. I am a bit lost on whether I made some mistakes somewhere in the FormFunction or I did not setup the solver correctly. ///////code/////// SNESSetFunction(snes, r, FormFunctionStatic, this); // SNESSetJacobian(snes, J, J, FormJacobianStatic, this); SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault, this); SNESMonitorSet(snes, MySNESMonitor, NULL, NULL); SNESGetKSP(snes, &ksp); KSPGetPC(ksp, &pc); PCSetType(pc, PCFIELDSPLIT); PCFieldSplitSetDetectSaddlePoint(pc, PETSC_TRUE); PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELF, NULL); KSPMonitorSet(ksp, MyKSPMonitor, NULL, 0); KSPSetTolerances(ksp, 1e-8, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); SNESSetFromOptions(snes); //////end///////// Output from SNES/KSP solver ################# step 1 ################# iter = 0, SNES Function norm 0.0430713 iteration 0 KSP Residual norm 4.307133784528e-02 0 KSP unpreconditioned resid norm 4.307133784528e-02 true resid norm 4.307133784528e-02 ||r(i)||/||b|| 1.000000000000e+00 iteration 1 KSP Residual norm 4.451434065870e-07 1 KSP unpreconditioned resid norm 4.451434065870e-07 true resid norm 4.451434065902e-07 ||r(i)||/||b|| 1.033502623460e-05 iteration 2 KSP Residual norm 1.079756105012e-12 2 KSP unpreconditioned resid norm 1.079756105012e-12 true resid norm 1.079754870815e-12 ||r(i)||/||b|| 2.506898844643e-11 Linear solve converged due to CONVERGED_RTOL iterations 2 iter = 1, SNES Function norm 2.40846e-05 iteration 0 KSP Residual norm 2.408462930023e-05 0 KSP unpreconditioned resid norm 2.408462930023e-05 true resid norm 2.408462930023e-05 ||r(i)||/||b|| 1.000000000000e+00 iteration 1 KSP Residual norm 1.096958085415e-11 1 KSP unpreconditioned resid norm 1.096958085415e-11 true resid norm 1.096958085425e-11 ||r(i)||/||b|| 4.554598170270e-07 iteration 2 KSP Residual norm 5.909523288165e-16 2 KSP unpreconditioned resid norm 5.909523288165e-16 true resid norm 5.909519599233e-16 ||r(i)||/||b|| 2.453647729249e-11 Linear solve converged due to CONVERGED_RTOL iterations 2 iter = 2, SNES Function norm 1.19684e-14 ################# step 2 ################# iter = 0, SNES Function norm 0.00391662 iteration 0 KSP Residual norm 3.916615614134e-03 0 KSP unpreconditioned resid norm 3.916615614134e-03 true resid norm 3.916615614134e-03 ||r(i)||/||b|| 1.000000000000e+00 iteration 1 KSP Residual norm 4.068800385009e-08 1 KSP unpreconditioned resid norm 4.068800385009e-08 true resid norm 4.068800384986e-08 ||r(i)||/||b|| 1.038856192653e-05 iteration 2 KSP Residual norm 8.427513055511e-14 2 KSP unpreconditioned resid norm 8.427513055511e-14 true resid norm 8.427497502034e-14 ||r(i)||/||b|| 2.151729537007e-11 Linear solve converged due to CONVERGED_RTOL iterations 2 iter = 1, SNES Function norm 1.99152e-07 iteration 0 KSP Residual norm 1.991523558528e-07 0 KSP unpreconditioned resid norm 1.991523558528e-07 true resid norm 1.991523558528e-07 ||r(i)||/||b|| 1.000000000000e+00 iteration 1 KSP Residual norm 1.413505562549e-13 1 KSP 
unpreconditioned resid norm 1.413505562549e-13 true resid norm 1.413505562550e-13 ||r(i)||/||b|| 7.097609046588e-07 iteration 2 KSP Residual norm 5.165934822520e-18 2 KSP unpreconditioned resid norm 5.165934822520e-18 true resid norm 5.165932973227e-18 ||r(i)||/||b|| 2.593960262787e-11 Linear solve converged due to CONVERGED_RTOL iterations 2 iter = 2, SNES Function norm 1.69561e-16 ################# step 3 ################# iter = 0, SNES Function norm 0.00035615 iteration 0 KSP Residual norm 3.561504844171e-04 0 KSP unpreconditioned resid norm 3.561504844171e-04 true resid norm 3.561504844171e-04 ||r(i)||/||b|| 1.000000000000e+00 iteration 1 KSP Residual norm 3.701591890269e-09 1 KSP unpreconditioned resid norm 3.701591890269e-09 true resid norm 3.701591890274e-09 ||r(i)||/||b|| 1.039333667153e-05 iteration 2 KSP Residual norm 7.832821034843e-15 2 KSP unpreconditioned resid norm 7.832821034843e-15 true resid norm 7.832856926692e-15 ||r(i)||/||b|| 2.199311041093e-11 Linear solve converged due to CONVERGED_RTOL iterations 2 iter = 1, SNES Function norm 1.64671e-09 iteration 0 KSP Residual norm 1.646709543241e-09 0 KSP unpreconditioned resid norm 1.646709543241e-09 true resid norm 1.646709543241e-09 ||r(i)||/||b|| 1.000000000000e+00 iteration 1 KSP Residual norm 1.043230469512e-15 1 KSP unpreconditioned resid norm 1.043230469512e-15 true resid norm 1.043230469512e-15 ||r(i)||/||b|| 6.335242749968e-07 iteration 1 KSP Residual norm 0.000000000000e+00 1 KSP unpreconditioned resid norm 0.000000000000e+00 true resid norm -nan ||r(i)||/||b|| -nan Linear solve did not converge due to DIVERGED_PC_FAILED iterations 1 PC_FAILED due to SUBPC_ERROR More information from -ksp_error_if_not_converged -info [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 3.303168180659e-07 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 7.795816360977e-02 at iteration 12 [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC 
with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.227610512466e+00 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 5.453050347652e+05 at iteration 12 [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] KSPConvergedDefault(): Linear solver is diverging. Initial right hand size norm 9.501675075823e-01, current residual norm 4.894880836662e+04 at iteration 210 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: KSPSolve has not converged, reason DIVERGED_DTOL [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.12.5, Mar, 29, 2020 [0]PETSC ERROR: ./stokeTutorial on a arch-linux2-c-debug named a2aa8f1c96aa by Unknown Fri Oct 9 13:43:28 2020 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-cuda [0]PETSC ERROR: #1 KSPSolve() line 832 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #3 PCApply() line 444 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #4 KSP_PCApply() line 281 in /usr/local/petsc/petsc-3.12.5/include/petsc/private/kspimpl.h [0]PETSC ERROR: #5 KSPFGMRESCycle() line 166 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/impls/gmres/fgmres/fgmres.c [0]PETSC ERROR: #6 KSPSolve_FGMRES() line 291 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/impls/gmres/fgmres/fgmres.c [0]PETSC ERROR: #7 KSPSolve() line 760 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /usr/local/petsc/petsc-3.12.5/src/snes/impls/ls/ls.c [0]PETSC ERROR: #9 SNESSolve() line 4482 in /usr/local/petsc/petsc-3.12.5/src/snes/interface/snes.c [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374783 SNES Object: 1 MPI processes type: newtonls maximum iterations=50, maximum function evaluations=10000 tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 total number of linear solver iterations=2 total number of function evaluations=1322 norm schedule ALWAYS Jacobian is built using finite differences one column at a time SNESLineSearch Object: 1 MPI processes type: bt interpolation: cubic alpha=1.000000e-04 maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=40 KSP Object: 1 MPI processes type: fgmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-08, absolute=1e-50, divergence=10000. 
right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: fieldsplit FieldSplit with Schur preconditioner, blocksize = 1, factorization FULL Preconditioner for the Schur complement formed from S itself Split info: Split number 0 Defined by IS Split number 1 Defined by IS KSP solver for A00 block KSP Object: (fieldsplit_0_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI processes type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 package used to perform factorization: petsc total: nonzeros=9213, allocated nonzeros=9213 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 total: nonzeros=9213, allocated nonzeros=9213 total number of mallocs used during MatSetValues calls=0 not using I-node routines KSP solver for S = A11 - A10 inv(A00) A01 KSP Object: (fieldsplit_1_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_1_) 1 MPI processes type: none linear system matrix = precond matrix: Mat Object: (fieldsplit_1_) 1 MPI processes type: schurcomplement rows=147, cols=147 Schur complement A11 - A10 inv(A00) A01 A11 Mat Object: 1 MPI processes type: seqaij rows=147, cols=147 total: nonzeros=147, allocated nonzeros=147 total number of mallocs used during MatSetValues calls=0 not using I-node routines A10 Mat Object: 1 MPI processes type: seqaij rows=147, cols=512 total: nonzeros=2560, allocated nonzeros=2560 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 87 nodes, limit used is 5 KSP of A00 KSP Object: (fieldsplit_0_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI processes type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1., needed 1. 
Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 package used to perform factorization: petsc total: nonzeros=9213, allocated nonzeros=9213 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 total: nonzeros=9213, allocated nonzeros=9213 total number of mallocs used during MatSetValues calls=0 not using I-node routines A01 Mat Object: 1 MPI processes type: seqaij rows=512, cols=147 total: nonzeros=2562, allocated nonzeros=2562 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=659, cols=659 total: nonzeros=14482, allocated nonzeros=27543 total number of mallocs used during MatSetValues calls=1309 not using I-node routines On Oct 9 2020, at 2:17 am, Barry Smith wrote: > > When you get a huge change at restart this means something is seriously wrong with either the linear operator or the linear preconditioner. > > How are you doing the matrix vector product? Note both the operator and preconditioner must be linear operators for GMRES. > > FGMRES allows the preconditioner to be nonlinear. You can try > > -ksp_type fgmres -ksp_monitor_true_residual > > Barry > > > > On Oct 8, 2020, at 2:43 AM, Yang Juntao wrote: > > Hello, > > > > I?m working on a nonlinear solver with SNES with handcoded jacobian and function. Each linear solver is solved with KSP solver. > > But sometimes I got issues with ksp solver convergence. I tried with finite difference approximated jacobian, but get the same error. > > > > From the iterations, the convergence seems ok at the beginning but suddenly diverged in the last iteration. > > Hope anyone with experience on ksp solvers could direct me to a direction I can debug the problem. 
> > > > iter = 0, SNES Function norm 2.94934e-06 > > iteration 0 KSP Residual norm 1.094600281831e-06 > > iteration 1 KSP Residual norm 1.264284474186e-08 > > iteration 2 KSP Residual norm 6.593269221816e-09 > > iteration 3 KSP Residual norm 1.689570779457e-09 > > iteration 4 KSP Residual norm 1.040661505932e-09 > > iteration 5 KSP Residual norm 5.422761817348e-10 > > iteration 6 KSP Residual norm 2.492867371369e-10 > > iteration 7 KSP Residual norm 8.261522376775e-11 > > iteration 8 KSP Residual norm 4.246401544245e-11 > > iteration 9 KSP Residual norm 2.514366787388e-11 > > iteration 10 KSP Residual norm 1.982940267051e-11 > > iteration 11 KSP Residual norm 1.586470414676e-11 > > iteration 12 KSP Residual norm 9.866392216207e-12 > > iteration 13 KSP Residual norm 4.951342176999e-12 > > iteration 14 KSP Residual norm 2.418292660318e-12 > > iteration 15 KSP Residual norm 1.747418526086e-12 > > iteration 16 KSP Residual norm 1.094150535809e-12 > > iteration 17 KSP Residual norm 4.464287492066e-13 > > iteration 18 KSP Residual norm 3.530090494462e-13 > > iteration 19 KSP Residual norm 2.825698091454e-13 > > iteration 20 KSP Residual norm 1.950568425807e-13 > > iteration 21 KSP Residual norm 1.227898091813e-13 > > iteration 22 KSP Residual norm 5.411106347374e-14 > > iteration 23 KSP Residual norm 4.511115848564e-14 > > iteration 24 KSP Residual norm 4.063546606691e-14 > > iteration 25 KSP Residual norm 3.677694771949e-14 > > iteration 26 KSP Residual norm 3.459244943466e-14 > > iteration 27 KSP Residual norm 3.263954971093e-14 > > iteration 28 KSP Residual norm 3.087344619079e-14 > > iteration 29 KSP Residual norm 2.809426925625e-14 > > iteration 30 KSP Residual norm 4.366149884754e-01 > > Linear solve did not converge due to DIVERGED_DTOL iterations 30 > > > > > > > > SNES Object: 1 MPI processes > > type: newtonls > > SNES has not been set up so information may be incomplete > > maximum iterations=50, maximum function evaluations=10000 > > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > > total number of linear solver iterations=0 > > total number of function evaluations=0 > > norm schedule ALWAYS > > SNESLineSearch Object: 1 MPI processes > > type: bt > > interpolation: cubic > > alpha=1.000000e-04 > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > > maximum iterations=40 > > KSP Object: 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > > left preconditioning > > using DEFAULT norm type for convergence test > > PC Object: 1 MPI processes > > type: fieldsplit > > PC has not been set up so information may be incomplete > > FieldSplit with Schur preconditioner, factorization FULL > > Preconditioner for the Schur complement formed from S itself > > Split info: > > KSP solver for A00 block > > not yet available > > KSP solver for S = A11 - A10 inv(A00) A01 > > not yet available > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=659, cols=659 > > total: nonzeros=659, allocated nonzeros=7908 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > > > Regards > > Juntao > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Oct 9 01:25:07 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 9 Oct 2020 01:25:07 -0500 Subject: [petsc-users] Convergence Error Debugging with KSP solvers in SNES In-Reply-To: References: <17D7396D-BA2D-4E7A-859A-82F605E6D12D@petsc.dev> Message-ID: <9AA79324-0653-4088-8A1E-72FEE8CD1631@petsc.dev> Before you do any investigation I would run with one SNES solve -snes_max_it 1 -snes_view. It will print out exactly what solver configuration you are using. ------ [0]PETSC ERROR: KSPSolve has not converged, reason DIVERGED_DTOL [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.12.5, Mar, 29, 2020 [0]PETSC ERROR: ./stokeTutorial on a arch-linux2-c-debug named a2aa8f1c96aa by Unknown Fri Oct 9 13:43:28 2020 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-cuda [0]PETSC ERROR: #1 KSPSolve() line 832 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c OK, one of the inner solvers went crazy and diverged, that is, the norm of the residual for that inner solver exploded. Based on the line number [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c you can look at that file and see which of the inner solvers failed. From the -snes_view output you will know what the KSP and PC are for that inner solve and the KSP options prefix. With that you can run the failing case with the additional option -xxx_ksp_monitor_true_residual and watch the inner solve explode. This inner solver behaving badly can also explain the need for -ksp_type fgmres. Normally PCFIELDSPLIT is a linear operator and so you can use -ksp_type gmres, but there is some issue with the inner solver. Could it possibly have a null space, that is, be singular? Do you provide your own custom inner solver or just select from the options database? Does -pc_type lu make everything work fine? Barry > On Oct 9, 2020, at 12:53 AM, Karl Yang wrote: > Hi, Barry, > > Thanks for your reply. Yes, I should have used fgmres. But after switching to fgmres I'm still facing the same convergence issue. > > Seems like the reason is due to DIVERGED_PC_FAILED. But I simply used FD jacobian, and fieldsplitPC. I am a bit lost on whether I made some mistakes somewhere in the FormFunction or I did not setup the solver correctly.
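(Collected on one command line, the diagnostic run suggested above might look like the following; the executable name is taken from the error trace and the fieldsplit_1_ prefix from the -snes_view output, so adjust both as needed.)

./stokeTutorial -snes_max_it 1 -snes_view -ksp_type fgmres -ksp_monitor_true_residual -fieldsplit_1_ksp_monitor_true_residual -fieldsplit_1_ksp_converged_reason

and, to check whether a direct factorization of the whole system makes the failure disappear:

./stokeTutorial -snes_max_it 1 -snes_view -pc_type lu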
> > ///////code/////// > SNESSetFunction(snes, r, FormFunctionStatic, this); > // SNESSetJacobian(snes, J, J, FormJacobianStatic, this); > SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault, this); > SNESMonitorSet(snes, MySNESMonitor, NULL, NULL); > > SNESGetKSP(snes, &ksp); > KSPGetPC(ksp, &pc); > PCSetType(pc, PCFIELDSPLIT); > PCFieldSplitSetDetectSaddlePoint(pc, PETSC_TRUE); > PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELF, NULL); > KSPMonitorSet(ksp, MyKSPMonitor, NULL, 0); > KSPSetTolerances(ksp, 1e-8, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); > SNESSetFromOptions(snes); > //////end///////// > > Output from SNES/KSP solver > ################# step 1 ################# > iter = 0, SNES Function norm 0.0430713 > iteration 0 KSP Residual norm 4.307133784528e-02 > 0 KSP unpreconditioned resid norm 4.307133784528e-02 true resid norm 4.307133784528e-02 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 4.451434065870e-07 > 1 KSP unpreconditioned resid norm 4.451434065870e-07 true resid norm 4.451434065902e-07 ||r(i)||/||b|| 1.033502623460e-05 > iteration 2 KSP Residual norm 1.079756105012e-12 > 2 KSP unpreconditioned resid norm 1.079756105012e-12 true resid norm 1.079754870815e-12 ||r(i)||/||b|| 2.506898844643e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 1, SNES Function norm 2.40846e-05 > iteration 0 KSP Residual norm 2.408462930023e-05 > 0 KSP unpreconditioned resid norm 2.408462930023e-05 true resid norm 2.408462930023e-05 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 1.096958085415e-11 > 1 KSP unpreconditioned resid norm 1.096958085415e-11 true resid norm 1.096958085425e-11 ||r(i)||/||b|| 4.554598170270e-07 > iteration 2 KSP Residual norm 5.909523288165e-16 > 2 KSP unpreconditioned resid norm 5.909523288165e-16 true resid norm 5.909519599233e-16 ||r(i)||/||b|| 2.453647729249e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 2, SNES Function norm 1.19684e-14 > ################# step 2 ################# > iter = 0, SNES Function norm 0.00391662 > iteration 0 KSP Residual norm 3.916615614134e-03 > 0 KSP unpreconditioned resid norm 3.916615614134e-03 true resid norm 3.916615614134e-03 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 4.068800385009e-08 > 1 KSP unpreconditioned resid norm 4.068800385009e-08 true resid norm 4.068800384986e-08 ||r(i)||/||b|| 1.038856192653e-05 > iteration 2 KSP Residual norm 8.427513055511e-14 > 2 KSP unpreconditioned resid norm 8.427513055511e-14 true resid norm 8.427497502034e-14 ||r(i)||/||b|| 2.151729537007e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 1, SNES Function norm 1.99152e-07 > iteration 0 KSP Residual norm 1.991523558528e-07 > 0 KSP unpreconditioned resid norm 1.991523558528e-07 true resid norm 1.991523558528e-07 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 1.413505562549e-13 > 1 KSP unpreconditioned resid norm 1.413505562549e-13 true resid norm 1.413505562550e-13 ||r(i)||/||b|| 7.097609046588e-07 > iteration 2 KSP Residual norm 5.165934822520e-18 > 2 KSP unpreconditioned resid norm 5.165934822520e-18 true resid norm 5.165932973227e-18 ||r(i)||/||b|| 2.593960262787e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 2, SNES Function norm 1.69561e-16 > ################# step 3 ################# > iter = 0, SNES Function norm 0.00035615 > iteration 0 KSP Residual norm 3.561504844171e-04 > 0 KSP unpreconditioned resid norm 3.561504844171e-04 true resid norm 
3.561504844171e-04 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 3.701591890269e-09 > 1 KSP unpreconditioned resid norm 3.701591890269e-09 true resid norm 3.701591890274e-09 ||r(i)||/||b|| 1.039333667153e-05 > iteration 2 KSP Residual norm 7.832821034843e-15 > 2 KSP unpreconditioned resid norm 7.832821034843e-15 true resid norm 7.832856926692e-15 ||r(i)||/||b|| 2.199311041093e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 1, SNES Function norm 1.64671e-09 > iteration 0 KSP Residual norm 1.646709543241e-09 > 0 KSP unpreconditioned resid norm 1.646709543241e-09 true resid norm 1.646709543241e-09 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 1.043230469512e-15 > 1 KSP unpreconditioned resid norm 1.043230469512e-15 true resid norm 1.043230469512e-15 ||r(i)||/||b|| 6.335242749968e-07 > iteration 1 KSP Residual norm 0.000000000000e+00 > 1 KSP unpreconditioned resid norm 0.000000000000e+00 true resid norm -nan ||r(i)||/||b|| -nan > Linear solve did not converge due to DIVERGED_PC_FAILED iterations 1 > PC_FAILED due to SUBPC_ERROR > > > More information from -ksp_error_if_not_converged -info > > [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 3.303168180659e-07 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 7.795816360977e-02 at iteration 12 > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): 
Leaving PC with identical preconditioner since operator is unchanged > [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.227610512466e+00 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 5.453050347652e+05 at iteration 12 > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] KSPConvergedDefault(): Linear solver is diverging. Initial right hand size norm 9.501675075823e-01, current residual norm 4.894880836662e+04 at iteration 210 > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: > [0]PETSC ERROR: KSPSolve has not converged, reason DIVERGED_DTOL > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.12.5, Mar, 29, 2020 > [0]PETSC ERROR: ./stokeTutorial on a arch-linux2-c-debug named a2aa8f1c96aa by Unknown Fri Oct 9 13:43:28 2020 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-cuda > [0]PETSC ERROR: #1 KSPSolve() line 832 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c > [0]PETSC ERROR: #3 PCApply() line 444 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #4 KSP_PCApply() line 281 in /usr/local/petsc/petsc-3.12.5/include/petsc/private/kspimpl.h > [0]PETSC ERROR: #5 KSPFGMRESCycle() line 166 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [0]PETSC ERROR: #6 KSPSolve_FGMRES() line 291 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [0]PETSC ERROR: #7 KSPSolve() line 760 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /usr/local/petsc/petsc-3.12.5/src/snes/impls/ls/ls.c > [0]PETSC ERROR: #9 SNESSolve() line 4482 in /usr/local/petsc/petsc-3.12.5/src/snes/interface/snes.c > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374783 > SNES Object: 1 MPI processes > type: newtonls > maximum iterations=50, maximum function evaluations=10000 > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > total number of linear solver iterations=2 > total number of function evaluations=1322 > norm schedule ALWAYS > Jacobian is built using finite differences one column at a time > SNESLineSearch Object: 1 MPI processes > type: bt > interpolation: cubic > alpha=1.000000e-04 > maxstep=1.000000e+08, minlambda=1.000000e-12 > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > maximum iterations=40 > KSP Object: 1 MPI processes > type: fgmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. 
> right preconditioning > using UNPRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: fieldsplit > FieldSplit with Schur preconditioner, blocksize = 1, factorization FULL > Preconditioner for the Schur complement formed from S itself > Split info: > Split number 0 Defined by IS > Split number 1 Defined by IS > KSP solver for A00 block > KSP Object: (fieldsplit_0_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > package used to perform factorization: petsc > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > KSP solver for S = A11 - A10 inv(A00) A01 > KSP Object: (fieldsplit_1_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_1_) 1 MPI processes > type: none > linear system matrix = precond matrix: > Mat Object: (fieldsplit_1_) 1 MPI processes > type: schurcomplement > rows=147, cols=147 > Schur complement A11 - A10 inv(A00) A01 > A11 > Mat Object: 1 MPI processes > type: seqaij > rows=147, cols=147 > total: nonzeros=147, allocated nonzeros=147 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > A10 > Mat Object: 1 MPI processes > type: seqaij > rows=147, cols=512 > total: nonzeros=2560, allocated nonzeros=2560 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 87 nodes, limit used is 5 > KSP of A00 > KSP Object: (fieldsplit_0_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. 
> Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > package used to perform factorization: petsc > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > A01 > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=147 > total: nonzeros=2562, allocated nonzeros=2562 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=659, cols=659 > total: nonzeros=14482, allocated nonzeros=27543 > total number of mallocs used during MatSetValues calls=1309 > not using I-node routines > > > > On Oct 9 2020, at 2:17 am, Barry Smith wrote: > > When you get a huge change at restart this means something is seriously wrong with either the linear operator or the linear preconditioner. > > How are you doing the matrix vector product? Note both the operator and preconditioner must be linear operators for GMRES. > > FGMRES allows the preconditioner to be nonlinear. You can try > > -ksp_type fgmres -ksp_monitor_true_residual > > Barry > > > On Oct 8, 2020, at 2:43 AM, Yang Juntao > wrote: > > Hello, > > I?m working on a nonlinear solver with SNES with handcoded jacobian and function. Each linear solver is solved with KSP solver. > But sometimes I got issues with ksp solver convergence. I tried with finite difference approximated jacobian, but get the same error. > > From the iterations, the convergence seems ok at the beginning but suddenly diverged in the last iteration. > Hope anyone with experience on ksp solvers could direct me to a direction I can debug the problem. 
> > iter = 0, SNES Function norm 2.94934e-06 > iteration 0 KSP Residual norm 1.094600281831e-06 > iteration 1 KSP Residual norm 1.264284474186e-08 > iteration 2 KSP Residual norm 6.593269221816e-09 > iteration 3 KSP Residual norm 1.689570779457e-09 > iteration 4 KSP Residual norm 1.040661505932e-09 > iteration 5 KSP Residual norm 5.422761817348e-10 > iteration 6 KSP Residual norm 2.492867371369e-10 > iteration 7 KSP Residual norm 8.261522376775e-11 > iteration 8 KSP Residual norm 4.246401544245e-11 > iteration 9 KSP Residual norm 2.514366787388e-11 > iteration 10 KSP Residual norm 1.982940267051e-11 > iteration 11 KSP Residual norm 1.586470414676e-11 > iteration 12 KSP Residual norm 9.866392216207e-12 > iteration 13 KSP Residual norm 4.951342176999e-12 > iteration 14 KSP Residual norm 2.418292660318e-12 > iteration 15 KSP Residual norm 1.747418526086e-12 > iteration 16 KSP Residual norm 1.094150535809e-12 > iteration 17 KSP Residual norm 4.464287492066e-13 > iteration 18 KSP Residual norm 3.530090494462e-13 > iteration 19 KSP Residual norm 2.825698091454e-13 > iteration 20 KSP Residual norm 1.950568425807e-13 > iteration 21 KSP Residual norm 1.227898091813e-13 > iteration 22 KSP Residual norm 5.411106347374e-14 > iteration 23 KSP Residual norm 4.511115848564e-14 > iteration 24 KSP Residual norm 4.063546606691e-14 > iteration 25 KSP Residual norm 3.677694771949e-14 > iteration 26 KSP Residual norm 3.459244943466e-14 > iteration 27 KSP Residual norm 3.263954971093e-14 > iteration 28 KSP Residual norm 3.087344619079e-14 > iteration 29 KSP Residual norm 2.809426925625e-14 > iteration 30 KSP Residual norm 4.366149884754e-01 > Linear solve did not converge due to DIVERGED_DTOL iterations 30 > > > SNES Object: 1 MPI processes > type: newtonls > SNES has not been set up so information may be incomplete > maximum iterations=50, maximum function evaluations=10000 > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > total number of linear solver iterations=0 > total number of function evaluations=0 > norm schedule ALWAYS > SNESLineSearch Object: 1 MPI processes > type: bt > interpolation: cubic > alpha=1.000000e-04 > maxstep=1.000000e+08, minlambda=1.000000e-12 > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > maximum iterations=40 > KSP Object: 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > left preconditioning > using DEFAULT norm type for convergence test > PC Object: 1 MPI processes > type: fieldsplit > PC has not been set up so information may be incomplete > FieldSplit with Schur preconditioner, factorization FULL > Preconditioner for the Schur complement formed from S itself > Split info: > KSP solver for A00 block > not yet available > KSP solver for S = A11 - A10 inv(A00) A01 > not yet available > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=659, cols=659 > total: nonzeros=659, allocated nonzeros=7908 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > > Regards > Juntao > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pranayreddy865 at gmail.com Fri Oct 9 03:53:34 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Fri, 9 Oct 2020 01:53:34 -0700 Subject: [petsc-users] Regarding SNESSetFunction and SNESSetJacobian Message-ID: Hello, I have a couple of questions regarding how SNESSetFunction,SNESSetJacobian and SNESSolve work together. I am trying to solve a nonlinear system of the form A(x)x=b(x). I am using Fortran90. The way I intend to solve the above equation is as follows: Step 1: initialize x with an initial guess Step 2: Solve using SNESSolve for (x^i, i is the iteration number, i=1,2,3...) Step 3: Calculate the update and check if it is less than tolerance Step 4: If yes, end the loop. Else the jacobian matrix and function should be updated using x^(i) and go back to step 2. The part which is a little confusing to me is in understanding how to update the jacobian matrix and the function F (= A(x)x-b(x)). 1) Should I explicitly call the subroutines Form Function and FormJacobian by using x^i as the input argument or is this automatically taken care of when I go back to step 2 and call SNESSolve? 2) If the answer to the above question is yes, I do not fully understand the role played by the functions SNESSetFunction and SNESSetJacobian. I apologize if I am not clear in my explanation. I would be glad to elaborate on any section of my question. Please let me know if you need any further information from my side. Thank you, Sincerely, Pranay. ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Oct 9 06:38:24 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 9 Oct 2020 07:38:24 -0400 Subject: [petsc-users] Regarding SNESSetFunction and SNESSetJacobian In-Reply-To: References: Message-ID: On Fri, Oct 9, 2020 at 4:53 AM baikadi pranay wrote: > Hello, > I have a couple of questions regarding how SNESSetFunction,SNESSetJacobian > and SNESSolve work together. I am trying to solve a nonlinear system of the > form A(x)x=b(x). I am using Fortran90. The way I intend to solve the above > equation is as follows: > Step 1: initialize x with an initial guess > Step 2: Solve using SNESSolve for (x^i, i is the iteration number, > i=1,2,3...) > Step 3: Calculate the update and check if it is less than tolerance > Step 4: If yes, end the loop. Else the jacobian matrix and function should > be updated using x^(i) and go back to step 2. > You are describing the Picard iteration: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetPicard.html You can do this, but it will converge more slowly than Newton. We usually advise using Newton. > The part which is a little confusing to me is in understanding how to > update the jacobian matrix and the function F (= A(x)x-b(x)). > > 1) Should I explicitly call the subroutines Form Function and FormJacobian > by using x^i as the input argument or is this automatically taken care of > when I go back to step 2 and call SNESSolve? > No. SNES calls these automatically. Thanks, Matt > 2) If the answer to the above question is yes, I do not fully understand > the role played by the functions SNESSetFunction and SNESSetJacobian. > > I apologize if I am not clear in my explanation. I would be glad to > elaborate on any section of my question. Please let me know if you need any > further information from my side. > > Thank you, > Sincerely, > Pranay. > ? 
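(For concreteness, a minimal sketch of the Picard setup being recommended here, written in C for brevity; ComputeA and ComputeB are placeholder names for user routines that assemble A(x) and b(x), and the vector and matrix objects are assumed to be created with matching sizes elsewhere.)

#include <petscsnes.h>

/* Placeholder: assemble b(x) into bvec at the current iterate x. */
static PetscErrorCode ComputeB(SNES snes, Vec x, Vec bvec, void *ctx)
{
  PetscFunctionBegin;
  /* ... user assembly of b(x) goes here ... */
  PetscFunctionReturn(0);
}

/* Placeholder: assemble A(x) into Amat (and Pmat, if different) at the current iterate x. */
static PetscErrorCode ComputeA(SNES snes, Vec x, Mat Amat, Mat Pmat, void *ctx)
{
  PetscFunctionBegin;
  /* ... user assembly of A(x) goes here ... */
  PetscFunctionReturn(0);
}

/* Driver fragment: x, r and A are created elsewhere with matching sizes. */
static PetscErrorCode SolvePicard(Vec x, Vec r, Mat A)
{
  SNES           snes;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = SNESCreate(PETSC_COMM_WORLD, &snes);CHKERRQ(ierr);
  ierr = SNESSetPicard(snes, r, ComputeB, A, A, ComputeA, NULL);CHKERRQ(ierr);
  /* absolute, relative and solution-update tolerances plus iteration limits,
     matching the defaults printed by -snes_view in this thread */
  ierr = SNESSetTolerances(snes, 1e-50, 1e-8, 1e-8, 50, 10000);CHKERRQ(ierr);
  ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);
  ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);   /* SNES calls ComputeA/ComputeB as it needs them */
  ierr = SNESDestroy(&snes);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

A single SNESSolve() then runs the whole A(x^i) x^{i+1} = b(x^i) sweep to convergence; the assembly routines are called by SNES itself, so there is no need to re-invoke them by hand between iterations.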
> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Oct 9 09:43:46 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 9 Oct 2020 09:43:46 -0500 Subject: [petsc-users] Regarding SNESSetFunction and SNESSetJacobian In-Reply-To: References: Message-ID: To provide the functions to the Picard iteration you call SNESSetPicard() not SNESSetFunction() and SNESSetJacobian(), you provide code to compute A(x) and b(x). Note that in the Picard iteration the matrix A(x) is NOT the Jacobian of F(x) = A(x) x - b(x). The Jacobian of F(x) is the more complicated F'(x) = A(x) + A'(x)x + b'(x) Barry > On Oct 9, 2020, at 6:38 AM, Matthew Knepley wrote: > > On Fri, Oct 9, 2020 at 4:53 AM baikadi pranay > wrote: > Hello, > I have a couple of questions regarding how SNESSetFunction,SNESSetJacobian and SNESSolve work together. I am trying to solve a nonlinear system of the form A(x)x=b(x). I am using Fortran90. The way I intend to solve the above equation is as follows: > Step 1: initialize x with an initial guess > Step 2: Solve using SNESSolve for (x^i, i is the iteration number, i=1,2,3...) > Step 3: Calculate the update and check if it is less than tolerance > Step 4: If yes, end the loop. Else the jacobian matrix and function should be updated using x^(i) and go back to step 2. > > You are describing the Picard iteration: > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetPicard.html > > You can do this, but it will converge more slowly than Newton. We usually advise using Newton. > > The part which is a little confusing to me is in understanding how to update the jacobian matrix and the function F (= A(x)x-b(x)). > > 1) Should I explicitly call the subroutines Form Function and FormJacobian by using x^i as the input argument or is this automatically taken care of when I go back to step 2 and call SNESSolve? > > No. SNES calls these automatically. > > Thanks, > > Matt > > 2) If the answer to the above question is yes, I do not fully understand the role played by the functions SNESSetFunction and SNESSetJacobian. > > I apologize if I am not clear in my explanation. I would be glad to elaborate on any section of my question. Please let me know if you need any further information from my side. > > Thank you, > Sincerely, > Pranay. > ? > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Oct 9 11:10:53 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 9 Oct 2020 11:10:53 -0500 Subject: [petsc-users] Regarding SNESSetFunction and SNESSetJacobian In-Reply-To: References: Message-ID: <0BA3784A-6ECD-4FEB-8E64-5F05A89559CF@petsc.dev> I'm sorry, I made a small mistake in my previous email. It is F'(x) = A(x) + A'(x)x - b'(x) not F'(x) = A(x) + A'(x)x + b'(x) > On Oct 9, 2020, at 10:40 AM, baikadi pranay wrote: > > Thank you for the response. I have one more quick question. > Is the solution of A(x)x=b(x) obtained from Newton's method the final solution or is it the solution of A(x^i)x^(i+1)=b(x^i). 
In other words, do I need to use the solution obtained from Newton's method to update the Jacobian, use Newton method again and repeat the process? SNESSolve() with SNESSetPicard() continues the iteration calling your routine that computes A repeatedly until the system has converged. You can control the convergence criteria with SNESSetTolerances() (see also the manual pages that page links to). You never need call your routine that computes A from your code, PETSc calls it as it needs it. Also, and I apologize for being pedantic, but using the computation of A() and SNESSetPicard() is NOT doing Newton's method, it is a different algorithm called Picard. If you want to run Newton then you need to write a routine that computes the quantity A(x) + A'(x)x + b'(x) (not necessarily by using this exact product-rule formula). Computing just A() cannot give you Newton's method. For many problems Picard is good enough so people don't bother to code F'(x) and skip Newton's method and just use Picard. For some problems the extra effort of coding F'(x) gives a Newton that converges much faster than Picard. Barry > Best Regards, > Pranay. > ? > > On Fri, Oct 9, 2020 at 7:43 AM Barry Smith > wrote: > > To provide the functions to the Picard iteration you call SNESSetPicard() not SNESSetFunction() and SNESSetJacobian(), you provide code to compute A(x) and b(x). > > Note that in the Picard iteration the matrix A(x) is NOT the Jacobian of F(x) = A(x) x - b(x). The Jacobian of F(x) is the more complicated F'(x) = A(x) + A'(x)x + b'(x) > > Barry > > >> On Oct 9, 2020, at 6:38 AM, Matthew Knepley > wrote: >> >> On Fri, Oct 9, 2020 at 4:53 AM baikadi pranay > wrote: >> Hello, >> I have a couple of questions regarding how SNESSetFunction,SNESSetJacobian and SNESSolve work together. I am trying to solve a nonlinear system of the form A(x)x=b(x). I am using Fortran90. The way I intend to solve the above equation is as follows: >> Step 1: initialize x with an initial guess >> Step 2: Solve using SNESSolve for (x^i, i is the iteration number, i=1,2,3...) >> Step 3: Calculate the update and check if it is less than tolerance >> Step 4: If yes, end the loop. Else the jacobian matrix and function should be updated using x^(i) and go back to step 2. >> >> You are describing the Picard iteration: >> >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetPicard.html >> >> You can do this, but it will converge more slowly than Newton. We usually advise using Newton. >> >> The part which is a little confusing to me is in understanding how to update the jacobian matrix and the function F (= A(x)x-b(x)). >> >> 1) Should I explicitly call the subroutines Form Function and FormJacobian by using x^i as the input argument or is this automatically taken care of when I go back to step 2 and call SNESSolve? >> >> No. SNES calls these automatically. >> >> Thanks, >> >> Matt >> >> 2) If the answer to the above question is yes, I do not fully understand the role played by the functions SNESSetFunction and SNESSetJacobian. >> >> I apologize if I am not clear in my explanation. I would be glad to elaborate on any section of my question. Please let me know if you need any further information from my side. >> >> Thank you, >> Sincerely, >> Pranay. >> ? >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Sat Oct 10 04:31:31 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Sat, 10 Oct 2020 02:31:31 -0700 Subject: [petsc-users] MAT_COPY_VALUES not allowed for unassembled matrix Message-ID: Hello, I am using the MatDuplicate routine so that I use the Jacobian matrix as a preconditioning matrix as well. However, I get the error "MAT_COPY_VALUES not allowed for unassembled matrix". The exact command I use is the following: *call MatDuplicate(jac,MAT_COPY_VALUES,prec,ierr)* I am attaching you the error output in a text file for your reference. Could you please let me know how to solve this problem?. Thank you in advance. Best Regards, Pranay. ? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: MAT_COPY_VALUES not allowed for unassembled matrix [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 [0]PETSC ERROR: ./a.out on a linux-gnu-c-debug named cg17-6.agave.rc.asu.edu by pbaikadi Sat Oct 10 02:25:11 2020 [0]PETSC ERROR: Configure options [0]PETSC ERROR: #1 MatDuplicate() line 4606 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/mat/interface/matrix.c [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Not for unassembled matrix [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 [0]PETSC ERROR: ./a.out on a linux-gnu-c-debug named cg17-6.agave.rc.asu.edu by pbaikadi Sat Oct 10 02:25:11 2020 [0]PETSC ERROR: Configure options [0]PETSC ERROR: #2 MatGetOrdering() line 180 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/mat/order/sorder.c [0]PETSC ERROR: #3 PCSetUp_ILU() line 134 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/pc/impls/factor/ilu/ilu.c [0]PETSC ERROR: #4 PCSetUp() line 932 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #5 KSPSetUp() line 391 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #6 KSPSolve() line 725 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #7 SNESSolve_NEWTONLS() line 225 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/snes/impls/ls/ls.c [0]PETSC ERROR: #8 SNESSolve() line 4560 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/snes/interface/snes.c From knepley at gmail.com Sat Oct 10 12:25:29 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 10 Oct 2020 13:25:29 -0400 Subject: [petsc-users] MAT_COPY_VALUES not allowed for unassembled matrix In-Reply-To: References: Message-ID: On Sat, Oct 10, 2020 at 5:31 AM baikadi pranay wrote: > Hello, > I am using the MatDuplicate routine so that I use the Jacobian matrix as a > preconditioning matrix as well. However, I get the error "MAT_COPY_VALUES > not allowed for unassembled matrix". The exact command I use is the > following: > *call MatDuplicate(jac,MAT_COPY_VALUES,prec,ierr)* > I am attaching you the error output in a text file for your reference. 
> Could you please let me know how to solve this problem?. > 1) You should be using CHKERRQ(ierr) after the call 2) You need to assemble the matrix, MatAssemblyBegin/End(), before calling MatDuplicate() 3) You do not need to duplicate the matrix, just pass the same matrix twice Thanks, Matt > Thank you in advance. > Best Regards, > Pranay. > > ? > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Sat Oct 10 14:45:16 2020 From: jed at jedbrown.org (Jed Brown) Date: Sat, 10 Oct 2020 13:45:16 -0600 Subject: [petsc-users] Regarding SNESSetFunction and SNESSetJacobian In-Reply-To: <0BA3784A-6ECD-4FEB-8E64-5F05A89559CF@petsc.dev> References: <0BA3784A-6ECD-4FEB-8E64-5F05A89559CF@petsc.dev> Message-ID: <87ft6l50hf.fsf@jedbrown.org> Barry Smith writes: > I'm sorry, I made a small mistake in my previous email. It is > > F'(x) = A(x) + A'(x)x - b'(x) not F'(x) = A(x) + A'(x)x + b'(x) I find this much easier to write in variational notation: F(x) = A(x) x - b(x) F'(x) dx = A(x) dx + (A'(x) dx) x - b'(x) dx Note that A'(x) is a third order tensor so A'(x) dx is a second order tensor (i.e., a matrix). As such, one never wants to represent A'(x) on its own, or even A'(x) dx for that matter. This is one reason I dislike this notation. For any given example, it's often possible to write the operator dx \mapsto (A'(x) dx) x in an intuitive way, but it can take thought and this tends to be more circuitous than working with F(x) directly. From y.juntao at hotmail.com Sun Oct 11 01:10:37 2020 From: y.juntao at hotmail.com (Karl Yang) Date: Sun, 11 Oct 2020 14:10:37 +0800 Subject: [petsc-users] Convergence Error Debugging with KSP solvers in SNES In-Reply-To: <9AA79324-0653-4088-8A1E-72FEE8CD1631@petsc.dev> References: <9AA79324-0653-4088-8A1E-72FEE8CD1631@petsc.dev> Message-ID: Hi, Barray, Thank you for helping. I've identified the divergence took place at fieldsplit_1_inner. It is singular for "pressure field" because I used periodic boundary condition. But doesn't ksp solver by default took care of constant null space? And if it is necessary, could you give me some help on what's the best location to call MatNullSpaceRemove as the ksp solver is hiding after SNES solver and FIELDSPLIT_PC. Regards Juntao On Oct 9 2020, at 2:25 pm, Barry Smith wrote: > > Before you do any investigation I would run with one SNES solve > > -snes_max_it 1 -snes_view > > It will print out exactly what solver configuration you are using. > > ------ > > [0]PETSC ERROR: KSPSolve has not converged, reason DIVERGED_DTOL > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html (https://link.getmailspring.com/link/FB54F843-9802-454D-9332-7D5C41E03F86 at getmailspring.com/0?redirect=https%3A%2F%2Fwww.mcs.anl.gov%2Fpetsc%2Fdocumentation%2Ffaq.html&recipient=cGV0c2MtdXNlcnNAbWNzLmFubC5nb3Y%3D) for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.12.5, Mar, 29, 2020 > [0]PETSC ERROR: ./stokeTutorial on a arch-linux2-c-debug named a2aa8f1c96aa by Unknown Fri Oct 9 13:43:28 2020 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-cuda > [0]PETSC ERROR: #1 KSPSolve() line 832 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c > > Ok, one of the inner solvers went crazy and diverged, that is the norm of the residual for that inner solver exploded. > > Base on the line number > > [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c > > you can look at that file and see which of the inner solvers failed. From the info from -snes_view you will know what the KSP and PC is for that inner solve and the KSP options prefix. With that you can run the failing case with the addition option > > -xxx_ksp_monitor_true_residual > > and watch the inner solve explode. > > This inner solver behaving badly can also explain the need for -ksp_type fgmres. Normally PCFIELDSPLIT is a linear operator and so you can use -ksp_type gmres but there is some issue with the inner solver. Could it possible have a null space, that is be singular? Do you provide your own custom inner solver or just select from the options database? Does -pc_type lu make everything work fine? > > Barry > > > > > > > On Oct 9, 2020, at 12:53 AM, Karl Yang wrote: > > Hi, Barry, > > Thanks for your reply. Yes, I should have used fgmres. But after switching to fgmres I'm still facing the same convergence issue. > > Seems like the reason is due to DIVERGED_PC_FAILED. But I simply used FD jacobian, and fieldsplitPC. I am a bit lost on whether I made some mistakes somewhere in the FormFunction or I did not setup the solver correctly. 
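(A possible option set for the diagnosis suggested above, assuming the fieldsplit_0_/fieldsplit_1_ prefixes that -snes_view reports further down in this thread; adjust the prefixes to whatever your own -snes_view prints:)

  -snes_max_it 1 -snes_view -ksp_monitor_true_residual -ksp_converged_reason
  -fieldsplit_0_ksp_monitor_true_residual -fieldsplit_1_ksp_monitor_true_residual
  -fieldsplit_0_ksp_converged_reason -fieldsplit_1_ksp_converged_reason
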
> > ///////code/////// > > SNESSetFunction(snes, r, FormFunctionStatic, this); > > // SNESSetJacobian(snes, J, J, FormJacobianStatic, this); > > SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault, this); > > SNESMonitorSet(snes, MySNESMonitor, NULL, NULL); > > > > SNESGetKSP(snes, &ksp); > > KSPGetPC(ksp, &pc); > > PCSetType(pc, PCFIELDSPLIT); > > PCFieldSplitSetDetectSaddlePoint(pc, PETSC_TRUE); > > PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELF, NULL); > > KSPMonitorSet(ksp, MyKSPMonitor, NULL, 0); > > KSPSetTolerances(ksp, 1e-8, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); > > SNESSetFromOptions(snes); > > //////end///////// > > > > Output from SNES/KSP solver > > ################# step 1 ################# > > iter = 0, SNES Function norm 0.0430713 > > iteration 0 KSP Residual norm 4.307133784528e-02 > > 0 KSP unpreconditioned resid norm 4.307133784528e-02 true resid norm 4.307133784528e-02 ||r(i)||/||b|| 1.000000000000e+00 > > iteration 1 KSP Residual norm 4.451434065870e-07 > > 1 KSP unpreconditioned resid norm 4.451434065870e-07 true resid norm 4.451434065902e-07 ||r(i)||/||b|| 1.033502623460e-05 > > iteration 2 KSP Residual norm 1.079756105012e-12 > > 2 KSP unpreconditioned resid norm 1.079756105012e-12 true resid norm 1.079754870815e-12 ||r(i)||/||b|| 2.506898844643e-11 > > Linear solve converged due to CONVERGED_RTOL iterations 2 > > iter = 1, SNES Function norm 2.40846e-05 > > iteration 0 KSP Residual norm 2.408462930023e-05 > > 0 KSP unpreconditioned resid norm 2.408462930023e-05 true resid norm 2.408462930023e-05 ||r(i)||/||b|| 1.000000000000e+00 > > iteration 1 KSP Residual norm 1.096958085415e-11 > > 1 KSP unpreconditioned resid norm 1.096958085415e-11 true resid norm 1.096958085425e-11 ||r(i)||/||b|| 4.554598170270e-07 > > iteration 2 KSP Residual norm 5.909523288165e-16 > > 2 KSP unpreconditioned resid norm 5.909523288165e-16 true resid norm 5.909519599233e-16 ||r(i)||/||b|| 2.453647729249e-11 > > Linear solve converged due to CONVERGED_RTOL iterations 2 > > iter = 2, SNES Function norm 1.19684e-14 > > ################# step 2 ################# > > iter = 0, SNES Function norm 0.00391662 > > iteration 0 KSP Residual norm 3.916615614134e-03 > > 0 KSP unpreconditioned resid norm 3.916615614134e-03 true resid norm 3.916615614134e-03 ||r(i)||/||b|| 1.000000000000e+00 > > iteration 1 KSP Residual norm 4.068800385009e-08 > > 1 KSP unpreconditioned resid norm 4.068800385009e-08 true resid norm 4.068800384986e-08 ||r(i)||/||b|| 1.038856192653e-05 > > iteration 2 KSP Residual norm 8.427513055511e-14 > > 2 KSP unpreconditioned resid norm 8.427513055511e-14 true resid norm 8.427497502034e-14 ||r(i)||/||b|| 2.151729537007e-11 > > Linear solve converged due to CONVERGED_RTOL iterations 2 > > iter = 1, SNES Function norm 1.99152e-07 > > iteration 0 KSP Residual norm 1.991523558528e-07 > > 0 KSP unpreconditioned resid norm 1.991523558528e-07 true resid norm 1.991523558528e-07 ||r(i)||/||b|| 1.000000000000e+00 > > iteration 1 KSP Residual norm 1.413505562549e-13 > > 1 KSP unpreconditioned resid norm 1.413505562549e-13 true resid norm 1.413505562550e-13 ||r(i)||/||b|| 7.097609046588e-07 > > iteration 2 KSP Residual norm 5.165934822520e-18 > > 2 KSP unpreconditioned resid norm 5.165934822520e-18 true resid norm 5.165932973227e-18 ||r(i)||/||b|| 2.593960262787e-11 > > Linear solve converged due to CONVERGED_RTOL iterations 2 > > iter = 2, SNES Function norm 1.69561e-16 > > ################# step 3 ################# > > iter = 0, SNES Function norm 0.00035615 > > iteration 0 KSP 
Residual norm 3.561504844171e-04 > > 0 KSP unpreconditioned resid norm 3.561504844171e-04 true resid norm 3.561504844171e-04 ||r(i)||/||b|| 1.000000000000e+00 > > iteration 1 KSP Residual norm 3.701591890269e-09 > > 1 KSP unpreconditioned resid norm 3.701591890269e-09 true resid norm 3.701591890274e-09 ||r(i)||/||b|| 1.039333667153e-05 > > iteration 2 KSP Residual norm 7.832821034843e-15 > > 2 KSP unpreconditioned resid norm 7.832821034843e-15 true resid norm 7.832856926692e-15 ||r(i)||/||b|| 2.199311041093e-11 > > Linear solve converged due to CONVERGED_RTOL iterations 2 > > iter = 1, SNES Function norm 1.64671e-09 > > iteration 0 KSP Residual norm 1.646709543241e-09 > > 0 KSP unpreconditioned resid norm 1.646709543241e-09 true resid norm 1.646709543241e-09 ||r(i)||/||b|| 1.000000000000e+00 > > iteration 1 KSP Residual norm 1.043230469512e-15 > > 1 KSP unpreconditioned resid norm 1.043230469512e-15 true resid norm 1.043230469512e-15 ||r(i)||/||b|| 6.335242749968e-07 > > iteration 1 KSP Residual norm 0.000000000000e+00 > > 1 KSP unpreconditioned resid norm 0.000000000000e+00 true resid norm -nan ||r(i)||/||b|| -nan > > Linear solve did not converge due to DIVERGED_PC_FAILED iterations 1 > > PC_FAILED due to SUBPC_ERROR > > > > > > More information from -ksp_error_if_not_converged -info > > [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 3.303168180659e-07 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 7.795816360977e-02 at iteration 12 > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] 
PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.227610512466e+00 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 5.453050347652e+05 at iteration 12 > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] KSPConvergedDefault(): Linear solver is diverging. Initial right hand size norm 9.501675075823e-01, current residual norm 4.894880836662e+04 at iteration 210 > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [0]PETSC ERROR: > > [0]PETSC ERROR: KSPSolve has not converged, reason DIVERGED_DTOL > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html (https://link.getmailspring.com/link/FB54F843-9802-454D-9332-7D5C41E03F86 at getmailspring.com/2?redirect=https%3A%2F%2Fwww.mcs.anl.gov%2Fpetsc%2Fdocumentation%2Ffaq.html&recipient=cGV0c2MtdXNlcnNAbWNzLmFubC5nb3Y%3D) for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.12.5, Mar, 29, 2020 > > [0]PETSC ERROR: ./stokeTutorial on a arch-linux2-c-debug named a2aa8f1c96aa by Unknown Fri Oct 9 13:43:28 2020 > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-cuda > > [0]PETSC ERROR: #1 KSPSolve() line 832 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c > > [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c > > [0]PETSC ERROR: #3 PCApply() line 444 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/interface/precon.c > > [0]PETSC ERROR: #4 KSP_PCApply() line 281 in /usr/local/petsc/petsc-3.12.5/include/petsc/private/kspimpl.h > > [0]PETSC ERROR: #5 KSPFGMRESCycle() line 166 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > > [0]PETSC ERROR: #6 KSPSolve_FGMRES() line 291 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > > [0]PETSC ERROR: #7 KSPSolve() line 760 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c > > [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /usr/local/petsc/petsc-3.12.5/src/snes/impls/ls/ls.c > > [0]PETSC ERROR: #9 SNESSolve() line 4482 in /usr/local/petsc/petsc-3.12.5/src/snes/interface/snes.c > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374783 > > SNES Object: 1 MPI processes > > type: newtonls > > maximum iterations=50, maximum function evaluations=10000 > > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > > total number of linear solver iterations=2 > > total number of function evaluations=1322 > > norm schedule ALWAYS > > Jacobian is built using finite differences one column at a time > > SNESLineSearch Object: 1 MPI processes > > type: bt > > interpolation: cubic > > alpha=1.000000e-04 > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > > maximum iterations=40 > > KSP Object: 1 MPI processes > > type: fgmres > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=10000, initial guess is zero > > 
tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > > right preconditioning > > using UNPRECONDITIONED norm type for convergence test > > PC Object: 1 MPI processes > > type: fieldsplit > > FieldSplit with Schur preconditioner, blocksize = 1, factorization FULL > > Preconditioner for the Schur complement formed from S itself > > Split info: > > Split number 0 Defined by IS > > Split number 1 Defined by IS > > KSP solver for A00 block > > KSP Object: (fieldsplit_0_) 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: (fieldsplit_0_) 1 MPI processes > > type: ilu > > out-of-place factorization > > 0 levels of fill > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 1., needed 1. > > Factored matrix follows: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=512, cols=512 > > package used to perform factorization: petsc > > total: nonzeros=9213, allocated nonzeros=9213 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=512, cols=512 > > total: nonzeros=9213, allocated nonzeros=9213 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > KSP solver for S = A11 - A10 inv(A00) A01 > > KSP Object: (fieldsplit_1_) 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: (fieldsplit_1_) 1 MPI processes > > type: none > > linear system matrix = precond matrix: > > Mat Object: (fieldsplit_1_) 1 MPI processes > > type: schurcomplement > > rows=147, cols=147 > > Schur complement A11 - A10 inv(A00) A01 > > A11 > > Mat Object: 1 MPI processes > > type: seqaij > > rows=147, cols=147 > > total: nonzeros=147, allocated nonzeros=147 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > A10 > > Mat Object: 1 MPI processes > > type: seqaij > > rows=147, cols=512 > > total: nonzeros=2560, allocated nonzeros=2560 > > total number of mallocs used during MatSetValues calls=0 > > using I-node routines: found 87 nodes, limit used is 5 > > KSP of A00 > > KSP Object: (fieldsplit_0_) 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: (fieldsplit_0_) 1 MPI processes > > type: ilu > > out-of-place factorization > > 0 levels of fill > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 1., needed 1. 
> > Factored matrix follows: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=512, cols=512 > > package used to perform factorization: petsc > > total: nonzeros=9213, allocated nonzeros=9213 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=512, cols=512 > > total: nonzeros=9213, allocated nonzeros=9213 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > A01 > > Mat Object: 1 MPI processes > > type: seqaij > > rows=512, cols=147 > > total: nonzeros=2562, allocated nonzeros=2562 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=659, cols=659 > > total: nonzeros=14482, allocated nonzeros=27543 > > total number of mallocs used during MatSetValues calls=1309 > > not using I-node routines > > > > > > > > On Oct 9 2020, at 2:17 am, Barry Smith wrote: > > > > > > When you get a huge change at restart this means something is seriously wrong with either the linear operator or the linear preconditioner. > > > > > > How are you doing the matrix vector product? Note both the operator and preconditioner must be linear operators for GMRES. > > > > > > FGMRES allows the preconditioner to be nonlinear. You can try > > > > > > -ksp_type fgmres -ksp_monitor_true_residual > > > > > > Barry > > > > > > > > > > On Oct 8, 2020, at 2:43 AM, Yang Juntao wrote: > > > > Hello, > > > > > > > > I?m working on a nonlinear solver with SNES with handcoded jacobian and function. Each linear solver is solved with KSP solver. > > > > But sometimes I got issues with ksp solver convergence. I tried with finite difference approximated jacobian, but get the same error. > > > > > > > > From the iterations, the convergence seems ok at the beginning but suddenly diverged in the last iteration. > > > > Hope anyone with experience on ksp solvers could direct me to a direction I can debug the problem. 
> > > > > > > > iter = 0, SNES Function norm 2.94934e-06 > > > > iteration 0 KSP Residual norm 1.094600281831e-06 > > > > iteration 1 KSP Residual norm 1.264284474186e-08 > > > > iteration 2 KSP Residual norm 6.593269221816e-09 > > > > iteration 3 KSP Residual norm 1.689570779457e-09 > > > > iteration 4 KSP Residual norm 1.040661505932e-09 > > > > iteration 5 KSP Residual norm 5.422761817348e-10 > > > > iteration 6 KSP Residual norm 2.492867371369e-10 > > > > iteration 7 KSP Residual norm 8.261522376775e-11 > > > > iteration 8 KSP Residual norm 4.246401544245e-11 > > > > iteration 9 KSP Residual norm 2.514366787388e-11 > > > > iteration 10 KSP Residual norm 1.982940267051e-11 > > > > iteration 11 KSP Residual norm 1.586470414676e-11 > > > > iteration 12 KSP Residual norm 9.866392216207e-12 > > > > iteration 13 KSP Residual norm 4.951342176999e-12 > > > > iteration 14 KSP Residual norm 2.418292660318e-12 > > > > iteration 15 KSP Residual norm 1.747418526086e-12 > > > > iteration 16 KSP Residual norm 1.094150535809e-12 > > > > iteration 17 KSP Residual norm 4.464287492066e-13 > > > > iteration 18 KSP Residual norm 3.530090494462e-13 > > > > iteration 19 KSP Residual norm 2.825698091454e-13 > > > > iteration 20 KSP Residual norm 1.950568425807e-13 > > > > iteration 21 KSP Residual norm 1.227898091813e-13 > > > > iteration 22 KSP Residual norm 5.411106347374e-14 > > > > iteration 23 KSP Residual norm 4.511115848564e-14 > > > > iteration 24 KSP Residual norm 4.063546606691e-14 > > > > iteration 25 KSP Residual norm 3.677694771949e-14 > > > > iteration 26 KSP Residual norm 3.459244943466e-14 > > > > iteration 27 KSP Residual norm 3.263954971093e-14 > > > > iteration 28 KSP Residual norm 3.087344619079e-14 > > > > iteration 29 KSP Residual norm 2.809426925625e-14 > > > > iteration 30 KSP Residual norm 4.366149884754e-01 > > > > Linear solve did not converge due to DIVERGED_DTOL iterations 30 > > > > > > > > > > > > > > > > SNES Object: 1 MPI processes > > > > type: newtonls > > > > SNES has not been set up so information may be incomplete > > > > maximum iterations=50, maximum function evaluations=10000 > > > > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > > > > total number of linear solver iterations=0 > > > > total number of function evaluations=0 > > > > norm schedule ALWAYS > > > > SNESLineSearch Object: 1 MPI processes > > > > type: bt > > > > interpolation: cubic > > > > alpha=1.000000e-04 > > > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > > > > maximum iterations=40 > > > > KSP Object: 1 MPI processes > > > > type: gmres > > > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > > > happy breakdown tolerance 1e-30 > > > > maximum iterations=10000, initial guess is zero > > > > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. 
> > > > left preconditioning > > > > using DEFAULT norm type for convergence test > > > > PC Object: 1 MPI processes > > > > type: fieldsplit > > > > PC has not been set up so information may be incomplete > > > > FieldSplit with Schur preconditioner, factorization FULL > > > > Preconditioner for the Schur complement formed from S itself > > > > Split info: > > > > KSP solver for A00 block > > > > not yet available > > > > KSP solver for S = A11 - A10 inv(A00) A01 > > > > not yet available > > > > linear system matrix = precond matrix: > > > > Mat Object: 1 MPI processes > > > > type: seqaij > > > > rows=659, cols=659 > > > > total: nonzeros=659, allocated nonzeros=7908 > > > > total number of mallocs used during MatSetValues calls=0 > > > > not using I-node routines > > > > > > > > Regards > > > > Juntao > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sun Oct 11 01:53:50 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 11 Oct 2020 01:53:50 -0500 Subject: [petsc-users] Convergence Error Debugging with KSP solvers in SNES In-Reply-To: References: <9AA79324-0653-4088-8A1E-72FEE8CD1631@petsc.dev> Message-ID: <76AC38BE-D6D9-415D-8A02-68466455F874@petsc.dev> Juntao, Are you providing the null space to the pressure field solver? Since it is only the constants it is possible to provide the null space from the command line using -prefix_ksp_constant_null_space where prefix is the prefix from the particular sub KSP you wish to solve. This is printed next to each inner KSP in the output from -ksp_view. In your case I am guessing you need -fieldsplit_1_inner_ksp_constant_null_space If you ran before with -fieldsplit_1_inner_ksp_monitor you can still use this option and hopefully will not see the pressure solve explode by adding the constant command line option. It is also possible to attach the null space within the program but that is cumbersome so best to get everything working and worry about that superficial change later. Barry BTW: I have a new git branch barry/2020-10-09/all-ksp-monitor that I wrote to help make it easier to understand the convergence of PCFIELDSPLIT. You just use the option -all_ksp_monitor and it will print the convergence history for ALL the KSP solves, inner and outer. So one can easily see which inner ones are working well and which are not. > On Oct 11, 2020, at 1:10 AM, Karl Yang wrote: > > Hi, Barray, > > Thank you for helping. I've identified the divergence took place at fieldsplit_1_inner. > It is singular for "pressure field" because I used periodic boundary condition. But doesn't ksp solver by default took care of constant null space? > > And if it is necessary, could you give me some help on what's the best location to call MatNullSpaceRemove as the ksp solver is hiding after SNES solver and FIELDSPLIT_PC. > > Regards > Juntao > > > On Oct 9 2020, at 2:25 pm, Barry Smith wrote: > > Before you do any investigation I would run with one SNES solve > > -snes_max_it 1 -snes_view > > It will print out exactly what solver configuration you are using. > > ------ > > [0]PETSC ERROR: KSPSolve has not converged, reason DIVERGED_DTOL > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.12.5, Mar, 29, 2020 > [0]PETSC ERROR: ./stokeTutorial on a arch-linux2-c-debug named a2aa8f1c96aa by Unknown Fri Oct 9 13:43:28 2020 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-cuda > [0]PETSC ERROR: #1 KSPSolve() line 832 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c > > Ok, one of the inner solvers went crazy and diverged, that is the norm of the residual for that inner solver exploded. > > Base on the line number > > [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c > > you can look at that file and see which of the inner solvers failed. From the info from -snes_view you will know what the KSP and PC is for that inner solve and the KSP options prefix. With that you can run the failing case with the addition option > > -xxx_ksp_monitor_true_residual > > and watch the inner solve explode. > > This inner solver behaving badly can also explain the need for -ksp_type fgmres. Normally PCFIELDSPLIT is a linear operator and so you can use -ksp_type gmres but there is some issue with the inner solver. Could it possible have a null space, that is be singular? Do you provide your own custom inner solver or just select from the options database? Does -pc_type lu make everything work fine? > > Barry > > > > > > > On Oct 9, 2020, at 12:53 AM, Karl Yang > wrote: > > Hi, Barry, > > Thanks for your reply. Yes, I should have used fgmres. But after switching to fgmres I'm still facing the same convergence issue. > > Seems like the reason is due to DIVERGED_PC_FAILED. But I simply used FD jacobian, and fieldsplitPC. I am a bit lost on whether I made some mistakes somewhere in the FormFunction or I did not setup the solver correctly. 
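(A minimal sketch of attaching the constant null space in code, as an alternative to the command-line option suggested above; here pmat is a placeholder for the operator of whichever inner solve is singular, not a name taken from this thread:)

  MatNullSpace nsp;
  MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nsp); /* constants only */
  MatSetNullSpace(pmat, nsp);       /* the KSP that uses pmat then projects it out */
  MatNullSpaceDestroy(&nsp);
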
> > ///////code/////// > SNESSetFunction(snes, r, FormFunctionStatic, this); > // SNESSetJacobian(snes, J, J, FormJacobianStatic, this); > SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault, this); > SNESMonitorSet(snes, MySNESMonitor, NULL, NULL); > > SNESGetKSP(snes, &ksp); > KSPGetPC(ksp, &pc); > PCSetType(pc, PCFIELDSPLIT); > PCFieldSplitSetDetectSaddlePoint(pc, PETSC_TRUE); > PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELF, NULL); > KSPMonitorSet(ksp, MyKSPMonitor, NULL, 0); > KSPSetTolerances(ksp, 1e-8, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); > SNESSetFromOptions(snes); > //////end///////// > > Output from SNES/KSP solver > ################# step 1 ################# > iter = 0, SNES Function norm 0.0430713 > iteration 0 KSP Residual norm 4.307133784528e-02 > 0 KSP unpreconditioned resid norm 4.307133784528e-02 true resid norm 4.307133784528e-02 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 4.451434065870e-07 > 1 KSP unpreconditioned resid norm 4.451434065870e-07 true resid norm 4.451434065902e-07 ||r(i)||/||b|| 1.033502623460e-05 > iteration 2 KSP Residual norm 1.079756105012e-12 > 2 KSP unpreconditioned resid norm 1.079756105012e-12 true resid norm 1.079754870815e-12 ||r(i)||/||b|| 2.506898844643e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 1, SNES Function norm 2.40846e-05 > iteration 0 KSP Residual norm 2.408462930023e-05 > 0 KSP unpreconditioned resid norm 2.408462930023e-05 true resid norm 2.408462930023e-05 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 1.096958085415e-11 > 1 KSP unpreconditioned resid norm 1.096958085415e-11 true resid norm 1.096958085425e-11 ||r(i)||/||b|| 4.554598170270e-07 > iteration 2 KSP Residual norm 5.909523288165e-16 > 2 KSP unpreconditioned resid norm 5.909523288165e-16 true resid norm 5.909519599233e-16 ||r(i)||/||b|| 2.453647729249e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 2, SNES Function norm 1.19684e-14 > ################# step 2 ################# > iter = 0, SNES Function norm 0.00391662 > iteration 0 KSP Residual norm 3.916615614134e-03 > 0 KSP unpreconditioned resid norm 3.916615614134e-03 true resid norm 3.916615614134e-03 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 4.068800385009e-08 > 1 KSP unpreconditioned resid norm 4.068800385009e-08 true resid norm 4.068800384986e-08 ||r(i)||/||b|| 1.038856192653e-05 > iteration 2 KSP Residual norm 8.427513055511e-14 > 2 KSP unpreconditioned resid norm 8.427513055511e-14 true resid norm 8.427497502034e-14 ||r(i)||/||b|| 2.151729537007e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 1, SNES Function norm 1.99152e-07 > iteration 0 KSP Residual norm 1.991523558528e-07 > 0 KSP unpreconditioned resid norm 1.991523558528e-07 true resid norm 1.991523558528e-07 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 1.413505562549e-13 > 1 KSP unpreconditioned resid norm 1.413505562549e-13 true resid norm 1.413505562550e-13 ||r(i)||/||b|| 7.097609046588e-07 > iteration 2 KSP Residual norm 5.165934822520e-18 > 2 KSP unpreconditioned resid norm 5.165934822520e-18 true resid norm 5.165932973227e-18 ||r(i)||/||b|| 2.593960262787e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 2, SNES Function norm 1.69561e-16 > ################# step 3 ################# > iter = 0, SNES Function norm 0.00035615 > iteration 0 KSP Residual norm 3.561504844171e-04 > 0 KSP unpreconditioned resid norm 3.561504844171e-04 true resid norm 
3.561504844171e-04 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 3.701591890269e-09 > 1 KSP unpreconditioned resid norm 3.701591890269e-09 true resid norm 3.701591890274e-09 ||r(i)||/||b|| 1.039333667153e-05 > iteration 2 KSP Residual norm 7.832821034843e-15 > 2 KSP unpreconditioned resid norm 7.832821034843e-15 true resid norm 7.832856926692e-15 ||r(i)||/||b|| 2.199311041093e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 1, SNES Function norm 1.64671e-09 > iteration 0 KSP Residual norm 1.646709543241e-09 > 0 KSP unpreconditioned resid norm 1.646709543241e-09 true resid norm 1.646709543241e-09 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 1.043230469512e-15 > 1 KSP unpreconditioned resid norm 1.043230469512e-15 true resid norm 1.043230469512e-15 ||r(i)||/||b|| 6.335242749968e-07 > iteration 1 KSP Residual norm 0.000000000000e+00 > 1 KSP unpreconditioned resid norm 0.000000000000e+00 true resid norm -nan ||r(i)||/||b|| -nan > Linear solve did not converge due to DIVERGED_PC_FAILED iterations 1 > PC_FAILED due to SUBPC_ERROR > > > More information from -ksp_error_if_not_converged -info > > [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 3.303168180659e-07 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 7.795816360977e-02 at iteration 12 > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): 
Leaving PC with identical preconditioner since operator is unchanged > [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.227610512466e+00 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 5.453050347652e+05 at iteration 12 > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] KSPConvergedDefault(): Linear solver is diverging. Initial right hand size norm 9.501675075823e-01, current residual norm 4.894880836662e+04 at iteration 210 > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: > [0]PETSC ERROR: KSPSolve has not converged, reason DIVERGED_DTOL > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.12.5, Mar, 29, 2020 > [0]PETSC ERROR: ./stokeTutorial on a arch-linux2-c-debug named a2aa8f1c96aa by Unknown Fri Oct 9 13:43:28 2020 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-cuda > [0]PETSC ERROR: #1 KSPSolve() line 832 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c > [0]PETSC ERROR: #3 PCApply() line 444 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #4 KSP_PCApply() line 281 in /usr/local/petsc/petsc-3.12.5/include/petsc/private/kspimpl.h > [0]PETSC ERROR: #5 KSPFGMRESCycle() line 166 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [0]PETSC ERROR: #6 KSPSolve_FGMRES() line 291 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [0]PETSC ERROR: #7 KSPSolve() line 760 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /usr/local/petsc/petsc-3.12.5/src/snes/impls/ls/ls.c > [0]PETSC ERROR: #9 SNESSolve() line 4482 in /usr/local/petsc/petsc-3.12.5/src/snes/interface/snes.c > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374783 > SNES Object: 1 MPI processes > type: newtonls > maximum iterations=50, maximum function evaluations=10000 > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > total number of linear solver iterations=2 > total number of function evaluations=1322 > norm schedule ALWAYS > Jacobian is built using finite differences one column at a time > SNESLineSearch Object: 1 MPI processes > type: bt > interpolation: cubic > alpha=1.000000e-04 > maxstep=1.000000e+08, minlambda=1.000000e-12 > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > maximum iterations=40 > KSP Object: 1 MPI processes > type: fgmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. 
> right preconditioning > using UNPRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: fieldsplit > FieldSplit with Schur preconditioner, blocksize = 1, factorization FULL > Preconditioner for the Schur complement formed from S itself > Split info: > Split number 0 Defined by IS > Split number 1 Defined by IS > KSP solver for A00 block > KSP Object: (fieldsplit_0_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > package used to perform factorization: petsc > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > KSP solver for S = A11 - A10 inv(A00) A01 > KSP Object: (fieldsplit_1_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_1_) 1 MPI processes > type: none > linear system matrix = precond matrix: > Mat Object: (fieldsplit_1_) 1 MPI processes > type: schurcomplement > rows=147, cols=147 > Schur complement A11 - A10 inv(A00) A01 > A11 > Mat Object: 1 MPI processes > type: seqaij > rows=147, cols=147 > total: nonzeros=147, allocated nonzeros=147 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > A10 > Mat Object: 1 MPI processes > type: seqaij > rows=147, cols=512 > total: nonzeros=2560, allocated nonzeros=2560 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 87 nodes, limit used is 5 > KSP of A00 > KSP Object: (fieldsplit_0_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. 
> Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > package used to perform factorization: petsc > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > A01 > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=147 > total: nonzeros=2562, allocated nonzeros=2562 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=659, cols=659 > total: nonzeros=14482, allocated nonzeros=27543 > total number of mallocs used during MatSetValues calls=1309 > not using I-node routines > > > > On Oct 9 2020, at 2:17 am, Barry Smith > wrote: > > When you get a huge change at restart this means something is seriously wrong with either the linear operator or the linear preconditioner. > > How are you doing the matrix vector product? Note both the operator and preconditioner must be linear operators for GMRES. > > FGMRES allows the preconditioner to be nonlinear. You can try > > -ksp_type fgmres -ksp_monitor_true_residual > > Barry > > > On Oct 8, 2020, at 2:43 AM, Yang Juntao > wrote: > > Hello, > > I?m working on a nonlinear solver with SNES with handcoded jacobian and function. Each linear solver is solved with KSP solver. > But sometimes I got issues with ksp solver convergence. I tried with finite difference approximated jacobian, but get the same error. > > From the iterations, the convergence seems ok at the beginning but suddenly diverged in the last iteration. > Hope anyone with experience on ksp solvers could direct me to a direction I can debug the problem. 
> > iter = 0, SNES Function norm 2.94934e-06 > iteration 0 KSP Residual norm 1.094600281831e-06 > iteration 1 KSP Residual norm 1.264284474186e-08 > iteration 2 KSP Residual norm 6.593269221816e-09 > iteration 3 KSP Residual norm 1.689570779457e-09 > iteration 4 KSP Residual norm 1.040661505932e-09 > iteration 5 KSP Residual norm 5.422761817348e-10 > iteration 6 KSP Residual norm 2.492867371369e-10 > iteration 7 KSP Residual norm 8.261522376775e-11 > iteration 8 KSP Residual norm 4.246401544245e-11 > iteration 9 KSP Residual norm 2.514366787388e-11 > iteration 10 KSP Residual norm 1.982940267051e-11 > iteration 11 KSP Residual norm 1.586470414676e-11 > iteration 12 KSP Residual norm 9.866392216207e-12 > iteration 13 KSP Residual norm 4.951342176999e-12 > iteration 14 KSP Residual norm 2.418292660318e-12 > iteration 15 KSP Residual norm 1.747418526086e-12 > iteration 16 KSP Residual norm 1.094150535809e-12 > iteration 17 KSP Residual norm 4.464287492066e-13 > iteration 18 KSP Residual norm 3.530090494462e-13 > iteration 19 KSP Residual norm 2.825698091454e-13 > iteration 20 KSP Residual norm 1.950568425807e-13 > iteration 21 KSP Residual norm 1.227898091813e-13 > iteration 22 KSP Residual norm 5.411106347374e-14 > iteration 23 KSP Residual norm 4.511115848564e-14 > iteration 24 KSP Residual norm 4.063546606691e-14 > iteration 25 KSP Residual norm 3.677694771949e-14 > iteration 26 KSP Residual norm 3.459244943466e-14 > iteration 27 KSP Residual norm 3.263954971093e-14 > iteration 28 KSP Residual norm 3.087344619079e-14 > iteration 29 KSP Residual norm 2.809426925625e-14 > iteration 30 KSP Residual norm 4.366149884754e-01 > Linear solve did not converge due to DIVERGED_DTOL iterations 30 > > > SNES Object: 1 MPI processes > type: newtonls > SNES has not been set up so information may be incomplete > maximum iterations=50, maximum function evaluations=10000 > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > total number of linear solver iterations=0 > total number of function evaluations=0 > norm schedule ALWAYS > SNESLineSearch Object: 1 MPI processes > type: bt > interpolation: cubic > alpha=1.000000e-04 > maxstep=1.000000e+08, minlambda=1.000000e-12 > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > maximum iterations=40 > KSP Object: 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > left preconditioning > using DEFAULT norm type for convergence test > PC Object: 1 MPI processes > type: fieldsplit > PC has not been set up so information may be incomplete > FieldSplit with Schur preconditioner, factorization FULL > Preconditioner for the Schur complement formed from S itself > Split info: > KSP solver for A00 block > not yet available > KSP solver for S = A11 - A10 inv(A00) A01 > not yet available > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=659, cols=659 > total: nonzeros=659, allocated nonzeros=7908 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > > Regards > Juntao > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olivier.jamond at cea.fr Mon Oct 12 06:10:02 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Mon, 12 Oct 2020 13:10:02 +0200 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <218E7696-2A50-42A3-8CF2-D58FCC17B855@petsc.dev> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> <87tuv48osv.fsf@jedbrown.org> <2B8B302F-D823-4160-B674-B3DAE78E6363@petsc.dev> <218E7696-2A50-42A3-8CF2-D58FCC17B855@petsc.dev> Message-ID: Hi Barry, Thanks for this work! I tried this branch with my code and sequential matrices on a small case: it does work! Thanks a lot, Olivier On 09/10/2020 03:50, Barry Smith wrote: > > ? Olivier, > > ? ? The branch *barry/2020-10-08/invert-block-diagonal-aij*?contains > an example src/mat/tests/ex178.c that shows how to compute inv(CC'). > It works for SeqAIJ matrices. > > ? ? Please let us know if it works for you and then I will implement > the parallel version. > > ? Barry > > >> On Oct 8, 2020, at 3:59 PM, Barry Smith > > wrote: >> >> >> ?Olivier >> >> ?I am working on extending the routines now and hopefully push a >> branch you can try fairly soon. >> >> ?Barry >> >> >>> On Oct 8, 2020, at 3:07 PM, Jed Brown >> > wrote: >>> >>> Olivier Jamond >> > writes: >>> >>>>> ??Given the structure of C it seems you should just explicitly >>>>> construct Sp and use GAMG (or other preconditioners, even a direct >>>>> solver) directly on Sp. Trying to avoid explicitly forming Sp will >>>>> give you a much slower performing solving for what benefit? If C >>>>> was just some generic monster than forming Sp might be unrealistic >>>>> but in your case CCt is is block diagonal with tiny blocks which >>>>> means (C*Ct)^(-1) is block diagonal with tiny blocks (the blocks >>>>> are the inverses of the blocks of (C*Ct)). >>>>> >>>>> ???Sp = Ct*C ?+ Qt * S * Q = Ct*C ?+ ?[I - Ct * (C*Ct)^(-1)*C] S >>>>> [I - Ct * (C*Ct)^(-1)*C] >>>>> >>>>> [Ct * (C*Ct)^(-1)*C] will again be block diagonal with slightly >>>>> larger blocks. >>>>> >>>>> You can do D = (C*Ct) with MatMatMult() then write custom code >>>>> that zips through the diagonal blocks of D inverting all of them >>>>> to get iD then use MatPtAP applied to C and iD to get Ct * >>>>> (C*Ct)^(-1)*C then MatShift() to include the I then MatPtAP or >>>>> MatRAR to get [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] >>>>> ?then finally MatAXPY() to get Sp. The complexity of each of the >>>>> Mat operations is very low because of the absurdly simple >>>>> structure of C and its descendants. ??You might even be able to >>>>> just use MUMPS to give you the explicit inv(C*Ct) without writing >>>>> custom code to get iD. >>>> >>>> At this time, I didn't manage to compute iD=inv(C*Ct) without using >>>> dense matrices, what may be a shame because all matrices are sparse >>>> . Is >>>> it possible? >>>> >>>> And I get no idea of how to write code to manually zip through the >>>> diagonal blocks of D to invert them... >>> >>> You could use MatInvertVariableBlockDiagonal(), which should perhaps >>> return a Mat instead of a raw array. >>> >>> If you have constant block sizes, MatInvertBlockDiagonalMat will >>> return a Mat. >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Mon Oct 12 09:49:12 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 12 Oct 2020 09:49:12 -0500 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> <87tuv48osv.fsf@jedbrown.org> <2B8B302F-D823-4160-B674-B3DAE78E6363@petsc.dev> <218E7696-2A50-42A3-8CF2-D58FCC17B855@petsc.dev> Message-ID: > On Oct 12, 2020, at 6:10 AM, Olivier Jamond wrote: > > Hi Barry, > > Thanks for this work! I tried this branch with my code and sequential matrices on a small case: it does work! > > Excellant. I will extend it to the parallel case and get it into our master release. We'd be interested in hearing about your convergence and timing experiences when you run largish jobs (even sequentially) since this type of problem comes up relatively frequently and we do need a variety of solvers that can handle it while currently we do not have great tools for it. Barry > Thanks a lot, > Olivier > > On 09/10/2020 03:50, Barry Smith wrote: >> >> Olivier, >> >> The branch barry/2020-10-08/invert-block-diagonal-aij contains an example src/mat/tests/ex178.c that shows how to compute inv(CC'). It works for SeqAIJ matrices. >> >> Please let us know if it works for you and then I will implement the parallel version. >> >> Barry >> >> >>> On Oct 8, 2020, at 3:59 PM, Barry Smith > wrote: >>> >>> >>> Olivier >>> >>> I am working on extending the routines now and hopefully push a branch you can try fairly soon. >>> >>> Barry >>> >>> >>>> On Oct 8, 2020, at 3:07 PM, Jed Brown > wrote: >>>> >>>> Olivier Jamond > writes: >>>> >>>>>> Given the structure of C it seems you should just explicitly construct Sp and use GAMG (or other preconditioners, even a direct solver) directly on Sp. Trying to avoid explicitly forming Sp will give you a much slower performing solving for what benefit? If C was just some generic monster than forming Sp might be unrealistic but in your case CCt is is block diagonal with tiny blocks which means (C*Ct)^(-1) is block diagonal with tiny blocks (the blocks are the inverses of the blocks of (C*Ct)). >>>>>> >>>>>> Sp = Ct*C + Qt * S * Q = Ct*C + [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] >>>>>> >>>>>> [Ct * (C*Ct)^(-1)*C] will again be block diagonal with slightly larger blocks. >>>>>> >>>>>> You can do D = (C*Ct) with MatMatMult() then write custom code that zips through the diagonal blocks of D inverting all of them to get iD then use MatPtAP applied to C and iD to get Ct * (C*Ct)^(-1)*C then MatShift() to include the I then MatPtAP or MatRAR to get [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] then finally MatAXPY() to get Sp. The complexity of each of the Mat operations is very low because of the absurdly simple structure of C and its descendants. You might even be able to just use MUMPS to give you the explicit inv(C*Ct) without writing custom code to get iD. >>>>> >>>>> At this time, I didn't manage to compute iD=inv(C*Ct) without using >>>>> dense matrices, what may be a shame because all matrices are sparse . Is >>>>> it possible? >>>>> >>>>> And I get no idea of how to write code to manually zip through the >>>>> diagonal blocks of D to invert them... 
>>>> >>>> You could use MatInvertVariableBlockDiagonal(), which should perhaps return a Mat instead of a raw array. >>>> >>>> If you have constant block sizes, MatInvertBlockDiagonalMat will return a Mat. >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From t.appel17 at imperial.ac.uk Tue Oct 13 07:47:57 2020 From: t.appel17 at imperial.ac.uk (Thibaut Appel) Date: Tue, 13 Oct 2020 14:47:57 +0200 Subject: [petsc-users] About MAT_NEW_NONZERO_LOCATION[] Message-ID: <0dd90b52-ca23-8197-3ac4-84844054fdda@imperial.ac.uk> Hi there, just a quick question: It seems MAT_NEW_NONZERO_LOCATION_ERR set to PETSC_TRUE has kind of the same purpose as MAT_NEW_NONZERO_LOCATIONS set to PETSC_FALSE, the difference being if an additional entry is there, the former produces an error whereas in the latter it is simply ignored. However the manual states: 'If one wishes to repeatedly assemble matrices that retain the same nonzero pattern (such as within a nonlinear or time-dependent problem), the option MatSetOption(MatA,*MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE*); should be specified after the first matrix has been fully assembled. This option ensures that certain data structures and communication information will be reused (instead of regenerated) during successive steps, thereby increasing efficiency' If I only declare: ??? CALL MatSetOption(MatA,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr) Would the data structures still be reused in later matrix assemblies? Or does it rather make sense to use conjointly: ??? CALL MatSetOption(MatA,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr) ??? CALL MatSetOption(MatA,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ierr) Thank you, Thibaut -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Oct 13 09:41:37 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 13 Oct 2020 09:41:37 -0500 Subject: [petsc-users] About MAT_NEW_NONZERO_LOCATION[] In-Reply-To: <0dd90b52-ca23-8197-3ac4-84844054fdda@imperial.ac.uk> References: <0dd90b52-ca23-8197-3ac4-84844054fdda@imperial.ac.uk> Message-ID: <57AB9DEE-AF44-43DD-8F76-A9205E9D418A@petsc.dev> You only need to provide one of the options. The docs are slightly misleading.The flags only tells the matrix what to do with new nonzero locations, preventing new ones. The Mat actually tracks if new non-zeros locations are actually entered independent of the flags. So, for example even if you did not supply any new flags AND your code did not insert new locations then the structure would be reused. Barry > On Oct 13, 2020, at 7:47 AM, Thibaut Appel wrote: > > Hi there, just a quick question: > > It seems MAT_NEW_NONZERO_LOCATION_ERR set to PETSC_TRUE has kind of the same purpose as MAT_NEW_NONZERO_LOCATIONS set to PETSC_FALSE, the difference being if an additional entry is there, the former produces an error whereas in the latter it is simply ignored. > > However the manual states: > > 'If one wishes to repeatedly assemble matrices that retain the same nonzero pattern (such as within a nonlinear or time-dependent problem), the option MatSetOption(MatA,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); should be specified after the first matrix has been fully assembled. 
This option ensures that certain data structures and communication information will be reused (instead of regenerated) during successive steps, thereby increasing efficiency' > > If I only declare: > > CALL MatSetOption(MatA,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr) > > Would the data structures still be reused in later matrix assemblies? > > Or does it rather make sense to use conjointly: > > CALL MatSetOption(MatA,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr) > CALL MatSetOption(MatA,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ierr) > > Thank you, > > > > Thibaut > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicolas.barral at math.u-bordeaux.fr Wed Oct 14 05:00:34 2020 From: nicolas.barral at math.u-bordeaux.fr (Nicolas Barral) Date: Wed, 14 Oct 2020 12:00:34 +0200 Subject: [petsc-users] Python version needed for internal scripts Message-ID: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> Hi all, Apologies if the question has already been asked, but the ML archive search seems to be broken (or has it never worked ?). Many petsc scripts require a 'python' executable, which python should that be ? For now, python3 seems to have worked with the configure scripts and petsc_gen_xdmf scripts, but can I safely assume it will always be the case ? 'python' is usually an alias for python2, so making it point at python3 seems a bit dangerous. Yet, python2 was removed from recent Ubuntus and maybe others, and if I have no python2 installed, and no 'python' alias, I have to manually edit all the scripts. Thanks -- Nicolas From knepley at gmail.com Wed Oct 14 05:20:13 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 14 Oct 2020 06:20:13 -0400 Subject: [petsc-users] Python version needed for internal scripts In-Reply-To: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> References: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> Message-ID: On Wed, Oct 14, 2020 at 6:01 AM Nicolas Barral < nicolas.barral at math.u-bordeaux.fr> wrote: > Hi all, > > Apologies if the question has already been asked, but the ML archive > search seems to be broken (or has it never worked ?). > > Many petsc scripts require a 'python' executable, which python should > that be ? For now, python3 seems to have worked with the configure > scripts and petsc_gen_xdmf scripts, but can I safely assume it will > always be the case ? > > 'python' is usually an alias for python2, so making it point at python3 > seems a bit dangerous. Yet, python2 was removed from recent Ubuntus and > maybe others, and if I have no python2 installed, and no 'python' alias, > I have to manually edit all the scripts. > Right now, PETSc works with both Python2 and Python3. I am not sure how long we can support Python2, but the aim is to support it until End of Life, probably on Red Hat since they change the slowest I think. Thanks, Matt > Thanks > > -- > Nicolas > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pierre at joliv.et Wed Oct 14 05:26:26 2020 From: pierre at joliv.et (Pierre Jolivet) Date: Wed, 14 Oct 2020 12:26:26 +0200 Subject: [petsc-users] Python version needed for internal scripts In-Reply-To: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> References: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> Message-ID: Hello Nicolas, > On 14 Oct 2020, at 12:00 PM, Nicolas Barral wrote: > > Hi all, > > Apologies if the question has already been asked, but the ML archive search seems to be broken (or has it never worked ?). What do you mean broken? Can?t you access this URL https://lists.mcs.anl.gov/pipermail/petsc-dev/ ? I know that some French ISP are getting blocked by the ANL firewall, but I don?t think I've ever had this issue with the ML. > Many petsc scripts require a 'python' executable, which python should that be ? For now, python3 seems to have worked with the configure scripts and petsc_gen_xdmf scripts, but can I safely assume it will always be the case ? This was discussed in great length in the ML when 3.13.5 (or was it 3.13.4?) was released (because it was kind of broken at first), see https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2020-April/025879.html + https://gitlab.com/petsc/petsc/-/merge_requests/2818 This was then patched some more https://gitlab.com/petsc/petsc/-/merge_requests/2831 because some Python versions are not shipping distutils.sysconfig which is mandatory for the configure to go through. I think you can assume that a correct Python version will be picked up for you by configure. If this is not the case, we will need to fix this (by adding another check?). Thanks, Pierre > 'python' is usually an alias for python2, so making it point at python3 seems a bit dangerous. Yet, python2 was removed from recent Ubuntus and maybe others, and if I have no python2 installed, and no 'python' alias, I have to manually edit all the scripts. > > Thanks > > -- > Nicolas -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Wed Oct 14 06:12:02 2020 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 14 Oct 2020 14:12:02 +0300 Subject: [petsc-users] Python version needed for internal scripts In-Reply-To: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> References: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> Message-ID: On Wed, 14 Oct 2020 at 13:01, Nicolas Barral < nicolas.barral at math.u-bordeaux.fr> wrote: > > 'python' is usually an alias for python2, so making it point at python3 > seems a bit dangerous. Yet, python2 was removed from recent Ubuntus and > maybe others, and if I have no python2 installed, and no 'python' alias, > I have to manually edit all the scripts. > > apt install python-is-python3 and you should get the alias python -> python3 in /usr/bin Disclaimer: I'm not an Ubuntu user. Google and inform yourself. -- Lisandro Dalcin ============ Senior Research Scientist Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Wed Oct 14 06:58:42 2020 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 14 Oct 2020 14:58:42 +0300 Subject: [petsc-users] Python version needed for internal scripts In-Reply-To: References: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> Message-ID: On Wed, 14 Oct 2020 at 14:12, Lisandro Dalcin wrote: > Disclaimer: I'm not an Ubuntu user. 
Google and inform yourself. > Just in case... What I'm trying to say is that I do not know Ubuntu very well, much less its Py2 -> Py3 transition details, and then you should try to find a more authoritative source than my occasional comments about the proper way to do things. -- Lisandro Dalcin ============ Senior Research Scientist Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Oct 14 08:40:43 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 14 Oct 2020 07:40:43 -0600 Subject: [petsc-users] Python version needed for internal scripts In-Reply-To: References: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> Message-ID: <87h7qxc4dg.fsf@jedbrown.org> Lisandro Dalcin writes: > On Wed, 14 Oct 2020 at 13:01, Nicolas Barral < > nicolas.barral at math.u-bordeaux.fr> wrote: > >> >> 'python' is usually an alias for python2, so making it point at python3 >> seems a bit dangerous. Yet, python2 was removed from recent Ubuntus and >> maybe others, and if I have no python2 installed, and no 'python' alias, >> I have to manually edit all the scripts. > > apt install python-is-python3 > > and you should get the alias python -> python3 in /usr/bin Most scripts are called through the make system, which uses $(PYTHON). You can always do that: /your/preferred/python petsc_gen_xdmf.py thefile.h5 configure is actually a polyglot shell script that figures out how to call Python on itself. We could do that for other essential scripts. I hate the workflow of petsc_gen_xdmf.py, but it could be done this way. My /bin/python (on Arch) has been Python-3 since 2011. From pranayreddy865 at gmail.com Wed Oct 14 13:59:02 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Wed, 14 Oct 2020 11:59:02 -0700 Subject: [petsc-users] Object is in wrong state, Matrix is missing diagonal entry 0 Message-ID: Hello everyone, I am trying to solve the Poisson equation using SNES class. I am running into a problem where PETSc complains that an object is in wrong state. I opened the matrix object in matlab to see if any diagonal entry is missing but I see that it is not the case. Could you please let me know what I am missing here? The complete error is as follows: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry 0 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 [0]PETSC ERROR: ./a.out on a linux-gnu-c-debug named cg17-9.agave.rc.asu.edu by pbaikadi Wed Oct 14 11:33:45 2020 [0]PETSC ERROR: Configure options [0]PETSC ERROR: #1 MatILUFactorSymbolic_SeqAIJ() line 1687 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/mat/impls/aij/seq/aijfact.c [0]PETSC ERROR: #2 MatILUFactorSymbolic() line 6737 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/mat/interface/matrix.c [0]PETSC ERROR: #3 PCSetUp_ILU() line 144 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/pc/impls/factor/ilu/ilu.c [0]PETSC ERROR: #4 PCSetUp() line 932 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #5 KSPSetUp() line 391 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #6 KSPSolve() line 725 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #7 SNESSolve_NEWTONLS() line 225 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/snes/impls/ls/ls.c [0]PETSC ERROR: #8 SNESSolve() line 4560 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/snes/interface/snes.c [0]PETSC ERROR: #9 User provided function() line 0 in User file On a different note, I have two more questions. 1) When a matrix is being filled using MatSetValues, does the nnz vector also have a 0-based indexing? 2) If I explicitly have nnz 1-based indexing, then does nnz(i) indicate the number of nonzeros in the (i-1)^th row or the i^th row? Thank you in advance for your help. Best Regards, Pranay. ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 14 14:53:47 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 14 Oct 2020 15:53:47 -0400 Subject: [petsc-users] Object is in wrong state, Matrix is missing diagonal entry 0 In-Reply-To: References: Message-ID: On Wed, Oct 14, 2020 at 2:59 PM baikadi pranay wrote: > Hello everyone, > I am trying to solve the Poisson equation using SNES class. I am running > into a problem where PETSc complains that an object is in wrong state. I > opened the matrix object in matlab to see if any diagonal entry is missing > but I see that it is not the case. Could you please let me know what I am > missing here? The complete error is as follows: > You are missing the diagonal entry. You can look at the entire structure using MatView(A, PETSC_VIEWER_STDOUT_WORLD); > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry 0 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 > [0]PETSC ERROR: ./a.out on a linux-gnu-c-debug named > cg17-9.agave.rc.asu.edu by pbaikadi Wed Oct 14 11:33:45 2020 > [0]PETSC ERROR: Configure options > [0]PETSC ERROR: #1 MatILUFactorSymbolic_SeqAIJ() line 1687 in > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/mat/impls/aij/seq/aijfact.c > [0]PETSC ERROR: #2 MatILUFactorSymbolic() line 6737 in > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/mat/interface/matrix.c > [0]PETSC ERROR: #3 PCSetUp_ILU() line 144 in > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/pc/impls/factor/ilu/ilu.c > [0]PETSC ERROR: #4 PCSetUp() line 932 in > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #5 KSPSetUp() line 391 in > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #6 KSPSolve() line 725 in > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #7 SNESSolve_NEWTONLS() line 225 in > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/snes/impls/ls/ls.c > [0]PETSC ERROR: #8 SNESSolve() line 4560 in > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/snes/interface/snes.c > [0]PETSC ERROR: #9 User provided function() line 0 in User file > > On a different note, I have two more questions. > 1) When a matrix is being filled using MatSetValues, does the nnz vector > also have a 0-based indexing? > We always use 0-based indices. However, MatSetValues() does not take an nnz vector. > 2) If I explicitly have nnz 1-based indexing, then does nnz(i) indicate > the number of nonzeros in the (i-1)^th row or the i^th row? > I do not understand. Thanks, Matt > Thank you in advance for your help. > Best Regards, > Pranay. > ? > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicolas.barral at math.u-bordeaux.fr Wed Oct 14 16:14:19 2020 From: nicolas.barral at math.u-bordeaux.fr (Nicolas Barral) Date: Wed, 14 Oct 2020 23:14:19 +0200 Subject: [petsc-users] Python version needed for internal scripts In-Reply-To: References: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> Message-ID: <40342694-dbaf-0e2a-5b2a-3a4858cede09@math.u-bordeaux.fr> Thanks Matt, Pierre, Lisandro and Jed for your help. Does the python version chosen to call the configure script impact other petsc scripts ? For now keeping python as an alias for python2 seems safer (until proven otherwise) due to other codes. @Pierre: I meant the search button in https://lists.mcs.anl.gov/... wouldn't return anything, even on as obvious as queries as "petsc". It does work now, so not sure what happened. Thanks, -- Nicolas On 14/10/2020 12:20, Matthew Knepley wrote: > On Wed, Oct 14, 2020 at 6:01 AM Nicolas Barral > > wrote: > > Hi all, > > Apologies if the question has already been asked, but the ML archive > search seems to be broken (or has it never worked ?). > > Many petsc scripts require a 'python' executable, which python should > that be ? For now, python3 seems to have worked with the configure > scripts and petsc_gen_xdmf scripts, but can I safely assume it will > always be the case ? > > 'python' is usually an alias for python2, so making it point at python3 > seems a bit dangerous. 
Yet, python2 was removed from recent Ubuntus and > maybe others, and if I have no python2 installed, and no 'python' > alias, > I have to manually edit all the scripts. > > > Right now, PETSc works with both Python2 and Python3. I am not sure how > long we can support Python2, > but the aim is to support it until End of Life, probably on Red Hat > since they change the slowest I think. > > ? Thanks, > > ? ? Matt > > Thanks > > -- > Nicolas > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From jed at jedbrown.org Wed Oct 14 17:31:01 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 14 Oct 2020 16:31:01 -0600 Subject: [petsc-users] Python version needed for internal scripts In-Reply-To: <40342694-dbaf-0e2a-5b2a-3a4858cede09@math.u-bordeaux.fr> References: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> <40342694-dbaf-0e2a-5b2a-3a4858cede09@math.u-bordeaux.fr> Message-ID: <87wnzsbftm.fsf@jedbrown.org> Nicolas Barral writes: > Thanks Matt, Pierre, Lisandro and Jed for your help. > > Does the python version chosen to call the configure script impact other > petsc scripts ? It affects everything called through make targets, which use $(PYTHON) and includes most things a user would care about. petsc_gen_xdmf.py is special and I hate it, but proper Python "install" of this script is a hassle. > For now keeping python as an alias for python2 seems safer (until proven > otherwise) due to other codes. There is no reason for people to have python2 installed unless they have to work with legacy Python that still hasn't migrated after EOL. We test with Python-3 and folks should be encouraged to use it. From olivier.jamond at cea.fr Thu Oct 15 03:26:09 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Thu, 15 Oct 2020 10:26:09 +0200 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> <87tuv48osv.fsf@jedbrown.org> <2B8B302F-D823-4160-B674-B3DAE78E6363@petsc.dev> <218E7696-2A50-42A3-8CF2-D58FCC17B855@petsc.dev> Message-ID: <5d2869c9-3f2b-bb40-c99a-7c47683a1420@cea.fr> > ? We'd be interested in hearing about your convergence and timing > experiences when you run largish jobs (even sequentially) since this > type of problem comes up relatively frequently and we do need a > variety of solvers that can handle it while currently we do not have > great tools for it. Of course I will keep you in touch with the results of this 'ainsworht' approach! I also plan to compare it with the new 'GKB' approach of PCFIELDSPLIT. Many thanks, Olivier From hecbarcab at gmail.com Thu Oct 15 04:16:46 2020 From: hecbarcab at gmail.com (=?UTF-8?Q?H=C3=A9ctor_Barreiro_Cabrera?=) Date: Thu, 15 Oct 2020 11:16:46 +0200 Subject: [petsc-users] Eisenstat-Walker method with GPU assembled matrices Message-ID: Hello fellow PETSc users, Following up my previous email , I managed to feed the entry data to a SeqAICUSPARSE matrix through a CUDA kernel using the new MatCUSPARSEGetDeviceMatWrite function (thanks Barry Smith and Mark Adams!). 
However, I am now facing problems when trying to use this matrix within a SNES solver with the Eisenstat-Walker method enabled. According to PETSc's error log, the preconditioner is failing to invert the matrix diagonal. Specifically it says that: [0]PETSC ERROR: Arguments are incompatible [0]PETSC ERROR: Zero diagonal on row 0 [0]PETSC ERROR: Configure options PETSC_ARCH=win64_vs2019_release --with-cc="win32fe cl" --with-cxx="win32fe cl" --with-clanguage=C++ --with-fc=0 --with-mpi=0 --with-cuda=1 --with-cudac="win32fe nvcc" --with-cuda-dir=~/cuda --download-f2cblaslapack=1 --with-precision=single --with-64-bit-indices=0 --with-single-library=1 --with-endian=little --with-debugging=0 --with-x=0 --with-windows-graphics=0 --with-shared-libraries=1 --CUDAOPTFLAGS=-O2 The stack trace leads to the diagonal inversion routine: [0]PETSC ERROR: #1 MatInvertDiagonal_SeqAIJ() line 1913 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c [0]PETSC ERROR: #2 MatSOR_SeqAIJ() line 1944 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c [0]PETSC ERROR: #3 MatSOR() line 4005 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\INTERF~1\matrix.c [0]PETSC ERROR: #4 PCPreSolve_Eisenstat() line 79 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\impls\eisens\eisen.c [0]PETSC ERROR: #5 PCPreSolve() line 1549 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\INTERF~1\precon.c [0]PETSC ERROR: #6 KSPSolve_Private() line 686 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c [0]PETSC ERROR: #7 KSPSolve() line 889 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\impls\ls\ls.c [0]PETSC ERROR: #9 SNESSolve() line 4567 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\INTERF~1\snes.c I am 100% positive that the diagonal does not contain a zero entry, so my suspicions are either that this operation is not supported on the GPU at all (MatInvertDiagonal_SeqAIJ seems to access host-side memory) or that I am missing some setting to make this work on the GPU. Is this correct? Thanks! Cheers, H?ctor -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Oct 15 08:10:15 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 15 Oct 2020 07:10:15 -0600 Subject: [petsc-users] Eisenstat-Walker method with GPU assembled matrices In-Reply-To: References: Message-ID: <87r1pzbpoo.fsf@jedbrown.org> H?ctor Barreiro Cabrera writes: > Hello fellow PETSc users, > > Following up my previous email > , > I managed to feed the entry data to a SeqAICUSPARSE matrix through a CUDA > kernel using the new MatCUSPARSEGetDeviceMatWrite function (thanks Barry > Smith and Mark Adams!). However, I am now facing problems when trying to > use this matrix within a SNES solver with the Eisenstat-Walker method > enabled. Before going further, the error message makes this look like you're using -pc_type eisenstat (a "trick" to reduce the cost of Gauss-Seidel with Krylov) rather than -snes_ksp_ew (the Eisenstat-Walker method for tuning linear solver tolerances within SNES). Is this what you intend? > According to PETSc's error log, the preconditioner is failing to invert the > matrix diagonal. 
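For reference, the two are selected quite differently. A minimal sketch using only standard PETSc options; "snes" below is whatever SNES object the application already creates:

    -snes_ksp_ew          Eisenstat-Walker: adapt the inner KSP relative tolerance at each Newton step
    -pc_type eisenstat    PCEISENSTAT: SSOR preconditioning with Eisenstat's trick, unrelated to the above

or, from code,

    SNESKSPSetUseEW(snes, PETSC_TRUE);   /* same effect as -snes_ksp_ew */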
Specifically it says that: > [0]PETSC ERROR: Arguments are incompatible > [0]PETSC ERROR: Zero diagonal on row 0 > [0]PETSC ERROR: Configure options PETSC_ARCH=win64_vs2019_release > --with-cc="win32fe cl" --with-cxx="win32fe cl" --with-clanguage=C++ > --with-fc=0 --with-mpi=0 --with-cuda=1 --with-cudac="win32fe nvcc" > --with-cuda-dir=~/cuda --download-f2cblaslapack=1 --with-precision=single > --with-64-bit-indices=0 --with-single-library=1 --with-endian=little > --with-debugging=0 --with-x=0 --with-windows-graphics=0 > --with-shared-libraries=1 --CUDAOPTFLAGS=-O2 > > The stack trace leads to the diagonal inversion routine: > [0]PETSC ERROR: #1 MatInvertDiagonal_SeqAIJ() line 1913 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c > [0]PETSC ERROR: #2 MatSOR_SeqAIJ() line 1944 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c > [0]PETSC ERROR: #3 MatSOR() line 4005 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\INTERF~1\matrix.c > [0]PETSC ERROR: #4 PCPreSolve_Eisenstat() line 79 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\impls\eisens\eisen.c > [0]PETSC ERROR: #5 PCPreSolve() line 1549 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\INTERF~1\precon.c > [0]PETSC ERROR: #6 KSPSolve_Private() line 686 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c > [0]PETSC ERROR: #7 KSPSolve() line 889 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c > [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\impls\ls\ls.c > [0]PETSC ERROR: #9 SNESSolve() line 4567 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\INTERF~1\snes.c > > I am 100% positive that the diagonal does not contain a zero entry, so my > suspicions are either that this operation is not supported on the GPU at > all (MatInvertDiagonal_SeqAIJ seems to access host-side memory) or that I > am missing some setting to make this work on the GPU. Is this correct? > > Thanks! > > Cheers, > H?ctor From hecbarcab at gmail.com Thu Oct 15 10:06:00 2020 From: hecbarcab at gmail.com (=?UTF-8?Q?H=C3=A9ctor_Barreiro_Cabrera?=) Date: Thu, 15 Oct 2020 17:06:00 +0200 Subject: [petsc-users] Eisenstat-Walker method with GPU assembled matrices In-Reply-To: <87r1pzbpoo.fsf@jedbrown.org> References: <87r1pzbpoo.fsf@jedbrown.org> Message-ID: Apologies, I realized that minutes after I hit the send button. My understanding was that both needed to be user together, but from your email it's clear that's not the case. After testing the solver without the preconditioner, but with EW enabled within SNES, everything's working as expected. Thank you, and sorry for the noise! El jue., 15 oct. 2020 a las 15:10, Jed Brown () escribi?: > H?ctor Barreiro Cabrera writes: > > > Hello fellow PETSc users, > > > > Following up my previous email > > < > https://lists.mcs.anl.gov/pipermail/petsc-users/2020-September/042511.html > >, > > I managed to feed the entry data to a SeqAICUSPARSE matrix through a CUDA > > kernel using the new MatCUSPARSEGetDeviceMatWrite function (thanks Barry > > Smith and Mark Adams!). However, I am now facing problems when trying to > > use this matrix within a SNES solver with the Eisenstat-Walker method > > enabled. > > Before going further, the error message makes this look like you're using > -pc_type eisenstat (a "trick" to reduce the cost of Gauss-Seidel with > Krylov) rather than -snes_ksp_ew (the Eisenstat-Walker method for tuning > linear solver tolerances within SNES). Is this what you intend? 
> > > According to PETSc's error log, the preconditioner is failing to invert > the > > matrix diagonal. Specifically it says that: > > [0]PETSC ERROR: Arguments are incompatible > > [0]PETSC ERROR: Zero diagonal on row 0 > > [0]PETSC ERROR: Configure options PETSC_ARCH=win64_vs2019_release > > --with-cc="win32fe cl" --with-cxx="win32fe cl" --with-clanguage=C++ > > --with-fc=0 --with-mpi=0 --with-cuda=1 --with-cudac="win32fe nvcc" > > --with-cuda-dir=~/cuda --download-f2cblaslapack=1 --with-precision=single > > --with-64-bit-indices=0 --with-single-library=1 --with-endian=little > > --with-debugging=0 --with-x=0 --with-windows-graphics=0 > > --with-shared-libraries=1 --CUDAOPTFLAGS=-O2 > > > > The stack trace leads to the diagonal inversion routine: > > [0]PETSC ERROR: #1 MatInvertDiagonal_SeqAIJ() line 1913 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c > > [0]PETSC ERROR: #2 MatSOR_SeqAIJ() line 1944 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c > > [0]PETSC ERROR: #3 MatSOR() line 4005 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\INTERF~1\matrix.c > > [0]PETSC ERROR: #4 PCPreSolve_Eisenstat() line 79 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\impls\eisens\eisen.c > > [0]PETSC ERROR: #5 PCPreSolve() line 1549 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\INTERF~1\precon.c > > [0]PETSC ERROR: #6 KSPSolve_Private() line 686 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c > > [0]PETSC ERROR: #7 KSPSolve() line 889 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c > > [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\impls\ls\ls.c > > [0]PETSC ERROR: #9 SNESSolve() line 4567 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\INTERF~1\snes.c > > > > I am 100% positive that the diagonal does not contain a zero entry, so my > > suspicions are either that this operation is not supported on the GPU at > > all (MatInvertDiagonal_SeqAIJ seems to access host-side memory) or that I > > am missing some setting to make this work on the GPU. Is this correct? > > > > Thanks! > > > > Cheers, > > H?ctor > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Oct 15 16:32:11 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 15 Oct 2020 16:32:11 -0500 Subject: [petsc-users] Eisenstat-Walker method with GPU assembled matrices In-Reply-To: References: Message-ID: We still have the assumption the AIJ matrix always has a copy on the GPU. How did you fill up the matrix on the GPU while not having its copy on the CPU? Barry When we remove this assumption we have to add a bunch more code for CPU only things to make sure they properly get the data from the GPU. > On Oct 15, 2020, at 4:16 AM, H?ctor Barreiro Cabrera wrote: > > Hello fellow PETSc users, > > Following up my previous email , I managed to feed the entry data to a SeqAICUSPARSE matrix through a CUDA kernel using the new MatCUSPARSEGetDeviceMatWrite function (thanks Barry Smith and Mark Adams!). However, I am now facing problems when trying to use this matrix within a SNES solver with the Eisenstat-Walker method enabled. > > According to PETSc's error log, the preconditioner is failing to invert the matrix diagonal. 
Specifically it says that: > [0]PETSC ERROR: Arguments are incompatible > [0]PETSC ERROR: Zero diagonal on row 0 > [0]PETSC ERROR: Configure options PETSC_ARCH=win64_vs2019_release --with-cc="win32fe cl" --with-cxx="win32fe cl" --with-clanguage=C++ --with-fc=0 --with-mpi=0 --with-cuda=1 --with-cudac="win32fe nvcc" --with-cuda-dir=~/cuda --download-f2cblaslapack=1 --with-precision=single --with-64-bit-indices=0 --with-single-library=1 --with-endian=little --with-debugging=0 --with-x=0 --with-windows-graphics=0 --with-shared-libraries=1 --CUDAOPTFLAGS=-O2 > > The stack trace leads to the diagonal inversion routine: > [0]PETSC ERROR: #1 MatInvertDiagonal_SeqAIJ() line 1913 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c > [0]PETSC ERROR: #2 MatSOR_SeqAIJ() line 1944 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c > [0]PETSC ERROR: #3 MatSOR() line 4005 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\INTERF~1\matrix.c > [0]PETSC ERROR: #4 PCPreSolve_Eisenstat() line 79 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\impls\eisens\eisen.c > [0]PETSC ERROR: #5 PCPreSolve() line 1549 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\INTERF~1\precon.c > [0]PETSC ERROR: #6 KSPSolve_Private() line 686 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c > [0]PETSC ERROR: #7 KSPSolve() line 889 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c > [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\impls\ls\ls.c > [0]PETSC ERROR: #9 SNESSolve() line 4567 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\INTERF~1\snes.c > > I am 100% positive that the diagonal does not contain a zero entry, so my suspicions are either that this operation is not supported on the GPU at all (MatInvertDiagonal_SeqAIJ seems to access host-side memory) or that I am missing some setting to make this work on the GPU. Is this correct? > > Thanks! > > Cheers, > H?ctor -------------- next part -------------- An HTML attachment was scrubbed... URL: From Eugenio.Aulisa at ttu.edu Fri Oct 16 06:50:43 2020 From: Eugenio.Aulisa at ttu.edu (Aulisa, Eugenio) Date: Fri, 16 Oct 2020 11:50:43 +0000 Subject: [petsc-users] MatDuplicate Message-ID: Hi, I have a MATMPIAIJ matrix A with an overestimated preallocated size. After closing it I want a duplicate of A with sharp memory allocation for each row both diagonal and off-diagonal. I know how to do it by hand, but I am wondering if a function already exists. For example, if I use MatDuplicate with MAT_COPY_VALUES, will it do a sharp memory allocation, or use the same loose memory allocation of A? Any other function would do the job? Thanks, Eugenio Eugenio Aulisa Department of Mathematics and Statistics, Texas Tech University Lubbock TX, 79409-1042 room: 226 http://www.math.ttu.edu/~eaulisa/ phone: (806) 834-6684 fax: (806) 742-1112 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Oct 16 07:27:17 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 16 Oct 2020 08:27:17 -0400 Subject: [petsc-users] MatDuplicate In-Reply-To: References: Message-ID: On Fri, Oct 16, 2020 at 7:50 AM Aulisa, Eugenio wrote: > Hi, > > I have a MATMPIAIJ matrix A with an overestimated preallocated size. > > After closing it I want a duplicate of A with sharp memory > allocation for each row both diagonal and off-diagonal. > > I know how to do it by hand, but I am wondering if a function already > exists. 
> > For example, if I use MatDuplicate with MAT_COPY_VALUES > , > > will it do a sharp memory allocation, or use the same loose memory > allocation of A? > It is a sharp allocation. Thanks, Matt > Any other function would do the job? > > Thanks, > Eugenio > > > > > > Eugenio Aulisa > > Department of Mathematics and Statistics, > Texas Tech University > Lubbock TX, 79409-1042 > room: 226 > http://www.math.ttu.edu/~eaulisa/ > phone: (806) 834-6684 > fax: (806) 742-1112 > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alexey.V.Kozlov.2 at nd.edu Fri Oct 16 22:47:58 2020 From: Alexey.V.Kozlov.2 at nd.edu (Alexey Kozlov) Date: Fri, 16 Oct 2020 23:47:58 -0400 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: References: <87o8m2tod8.fsf@jedbrown.org> Message-ID: Thank you for your advice! My sparse matrix seems to be very stiff so I have decided to concentrate on the direct solvers. I have very good results with MUMPS. Due to a lack of time I haven?t got a good result with SuperLU_DIST and haven?t compiled PETSc with Pastix yet but I have a feeling that MUMPS is the best. I have run a sequential test case with built-in PETSc LU (-pc_type lu -ksp_type preonly) and MUMPs (-pc_type lu -ksp_type preonly -pc_factor_mat_solver_type mumps) with default settings and found that MUMPs was about 50 times faster than the built-in LU and used about 3 times less RAM. Do you have any idea why it could be? My test case has about 100,000 complex equations with about 3,000,000 non-zeros. PETSc was compiled with the following options: ./configure --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --enable-g --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi --with-scalar-type=complex --with-clanguage=c --with-openmp --with-debugging=0 COPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' FOPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' CXXOPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' --download-superlu_dist --download-mumps --download-scalapack --download-metis --download-cmake --download-parmetis --download-ptscotch. Running MUPMS in parallel using MPI also gave me a significant gain in performance (about 10 times on a single cluster node). Could you, please, advise me whether I can adjust some options for the direct solvers to improve performance? Should I try MUMPS in OpenMP mode? On Sat, Sep 19, 2020 at 7:40 AM Mark Adams wrote: > As Jed said high frequency is hard. AMG, as-is, can be adapted ( > https://link.springer.com/article/10.1007/s00466-006-0047-8) with > parameters. > AMG for convection: use richardson/sor and not chebyshev smoothers and in > smoothed aggregation (gamg) don't smooth (-pc_gamg_agg_nsmooths 0). > Mark > > On Sat, Sep 19, 2020 at 2:11 AM Alexey Kozlov > wrote: > >> Thanks a lot! I'll check them out. >> >> On Sat, Sep 19, 2020 at 1:41 AM Barry Smith wrote: >> >>> >>> These are small enough that likely sparse direct solvers are the best >>> use of your time and for general efficiency. >>> >>> PETSc supports 3 parallel direct solvers, SuperLU_DIST, MUMPs and >>> Pastix. I recommend configuring PETSc for all three of them and then >>> comparing them for problems of interest to you. 
>>> >>> --download-superlu_dist --download-mumps --download-pastix >>> --download-scalapack (used by MUMPS) --download-metis --download-parmetis >>> --download-ptscotch >>> >>> Barry >>> >>> >>> On Sep 18, 2020, at 11:28 PM, Alexey Kozlov >>> wrote: >>> >>> Thanks for the tips! My matrix is complex and unsymmetric. My typical >>> test case has of the order of one million equations. I use a 2nd-order >>> finite-difference scheme with 19-point stencil, so my typical test case >>> uses several GB of RAM. >>> >>> On Fri, Sep 18, 2020 at 11:52 PM Jed Brown wrote: >>> >>>> Unfortunately, those are hard problems in which the "good" methods are >>>> technical and hard to make black-box. There are "sweeping" methods that >>>> solve on 2D "slabs" with PML boundary conditions, H-matrix based methods, >>>> and fancy multigrid methods. Attempting to solve with STRUMPACK is >>>> probably the easiest thing to try (--download-strumpack). >>>> >>>> >>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html >>>> >>>> Is the matrix complex symmetric? >>>> >>>> Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a 3D >>>> problem like this if you have enough memory. I'm assuming the memory or >>>> time is unacceptable and you want an iterative method with much lower setup >>>> costs. >>>> >>>> Alexey Kozlov writes: >>>> >>>> > Dear all, >>>> > >>>> > I am solving a convected wave equation in a frequency domain. This >>>> equation >>>> > is a 3D Helmholtz equation with added first-order derivatives and >>>> mixed >>>> > derivatives, and with complex coefficients. The discretized PDE >>>> results in >>>> > a sparse linear system (about 10^6 equations) which is solved in >>>> PETSc. I >>>> > am having difficulty with the code convergence at high frequency, >>>> skewed >>>> > grid, and high Mach number. I suspect it may be due to the >>>> preconditioner I >>>> > use. I am currently using the ILU preconditioner with the number of >>>> fill >>>> > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of the >>>> art >>>> > has evolved and there are better preconditioners for Helmholtz-like >>>> > problems. Could you, please, advise me on a better preconditioner? >>>> > >>>> > Thanks, >>>> > Alexey >>>> > >>>> > -- >>>> > Alexey V. Kozlov >>>> > >>>> > Research Scientist >>>> > Department of Aerospace and Mechanical Engineering >>>> > University of Notre Dame >>>> > >>>> > 117 Hessert Center >>>> > Notre Dame, IN 46556-5684 >>>> > Phone: (574) 631-4335 >>>> > Fax: (574) 631-8355 >>>> > Email: akozlov at nd.edu >>>> >>> >>> >>> -- >>> Alexey V. Kozlov >>> >>> Research Scientist >>> Department of Aerospace and Mechanical Engineering >>> University of Notre Dame >>> >>> 117 Hessert Center >>> Notre Dame, IN 46556-5684 >>> Phone: (574) 631-4335 >>> Fax: (574) 631-8355 >>> Email: akozlov at nd.edu >>> >>> >>> >> >> -- >> Alexey V. Kozlov >> >> Research Scientist >> Department of Aerospace and Mechanical Engineering >> University of Notre Dame >> >> 117 Hessert Center >> Notre Dame, IN 46556-5684 >> Phone: (574) 631-4335 >> Fax: (574) 631-8355 >> Email: akozlov at nd.edu >> > -- Alexey V. Kozlov Research Scientist Department of Aerospace and Mechanical Engineering University of Notre Dame 117 Hessert Center Notre Dame, IN 46556-5684 Phone: (574) 631-4335 Fax: (574) 631-8355 Email: akozlov at nd.edu -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri Oct 16 23:33:07 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 17 Oct 2020 00:33:07 -0400 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: References: <87o8m2tod8.fsf@jedbrown.org> Message-ID: On Fri, Oct 16, 2020 at 11:48 PM Alexey Kozlov wrote: > Thank you for your advice! My sparse matrix seems to be very stiff so I > have decided to concentrate on the direct solvers. I have very good results > with MUMPS. Due to a lack of time I haven?t got a good result with > SuperLU_DIST and haven?t compiled PETSc with Pastix yet but I have a > feeling that MUMPS is the best. I have run a sequential test case with > built-in PETSc LU (-pc_type lu -ksp_type preonly) and MUMPs (-pc_type lu > -ksp_type preonly -pc_factor_mat_solver_type mumps) with default settings > and found that MUMPs was about 50 times faster than the built-in LU and > used about 3 times less RAM. Do you have any idea why it could be? > The numbers do not sound realistic, but of course we do not have your particular problem. In particular, the memory figure seems impossible. > My test case has about 100,000 complex equations with about 3,000,000 > non-zeros. PETSc was compiled with the following options: ./configure > --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --enable-g > --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi > --with-scalar-type=complex --with-clanguage=c --with-openmp > --with-debugging=0 COPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2' FOPTFLAGS='-mkl=parallel -O2 -mavx > -axCORE-AVX2 -no-prec-div -fp-model fast=2' CXXOPTFLAGS='-mkl=parallel -O2 > -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' --download-superlu_dist > --download-mumps --download-scalapack --download-metis --download-cmake > --download-parmetis --download-ptscotch. > > Running MUPMS in parallel using MPI also gave me a significant gain in > performance (about 10 times on a single cluster node). > Again, this does not appear to make sense. The performance should be limited by memory bandwidth, and a single cluster node will not usually have 10x the bandwidth of a CPU, although it might be possible with a very old CPU. It would help to understand the performance if you would send the output of -log_view. Thanks, Matt > Could you, please, advise me whether I can adjust some options for the > direct solvers to improve performance? Should I try MUMPS in OpenMP mode? > > On Sat, Sep 19, 2020 at 7:40 AM Mark Adams wrote: > >> As Jed said high frequency is hard. AMG, as-is, can be adapted ( >> https://link.springer.com/article/10.1007/s00466-006-0047-8) with >> parameters. >> AMG for convection: use richardson/sor and not chebyshev smoothers and in >> smoothed aggregation (gamg) don't smooth (-pc_gamg_agg_nsmooths 0). >> Mark >> >> On Sat, Sep 19, 2020 at 2:11 AM Alexey Kozlov >> wrote: >> >>> Thanks a lot! I'll check them out. >>> >>> On Sat, Sep 19, 2020 at 1:41 AM Barry Smith wrote: >>> >>>> >>>> These are small enough that likely sparse direct solvers are the best >>>> use of your time and for general efficiency. >>>> >>>> PETSc supports 3 parallel direct solvers, SuperLU_DIST, MUMPs and >>>> Pastix. I recommend configuring PETSc for all three of them and then >>>> comparing them for problems of interest to you. 
>>>> >>>> --download-superlu_dist --download-mumps --download-pastix >>>> --download-scalapack (used by MUMPS) --download-metis --download-parmetis >>>> --download-ptscotch >>>> >>>> Barry >>>> >>>> >>>> On Sep 18, 2020, at 11:28 PM, Alexey Kozlov >>>> wrote: >>>> >>>> Thanks for the tips! My matrix is complex and unsymmetric. My typical >>>> test case has of the order of one million equations. I use a 2nd-order >>>> finite-difference scheme with 19-point stencil, so my typical test case >>>> uses several GB of RAM. >>>> >>>> On Fri, Sep 18, 2020 at 11:52 PM Jed Brown wrote: >>>> >>>>> Unfortunately, those are hard problems in which the "good" methods are >>>>> technical and hard to make black-box. There are "sweeping" methods that >>>>> solve on 2D "slabs" with PML boundary conditions, H-matrix based methods, >>>>> and fancy multigrid methods. Attempting to solve with STRUMPACK is >>>>> probably the easiest thing to try (--download-strumpack). >>>>> >>>>> >>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html >>>>> >>>>> Is the matrix complex symmetric? >>>>> >>>>> Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a >>>>> 3D problem like this if you have enough memory. I'm assuming the memory or >>>>> time is unacceptable and you want an iterative method with much lower setup >>>>> costs. >>>>> >>>>> Alexey Kozlov writes: >>>>> >>>>> > Dear all, >>>>> > >>>>> > I am solving a convected wave equation in a frequency domain. This >>>>> equation >>>>> > is a 3D Helmholtz equation with added first-order derivatives and >>>>> mixed >>>>> > derivatives, and with complex coefficients. The discretized PDE >>>>> results in >>>>> > a sparse linear system (about 10^6 equations) which is solved in >>>>> PETSc. I >>>>> > am having difficulty with the code convergence at high frequency, >>>>> skewed >>>>> > grid, and high Mach number. I suspect it may be due to the >>>>> preconditioner I >>>>> > use. I am currently using the ILU preconditioner with the number of >>>>> fill >>>>> > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of the >>>>> art >>>>> > has evolved and there are better preconditioners for Helmholtz-like >>>>> > problems. Could you, please, advise me on a better preconditioner? >>>>> > >>>>> > Thanks, >>>>> > Alexey >>>>> > >>>>> > -- >>>>> > Alexey V. Kozlov >>>>> > >>>>> > Research Scientist >>>>> > Department of Aerospace and Mechanical Engineering >>>>> > University of Notre Dame >>>>> > >>>>> > 117 Hessert Center >>>>> > Notre Dame, IN 46556-5684 >>>>> > Phone: (574) 631-4335 >>>>> > Fax: (574) 631-8355 >>>>> > Email: akozlov at nd.edu >>>>> >>>> >>>> >>>> -- >>>> Alexey V. Kozlov >>>> >>>> Research Scientist >>>> Department of Aerospace and Mechanical Engineering >>>> University of Notre Dame >>>> >>>> 117 Hessert Center >>>> Notre Dame, IN 46556-5684 >>>> Phone: (574) 631-4335 >>>> Fax: (574) 631-8355 >>>> Email: akozlov at nd.edu >>>> >>>> >>>> >>> >>> -- >>> Alexey V. Kozlov >>> >>> Research Scientist >>> Department of Aerospace and Mechanical Engineering >>> University of Notre Dame >>> >>> 117 Hessert Center >>> Notre Dame, IN 46556-5684 >>> Phone: (574) 631-4335 >>> Fax: (574) 631-8355 >>> Email: akozlov at nd.edu >>> >> > > -- > Alexey V. 
Kozlov > > Research Scientist > Department of Aerospace and Mechanical Engineering > University of Notre Dame > > 117 Hessert Center > Notre Dame, IN 46556-5684 > Phone: (574) 631-4335 > Fax: (574) 631-8355 > Email: akozlov at nd.edu > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Oct 16 23:42:22 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 16 Oct 2020 23:42:22 -0500 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: References: <87o8m2tod8.fsf@jedbrown.org> Message-ID: <19C23646-4F73-461A-9D9E-A25FAA99E96A@petsc.dev> > On Oct 16, 2020, at 11:33 PM, Matthew Knepley wrote: > > On Fri, Oct 16, 2020 at 11:48 PM Alexey Kozlov > wrote: > Thank you for your advice! My sparse matrix seems to be very stiff so I have decided to concentrate on the direct solvers. I have very good results with MUMPS. Due to a lack of time I haven?t got a good result with SuperLU_DIST and haven?t compiled PETSc with Pastix yet but I have a feeling that MUMPS is the best. I have run a sequential test case with built-in PETSc LU (-pc_type lu -ksp_type preonly) and MUMPs (-pc_type lu -ksp_type preonly -pc_factor_mat_solver_type mumps) with default settings and found that MUMPs was about 50 times faster than the built-in LU and used about 3 times less RAM. Do you have any idea why it could be? > > The numbers do not sound realistic, but of course we do not have your particular problem. In particular, the memory figure seems impossible. They are probably using a different ordering. Remember each external direct solver manages its own orderings and doesn't share even their names. (Not nice community behavior). > > My test case has about 100,000 complex equations with about 3,000,000 non-zeros. PETSc was compiled with the following options: ./configure --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --enable-g --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi --with-scalar-type=complex --with-clanguage=c --with-openmp --with-debugging=0 COPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' FOPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' CXXOPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' --download-superlu_dist --download-mumps --download-scalapack --download-metis --download-cmake --download-parmetis --download-ptscotch. > > Running MUPMS in parallel using MPI also gave me a significant gain in performance (about 10 times on a single cluster node). > > Again, this does not appear to make sense. The performance should be limited by memory bandwidth, and a single cluster node will not usually have > 10x the bandwidth of a CPU, although it might be possible with a very old CPU. > > It would help to understand the performance if you would send the output of -log_view. > > Thanks, > > Matt > > Could you, please, advise me whether I can adjust some options for the direct solvers to improve performance? Should I try MUMPS in OpenMP mode? Look at the orderings and other options that MUMPs supports and try them out. Most can be accessed from the command line. You can run with -help to get a real brief summary of them but should read the MUMPs users manual for full details. 
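As an illustration of the kind of command-line experimentation this suggests (the particular values are only examples, not recommendations; the MUMPS users manual describes what each ICNTL entry controls):

    -pc_type lu -ksp_type preonly -pc_factor_mat_solver_type mumps
    -mat_mumps_icntl_7 5                          sequential analysis ordering (5 selects METIS)
    -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2   parallel analysis with ParMETIS instead
    -mat_mumps_icntl_4 2                          more verbose MUMPS output, to check which ordering was used

Combining such runs with -log_view makes it straightforward to compare analysis, factorization and solve times between orderings.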
> > On Sat, Sep 19, 2020 at 7:40 AM Mark Adams > wrote: > As Jed said high frequency is hard. AMG, as-is, can be adapted (https://link.springer.com/article/10.1007/s00466-006-0047-8 ) with parameters. > AMG for convection: use richardson/sor and not chebyshev smoothers and in smoothed aggregation (gamg) don't smooth (-pc_gamg_agg_nsmooths 0). > Mark > > On Sat, Sep 19, 2020 at 2:11 AM Alexey Kozlov > wrote: > Thanks a lot! I'll check them out. > > On Sat, Sep 19, 2020 at 1:41 AM Barry Smith > wrote: > > These are small enough that likely sparse direct solvers are the best use of your time and for general efficiency. > > PETSc supports 3 parallel direct solvers, SuperLU_DIST, MUMPs and Pastix. I recommend configuring PETSc for all three of them and then comparing them for problems of interest to you. > > --download-superlu_dist --download-mumps --download-pastix --download-scalapack (used by MUMPS) --download-metis --download-parmetis --download-ptscotch > > Barry > > >> On Sep 18, 2020, at 11:28 PM, Alexey Kozlov > wrote: >> >> Thanks for the tips! My matrix is complex and unsymmetric. My typical test case has of the order of one million equations. I use a 2nd-order finite-difference scheme with 19-point stencil, so my typical test case uses several GB of RAM. >> >> On Fri, Sep 18, 2020 at 11:52 PM Jed Brown > wrote: >> Unfortunately, those are hard problems in which the "good" methods are technical and hard to make black-box. There are "sweeping" methods that solve on 2D "slabs" with PML boundary conditions, H-matrix based methods, and fancy multigrid methods. Attempting to solve with STRUMPACK is probably the easiest thing to try (--download-strumpack). >> >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html >> >> Is the matrix complex symmetric? >> >> Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a 3D problem like this if you have enough memory. I'm assuming the memory or time is unacceptable and you want an iterative method with much lower setup costs. >> >> Alexey Kozlov > writes: >> >> > Dear all, >> > >> > I am solving a convected wave equation in a frequency domain. This equation >> > is a 3D Helmholtz equation with added first-order derivatives and mixed >> > derivatives, and with complex coefficients. The discretized PDE results in >> > a sparse linear system (about 10^6 equations) which is solved in PETSc. I >> > am having difficulty with the code convergence at high frequency, skewed >> > grid, and high Mach number. I suspect it may be due to the preconditioner I >> > use. I am currently using the ILU preconditioner with the number of fill >> > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of the art >> > has evolved and there are better preconditioners for Helmholtz-like >> > problems. Could you, please, advise me on a better preconditioner? >> > >> > Thanks, >> > Alexey >> > >> > -- >> > Alexey V. Kozlov >> > >> > Research Scientist >> > Department of Aerospace and Mechanical Engineering >> > University of Notre Dame >> > >> > 117 Hessert Center >> > Notre Dame, IN 46556-5684 >> > Phone: (574) 631-4335 >> > Fax: (574) 631-8355 >> > Email: akozlov at nd.edu >> >> >> -- >> Alexey V. Kozlov >> >> Research Scientist >> Department of Aerospace and Mechanical Engineering >> University of Notre Dame >> >> 117 Hessert Center >> Notre Dame, IN 46556-5684 >> Phone: (574) 631-4335 >> Fax: (574) 631-8355 >> Email: akozlov at nd.edu > > > > -- > Alexey V. 
Kozlov > > Research Scientist > Department of Aerospace and Mechanical Engineering > University of Notre Dame > > 117 Hessert Center > Notre Dame, IN 46556-5684 > Phone: (574) 631-4335 > Fax: (574) 631-8355 > Email: akozlov at nd.edu > > > -- > Alexey V. Kozlov > > Research Scientist > Department of Aerospace and Mechanical Engineering > University of Notre Dame > > 117 Hessert Center > Notre Dame, IN 46556-5684 > Phone: (574) 631-4335 > Fax: (574) 631-8355 > Email: akozlov at nd.edu > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Sat Oct 17 01:52:43 2020 From: pierre at joliv.et (Pierre Jolivet) Date: Sat, 17 Oct 2020 08:52:43 +0200 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: References: <87o8m2tod8.fsf@jedbrown.org> Message-ID: > On 17 Oct 2020, at 5:47 AM, Alexey Kozlov wrote: > > Thank you for your advice! My sparse matrix seems to be very stiff so I have decided to concentrate on the direct solvers. I have very good results with MUMPS. Due to a lack of time I haven?t got a good result with SuperLU_DIST and haven?t compiled PETSc with Pastix yet but I have a feeling that MUMPS is the best. I have run a sequential test case with built-in PETSc LU (-pc_type lu -ksp_type preonly) and MUMPs (-pc_type lu -ksp_type preonly -pc_factor_mat_solver_type mumps) with default settings and found that MUMPs was about 50 times faster than the built-in LU and used about 3 times less RAM. Do you have any idea why it could be? > > My test case has about 100,000 complex equations with about 3,000,000 non-zeros. PETSc was compiled with the following options: ./configure --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --enable-g --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi --with-scalar-type=complex --with-clanguage=c --with-openmp --with-debugging=0 COPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' FOPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' CXXOPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' --download-superlu_dist --download-mumps --download-scalapack --download-metis --download-cmake --download-parmetis --download-ptscotch. > > Running MUPMS in parallel using MPI also gave me a significant gain in performance (about 10 times on a single cluster node). > > Could you, please, advise me whether I can adjust some options for the direct solvers to improve performance? Your problem may be too small, but if you stick to full MUMPS, it may be worth playing around with the block low-rank (BLR) options. Here are some references: http://mumps.enseeiht.fr/doc/Thesis_TheoMary.pdf#page=191 http://mumps.enseeiht.fr/doc/ud_2017/Shantsev_Talk.pdf The relevant options in PETSc are -mat_mumps_icntl_35, -mat_mumps_icntl_36, and -mat_mumps_cntl_7 Thanks, Pierre > Should I try MUMPS in OpenMP mode? > > On Sat, Sep 19, 2020 at 7:40 AM Mark Adams > wrote: > As Jed said high frequency is hard. AMG, as-is, can be adapted (https://link.springer.com/article/10.1007/s00466-006-0047-8 ) with parameters. > AMG for convection: use richardson/sor and not chebyshev smoothers and in smoothed aggregation (gamg) don't smooth (-pc_gamg_agg_nsmooths 0). 
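  To make the BLR options mentioned above concrete (a sketch only; the values are illustrative and the exact semantics of ICNTL(35)/ICNTL(36)/CNTL(7) depend on the MUMPS version, see the references given above):

    -mat_mumps_icntl_35 1        activate block low-rank compression of the factors
    -mat_mumps_icntl_36 0        choice of BLR factorization variant
    -mat_mumps_cntl_7 1e-8       compression (dropping) tolerance

  added on top of -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps. With preonly there is no outer Krylov iteration to absorb the compression error, so either keep CNTL(7) small or use the BLR factorization as a preconditioner for, e.g., -ksp_type gmres with a looser tolerance.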
> Mark > > On Sat, Sep 19, 2020 at 2:11 AM Alexey Kozlov > wrote: > Thanks a lot! I'll check them out. > > On Sat, Sep 19, 2020 at 1:41 AM Barry Smith > wrote: > > These are small enough that likely sparse direct solvers are the best use of your time and for general efficiency. > > PETSc supports 3 parallel direct solvers, SuperLU_DIST, MUMPs and Pastix. I recommend configuring PETSc for all three of them and then comparing them for problems of interest to you. > > --download-superlu_dist --download-mumps --download-pastix --download-scalapack (used by MUMPS) --download-metis --download-parmetis --download-ptscotch > > Barry > > >> On Sep 18, 2020, at 11:28 PM, Alexey Kozlov > wrote: >> >> Thanks for the tips! My matrix is complex and unsymmetric. My typical test case has of the order of one million equations. I use a 2nd-order finite-difference scheme with 19-point stencil, so my typical test case uses several GB of RAM. >> >> On Fri, Sep 18, 2020 at 11:52 PM Jed Brown > wrote: >> Unfortunately, those are hard problems in which the "good" methods are technical and hard to make black-box. There are "sweeping" methods that solve on 2D "slabs" with PML boundary conditions, H-matrix based methods, and fancy multigrid methods. Attempting to solve with STRUMPACK is probably the easiest thing to try (--download-strumpack). >> >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html >> >> Is the matrix complex symmetric? >> >> Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a 3D problem like this if you have enough memory. I'm assuming the memory or time is unacceptable and you want an iterative method with much lower setup costs. >> >> Alexey Kozlov > writes: >> >> > Dear all, >> > >> > I am solving a convected wave equation in a frequency domain. This equation >> > is a 3D Helmholtz equation with added first-order derivatives and mixed >> > derivatives, and with complex coefficients. The discretized PDE results in >> > a sparse linear system (about 10^6 equations) which is solved in PETSc. I >> > am having difficulty with the code convergence at high frequency, skewed >> > grid, and high Mach number. I suspect it may be due to the preconditioner I >> > use. I am currently using the ILU preconditioner with the number of fill >> > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of the art >> > has evolved and there are better preconditioners for Helmholtz-like >> > problems. Could you, please, advise me on a better preconditioner? >> > >> > Thanks, >> > Alexey >> > >> > -- >> > Alexey V. Kozlov >> > >> > Research Scientist >> > Department of Aerospace and Mechanical Engineering >> > University of Notre Dame >> > >> > 117 Hessert Center >> > Notre Dame, IN 46556-5684 >> > Phone: (574) 631-4335 >> > Fax: (574) 631-8355 >> > Email: akozlov at nd.edu >> >> >> -- >> Alexey V. Kozlov >> >> Research Scientist >> Department of Aerospace and Mechanical Engineering >> University of Notre Dame >> >> 117 Hessert Center >> Notre Dame, IN 46556-5684 >> Phone: (574) 631-4335 >> Fax: (574) 631-8355 >> Email: akozlov at nd.edu > > > > -- > Alexey V. Kozlov > > Research Scientist > Department of Aerospace and Mechanical Engineering > University of Notre Dame > > 117 Hessert Center > Notre Dame, IN 46556-5684 > Phone: (574) 631-4335 > Fax: (574) 631-8355 > Email: akozlov at nd.edu > > > -- > Alexey V. 
Kozlov > > Research Scientist > Department of Aerospace and Mechanical Engineering > University of Notre Dame > > 117 Hessert Center > Notre Dame, IN 46556-5684 > Phone: (574) 631-4335 > Fax: (574) 631-8355 > Email: akozlov at nd.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alexey.V.Kozlov.2 at nd.edu Sat Oct 17 04:20:32 2020 From: Alexey.V.Kozlov.2 at nd.edu (Alexey Kozlov) Date: Sat, 17 Oct 2020 05:20:32 -0400 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: References: <87o8m2tod8.fsf@jedbrown.org> Message-ID: Matt, Thank you for your reply! My system has 8 NUMA nodes, so the memory bandwidth can increase up to 8 times when doing parallel computations. In other words, each node of the big computer cluster works as a small cluster consisting of 8 nodes. Of course, this works only if the contribution of communications between the NUMA nodes is small. The total amount of memory on a single cluster node is 128GB, so it is enough to fit my application. Below is the output of -log_view for three cases: (1) BUILT-IN PETSC LU SOLVER ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./caat on a arch-linux-c-opt named d24cepyc110.crc.nd.edu with 1 processor, by akozlov Sat Oct 17 03:58:23 2020 Using 0 OpenMP threads Using Petsc Release Version 3.13.6, unknown Max Max/Min Avg Total Time (sec): 5.551e+03 1.000 5.551e+03 Objects: 1.000e+01 1.000 1.000e+01 Flop: 1.255e+13 1.000 1.255e+13 1.255e+13 Flop/sec: 2.261e+09 1.000 2.261e+09 2.261e+09 MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 5.5509e+03 100.0% 1.2551e+13 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatSolve 1 1.0 7.3267e-01 1.0 4.58e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6246 MatLUFactorSym 1 1.0 1.0673e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 5.5350e+03 1.0 1.25e+13 1.0 0.0e+00 0.0e+00 0.0e+00100100 0 0 0 100100 0 0 0 2267 MatAssemblyBegin 1 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 1.0247e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRowIJ 1 1.0 1.4306e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.2596e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 4 1.0 9.3985e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyBegin 2 1.0 4.7684e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 2 1.0 4.7684e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 1 1.0 1.6689e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 7.3284e-01 1.0 4.58e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6245 PCSetUp 1 1.0 5.5458e+03 1.0 1.25e+13 1.0 0.0e+00 0.0e+00 0.0e+00100100 0 0 0 100100 0 0 0 2262 PCApply 1 1.0 7.3267e-01 1.0 4.58e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6246 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Matrix 2 2 11501999992 0. Vector 2 2 3761520 0. Krylov Solver 1 1 1408 0. Preconditioner 1 1 1184 0. Index Set 3 3 1412088 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 7.15256e-08 #PETSc Option Table entries: -ksp_type preonly -log_view -pc_type lu #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 Configure options: --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --with-g=1 --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi --with-scalar-type=complex --with-clanguage=c --with-openmp --with-debugging=0 COPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" FOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" CXXOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" --download-superlu_dist --download-mumps --download-scalapack --download-metis --download-cmake --download-parmetis --download-ptscotch ----------------------------------------- Libraries compiled on 2020-10-14 10:52:17 on epycfe.crc.nd.edu Machine characteristics: Linux-3.10.0-1160.2.1.el7.x86_64-x86_64-with-redhat-7.9-Maipo Using PETSc directory: /afs/crc.nd.edu/user/a/akozlov/Private/petsc Using PETSc arch: arch-linux-c-opt ----------------------------------------- Using C compiler: mpicc -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp Using Fortran compiler: mpif90 -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp ----------------------------------------- Using include paths: -I/afs/crc.nd.edu/user/a/akozlov/Private/petsc/include -I/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include -I/opt/crc/v/valgrind/3.14/ompi/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/afs/ crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -lpetsc -Wl,-rpath,/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -Wl,-rpath,/opt/crc/i/intel/19.0/mkl -L/opt/crc/i/intel/19.0/mkl -Wl,-rpath,/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -L/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -Wl,-rpath,/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -L/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64 -L/opt/crc/i/intel/19.0/mkl/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib/intel64 -L/opt/crc/i/intel/19.0/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib64 -L/opt/crc/i/intel/19.0/lib64 -Wl,-rpath,/afs/ crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin -L/afs/ crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -L/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lpthread -lptesmumps -lptscotchparmetis -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lX11 -lparmetis -lmetis -lstdc++ -ldl -lmpifort -lmpi -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc 
-lpthread -lgcc_s -lirc_s -lrt -lquadmath -lstdc++ -ldl ----------------------------------------- (2) EXTERNAL PACKAGE MUMPS, 1 MPI PROCESS ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./caat on a arch-linux-c-opt named d24cepyc068.crc.nd.edu with 1 processor, by akozlov Sat Oct 17 01:55:20 2020 Using 0 OpenMP threads Using Petsc Release Version 3.13.6, unknown Max Max/Min Avg Total Time (sec): 1.075e+02 1.000 1.075e+02 Objects: 9.000e+00 1.000 9.000e+00 Flop: 1.959e+12 1.000 1.959e+12 1.959e+12 Flop/sec: 1.823e+10 1.000 1.823e+10 1.823e+10 MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 1.0747e+02 100.0% 1.9594e+12 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatSolve 1 1.0 3.1965e-01 1.0 1.96e+12 1.0 0.0e+00 0.0e+00 0.0e+00 0100 0 0 0 0100 0 0 0 6126201 MatLUFactorSym 1 1.0 2.3141e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 MatLUFactorNum 1 1.0 1.0001e+02 1.0 1.16e+09 1.0 0.0e+00 0.0e+00 0.0e+00 93 0 0 0 0 93 0 0 0 0 12 MatAssemblyBegin 1 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 1.0067e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRowIJ 1 1.0 1.8650e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.3029e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecCopy 1 1.0 1.0943e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 4 1.0 9.2626e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyBegin 2 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 2 1.0 4.7684e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 1 1.0 1.6689e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 3.1981e-01 1.0 1.96e+12 1.0 0.0e+00 0.0e+00 0.0e+00 0100 0 0 0 0100 0 0 0 6123146 PCSetUp 1 1.0 1.0251e+02 1.0 1.16e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 0 0 0 0 95 0 0 0 0 11 PCApply 1 1.0 3.1965e-01 1.0 1.96e+12 1.0 0.0e+00 0.0e+00 0.0e+00 0100 0 0 0 0100 0 0 0 6126096 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Matrix 2 2 59441612 0. Vector 2 2 3761520 0. Krylov Solver 1 1 1408 0. Preconditioner 1 1 1184 0. Index Set 2 2 941392 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 4.76837e-08 #PETSc Option Table entries: -ksp_type preonly -log_view -pc_factor_mat_solver_type mumps -pc_type lu #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 Configure options: --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --with-g=1 --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi --with-scalar-type=complex --with-clanguage=c --with-openmp --with-debugging=0 COPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" FOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" CXXOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" --download-superlu_dist --download-mumps --download-scalapack --download-metis --download-cmake --download-parmetis --download-ptscotch ----------------------------------------- Libraries compiled on 2020-10-14 10:52:17 on epycfe.crc.nd.edu Machine characteristics: Linux-3.10.0-1160.2.1.el7.x86_64-x86_64-with-redhat-7.9-Maipo Using PETSc directory: /afs/crc.nd.edu/user/a/akozlov/Private/petsc Using PETSc arch: arch-linux-c-opt ----------------------------------------- Using C compiler: mpicc -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp Using Fortran compiler: mpif90 -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp ----------------------------------------- Using include paths: -I/afs/crc.nd.edu/user/a/akozlov/Private/petsc/include -I/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include -I/opt/crc/v/valgrind/3.14/ompi/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/afs/ crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -lpetsc -Wl,-rpath,/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -Wl,-rpath,/opt/crc/i/intel/19.0/mkl -L/opt/crc/i/intel/19.0/mkl -Wl,-rpath,/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -L/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -Wl,-rpath,/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -L/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64 -L/opt/crc/i/intel/19.0/mkl/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib/intel64 -L/opt/crc/i/intel/19.0/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib64 -L/opt/crc/i/intel/19.0/lib64 -Wl,-rpath,/afs/ crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin -L/afs/ crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -L/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lpthread -lptesmumps -lptscotchparmetis -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lX11 -lparmetis -lmetis -lstdc++ -ldl -lmpifort -lmpi -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lifport 
-lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lrt -lquadmath -lstdc++ -ldl ----------------------------------------- (3) EXTERNAL PACKAGE MUMPS , 48 MPI PROCESSES ON A SINGLE CLUSTER NODE WITH 8 NUMA NODES ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./caat on a arch-linux-c-opt named d24cepyc069.crc.nd.edu with 48 processors, by akozlov Sat Oct 17 04:40:25 2020 Using 0 OpenMP threads Using Petsc Release Version 3.13.6, unknown Max Max/Min Avg Total Time (sec): 1.415e+01 1.000 1.415e+01 Objects: 3.000e+01 1.000 3.000e+01 Flop: 4.855e+10 1.637 4.084e+10 1.960e+12 Flop/sec: 3.431e+09 1.637 2.886e+09 1.385e+11 MPI Messages: 1.180e+02 2.682 8.169e+01 3.921e+03 MPI Message Lengths: 1.559e+05 5.589 1.238e+03 4.855e+06 MPI Reductions: 4.000e+01 1.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 1.4150e+01 100.0% 1.9602e+12 100.0% 3.921e+03 100.0% 1.238e+03 100.0% 3.100e+01 77.5% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSided 5 1.0 1.0707e-02 3.3 0.00e+00 0.0 7.8e+02 4.0e+00 5.0e+00 0 0 20 0 12 0 0 20 0 16 0 BuildTwoSidedF 3 1.0 8.6837e-03 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 8 0 0 0 0 10 0 MatSolve 1 1.0 6.6314e-02 1.0 4.85e+10 1.6 3.5e+03 1.2e+03 6.0e+00 0100 90 87 15 0100 90 87 19 29529617 MatLUFactorSym 1 1.0 2.4322e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 17 0 0 0 10 17 0 0 0 13 0 MatLUFactorNum 1 1.0 5.8816e+00 1.0 5.08e+07 1.8 0.0e+00 0.0e+00 0.0e+00 42 0 0 0 0 42 0 0 0 0 332 MatAssemblyBegin 1 1.0 7.3917e-0357.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 2 0 0 0 0 3 0 MatAssemblyEnd 1 1.0 2.5823e-02 1.0 0.00e+00 0.0 3.8e+02 1.6e+03 5.0e+00 0 0 10 13 12 0 0 10 13 16 0 MatGetRowIJ 1 1.0 3.5763e-06 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 9.2506e-05 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 4 1.0 5.3000e-0460.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyBegin 2 1.0 2.2390e-0319.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 5 0 0 0 0 6 0 VecAssemblyEnd 2 1.0 9.7752e-06 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 2 1.0 1.6036e-0312.8 0.00e+00 0.0 5.9e+02 4.8e+03 1.0e+00 0 0 15 58 2 0 0 15 58 3 0 VecScatterEnd 2 1.0 2.0087e-0338.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetGraph 2 1.0 1.5259e-05 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 3 1.0 3.3023e-03 2.9 0.00e+00 0.0 1.6e+03 7.0e+02 2.0e+00 0 0 40 23 5 0 0 40 23 6 0 SFBcastOpBegin 2 1.0 1.5953e-0313.7 0.00e+00 0.0 5.9e+02 4.8e+03 1.0e+00 0 0 15 58 2 0 0 15 58 3 0 SFBcastOpEnd 2 1.0 2.0008e-0345.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFPack 2 1.0 1.4646e-03361.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 2 1.0 4.1723e-0529.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 1 1.0 3.0994e-06 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 6.6350e-02 1.0 4.85e+10 1.6 3.5e+03 1.2e+03 6.0e+00 0100 90 87 15 0100 90 87 19 29513594 PCSetUp 1 1.0 8.4679e+00 1.0 5.08e+07 1.8 0.0e+00 0.0e+00 1.0e+01 60 0 0 0 25 60 0 0 0 32 230 PCApply 1 1.0 6.6319e-02 1.0 4.85e+10 1.6 3.5e+03 1.2e+03 6.0e+00 0100 90 87 15 0100 90 87 19 29527282 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Matrix 4 4 1224428 0. Vec Scatter 3 3 2400 0. Vector 8 8 1923424 0. Index Set 9 9 32392 0. Star Forest Graph 3 3 3376 0. Krylov Solver 1 1 1408 0. Preconditioner 1 1 1160 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 7.15256e-08 Average time for MPI_Barrier(): 3.48091e-06 Average time for zero size MPI_Send(): 2.49843e-06 #PETSc Option Table entries: -ksp_type preonly -log_view -pc_factor_mat_solver_type mumps -pc_type lu #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 Configure options: --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --with-g=1 --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi --with-scalar-type=complex --with-clanguage=c --with-openmp --with-debugging=0 COPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" FOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" CXXOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" --download-superlu_dist --download-mumps --download-scalapack --download-metis --download-cmake --download-parmetis --download-ptscotch ----------------------------------------- Libraries compiled on 2020-10-14 10:52:17 on epycfe.crc.nd.edu Machine characteristics: Linux-3.10.0-1160.2.1.el7.x86_64-x86_64-with-redhat-7.9-Maipo Using PETSc directory: /afs/crc.nd.edu/user/a/akozlov/Private/petsc Using PETSc arch: arch-linux-c-opt ----------------------------------------- Using C compiler: mpicc -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp Using Fortran compiler: mpif90 -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp ----------------------------------------- Using include paths: -I/afs/crc.nd.edu/user/a/akozlov/Private/petsc/include -I/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include -I/opt/crc/v/valgrind/3.14/ompi/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/afs/ crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -lpetsc -Wl,-rpath,/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -Wl,-rpath,/opt/crc/i/intel/19.0/mkl -L/opt/crc/i/intel/19.0/mkl -Wl,-rpath,/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -L/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -Wl,-rpath,/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -L/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64 -L/opt/crc/i/intel/19.0/mkl/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib/intel64 -L/opt/crc/i/intel/19.0/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib64 -L/opt/crc/i/intel/19.0/lib64 -Wl,-rpath,/afs/ crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin -L/afs/ crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -L/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lpthread -lptesmumps -lptscotchparmetis -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lX11 -lparmetis -lmetis -lstdc++ 
-ldl -lmpifort -lmpi -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lrt -lquadmath -lstdc++ -ldl ----------------------------------------- On Sat, Oct 17, 2020 at 12:33 AM Matthew Knepley wrote: > On Fri, Oct 16, 2020 at 11:48 PM Alexey Kozlov > wrote: > >> Thank you for your advice! My sparse matrix seems to be very stiff so I >> have decided to concentrate on the direct solvers. I have very good results >> with MUMPS. Due to a lack of time I haven?t got a good result with >> SuperLU_DIST and haven?t compiled PETSc with Pastix yet but I have a >> feeling that MUMPS is the best. I have run a sequential test case with >> built-in PETSc LU (-pc_type lu -ksp_type preonly) and MUMPs (-pc_type lu >> -ksp_type preonly -pc_factor_mat_solver_type mumps) with default settings >> and found that MUMPs was about 50 times faster than the built-in LU and >> used about 3 times less RAM. Do you have any idea why it could be? >> > The numbers do not sound realistic, but of course we do not have your > particular problem. In particular, the memory figure seems impossible. > >> My test case has about 100,000 complex equations with about 3,000,000 >> non-zeros. PETSc was compiled with the following options: ./configure >> --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --enable-g >> --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi >> --with-scalar-type=complex --with-clanguage=c --with-openmp >> --with-debugging=0 COPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 >> -no-prec-div -fp-model fast=2' FOPTFLAGS='-mkl=parallel -O2 -mavx >> -axCORE-AVX2 -no-prec-div -fp-model fast=2' CXXOPTFLAGS='-mkl=parallel -O2 >> -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' --download-superlu_dist >> --download-mumps --download-scalapack --download-metis --download-cmake >> --download-parmetis --download-ptscotch. >> >> Running MUPMS in parallel using MPI also gave me a significant gain in >> performance (about 10 times on a single cluster node). >> > Again, this does not appear to make sense. The performance should be > limited by memory bandwidth, and a single cluster node will not usually have > 10x the bandwidth of a CPU, although it might be possible with a very old > CPU. > > It would help to understand the performance if you would send the output > of -log_view. > > Thanks, > > Matt > >> Could you, please, advise me whether I can adjust some options for the >> direct solvers to improve performance? Should I try MUMPS in OpenMP mode? >> >> On Sat, Sep 19, 2020 at 7:40 AM Mark Adams wrote: >> >>> As Jed said high frequency is hard. AMG, as-is, can be adapted ( >>> https://link.springer.com/article/10.1007/s00466-006-0047-8) with >>> parameters. >>> AMG for convection: use richardson/sor and not chebyshev smoothers and >>> in smoothed aggregation (gamg) don't smooth (-pc_gamg_agg_nsmooths 0). >>> Mark >>> >>> On Sat, Sep 19, 2020 at 2:11 AM Alexey Kozlov >>> wrote: >>> >>>> Thanks a lot! I'll check them out. >>>> >>>> On Sat, Sep 19, 2020 at 1:41 AM Barry Smith wrote: >>>> >>>>> >>>>> These are small enough that likely sparse direct solvers are the >>>>> best use of your time and for general efficiency. >>>>> >>>>> PETSc supports 3 parallel direct solvers, SuperLU_DIST, MUMPs and >>>>> Pastix. I recommend configuring PETSc for all three of them and then >>>>> comparing them for problems of interest to you. 
>>>>> >>>>> --download-superlu_dist --download-mumps --download-pastix >>>>> --download-scalapack (used by MUMPS) --download-metis --download-parmetis >>>>> --download-ptscotch >>>>> >>>>> Barry >>>>> >>>>> >>>>> On Sep 18, 2020, at 11:28 PM, Alexey Kozlov >>>>> wrote: >>>>> >>>>> Thanks for the tips! My matrix is complex and unsymmetric. My typical >>>>> test case has of the order of one million equations. I use a 2nd-order >>>>> finite-difference scheme with 19-point stencil, so my typical test case >>>>> uses several GB of RAM. >>>>> >>>>> On Fri, Sep 18, 2020 at 11:52 PM Jed Brown wrote: >>>>> >>>>>> Unfortunately, those are hard problems in which the "good" methods >>>>>> are technical and hard to make black-box. There are "sweeping" methods >>>>>> that solve on 2D "slabs" with PML boundary conditions, H-matrix based >>>>>> methods, and fancy multigrid methods. Attempting to solve with STRUMPACK >>>>>> is probably the easiest thing to try (--download-strumpack). >>>>>> >>>>>> >>>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html >>>>>> >>>>>> Is the matrix complex symmetric? >>>>>> >>>>>> Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a >>>>>> 3D problem like this if you have enough memory. I'm assuming the memory or >>>>>> time is unacceptable and you want an iterative method with much lower setup >>>>>> costs. >>>>>> >>>>>> Alexey Kozlov writes: >>>>>> >>>>>> > Dear all, >>>>>> > >>>>>> > I am solving a convected wave equation in a frequency domain. This >>>>>> equation >>>>>> > is a 3D Helmholtz equation with added first-order derivatives and >>>>>> mixed >>>>>> > derivatives, and with complex coefficients. The discretized PDE >>>>>> results in >>>>>> > a sparse linear system (about 10^6 equations) which is solved in >>>>>> PETSc. I >>>>>> > am having difficulty with the code convergence at high frequency, >>>>>> skewed >>>>>> > grid, and high Mach number. I suspect it may be due to the >>>>>> preconditioner I >>>>>> > use. I am currently using the ILU preconditioner with the number of >>>>>> fill >>>>>> > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of >>>>>> the art >>>>>> > has evolved and there are better preconditioners for Helmholtz-like >>>>>> > problems. Could you, please, advise me on a better preconditioner? >>>>>> > >>>>>> > Thanks, >>>>>> > Alexey >>>>>> > >>>>>> > -- >>>>>> > Alexey V. Kozlov >>>>>> > >>>>>> > Research Scientist >>>>>> > Department of Aerospace and Mechanical Engineering >>>>>> > University of Notre Dame >>>>>> > >>>>>> > 117 Hessert Center >>>>>> > Notre Dame, IN 46556-5684 >>>>>> > Phone: (574) 631-4335 >>>>>> > Fax: (574) 631-8355 >>>>>> > Email: akozlov at nd.edu >>>>>> >>>>> >>>>> >>>>> -- >>>>> Alexey V. Kozlov >>>>> >>>>> Research Scientist >>>>> Department of Aerospace and Mechanical Engineering >>>>> University of Notre Dame >>>>> >>>>> 117 Hessert Center >>>>> Notre Dame, IN 46556-5684 >>>>> Phone: (574) 631-4335 >>>>> Fax: (574) 631-8355 >>>>> Email: akozlov at nd.edu >>>>> >>>>> >>>>> >>>> >>>> -- >>>> Alexey V. Kozlov >>>> >>>> Research Scientist >>>> Department of Aerospace and Mechanical Engineering >>>> University of Notre Dame >>>> >>>> 117 Hessert Center >>>> Notre Dame, IN 46556-5684 >>>> Phone: (574) 631-4335 >>>> Fax: (574) 631-8355 >>>> Email: akozlov at nd.edu >>>> >>> >> >> -- >> Alexey V. 
Kozlov >> >> Research Scientist >> Department of Aerospace and Mechanical Engineering >> University of Notre Dame >> >> 117 Hessert Center >> Notre Dame, IN 46556-5684 >> Phone: (574) 631-4335 >> Fax: (574) 631-8355 >> Email: akozlov at nd.edu >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Alexey V. Kozlov Research Scientist Department of Aerospace and Mechanical Engineering University of Notre Dame 117 Hessert Center Notre Dame, IN 46556-5684 Phone: (574) 631-4335 Fax: (574) 631-8355 Email: akozlov at nd.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Oct 17 07:41:51 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 17 Oct 2020 08:41:51 -0400 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: References: <87o8m2tod8.fsf@jedbrown.org> Message-ID: On Sat, Oct 17, 2020 at 5:21 AM Alexey Kozlov wrote: > Matt, > > Thank you for your reply! > My system has 8 NUMA nodes, so the memory bandwidth can increase up to 8 > times when doing parallel computations. In other words, each node of the > big computer cluster works as a small cluster consisting of 8 nodes. Of > course, this works only if the contribution of communications between the > NUMA nodes is small. The total amount of memory on a single cluster node is > 128GB, so it is enough to fit my application. > Barry is right, of course. We can see that the PETSc LU, using the natural ordering, is doing 10,000x the flops compared to MUMPS. Using the same ordering, MUMPS might still benefit from blocking, but the gap would be much much smaller. I misunderstood your description of the parallelism. Yes, using 8 nodes you could see 8x from one node. I think Pierre is correct that something related to the size is happening since the numeric factorization in the parallel case for MUMPS is running at 30x the flop rate of the serial case. It's possible that they are using a different ordering in parallel that does more flops, but is more amenable to vectorization. It is hard to know without reporting all the MUMPS options.
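One low-effort way to get that information (a sketch; these are the options as wired up in PETSc's MUMPS interface): rerun the two MUMPS cases with

   -ksp_view -mat_mumps_icntl_4 2

-ksp_view makes PETSc print the MUMPS run parameters for the factored matrix -- every ICNTL/CNTL value actually used, plus the INFOG/RINFOG statistics MUMPS reports (INFOG(7), per the manual, should be the ordering that was actually used, and the RINFOG entries include MUMPS's own flop estimates) -- and ICNTL(4)=2 raises MUMPS's own diagnostic printing so the analysis-phase statistics show up in the output.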
Thanks, Matt > Below is the output of -log_view for three cases: > (1) BUILT-IN PETSC LU SOLVER > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./caat on a arch-linux-c-opt named d24cepyc110.crc.nd.edu with 1 > processor, by akozlov Sat Oct 17 03:58:23 2020 > Using 0 OpenMP threads > Using Petsc Release Version 3.13.6, unknown > > Max Max/Min Avg Total > Time (sec): 5.551e+03 1.000 5.551e+03 > Objects: 1.000e+01 1.000 1.000e+01 > Flop: 1.255e+13 1.000 1.255e+13 1.255e+13 > Flop/sec: 2.261e+09 1.000 2.261e+09 2.261e+09 > MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flop > and VecAXPY() for complex vectors of length N > --> 8N flop > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count > %Total Avg %Total Count %Total > 0: Main Stage: 5.5509e+03 100.0% 1.2551e+13 100.0% 0.000e+00 > 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > AvgLen: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flop in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over > all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatSolve 1 1.0 7.3267e-01 1.0 4.58e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 6246 > MatLUFactorSym 1 1.0 1.0673e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatLUFactorNum 1 1.0 5.5350e+03 1.0 1.25e+13 1.0 0.0e+00 0.0e+00 > 0.0e+00100100 0 0 0 100100 0 0 0 2267 > MatAssemblyBegin 1 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 1.0247e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetRowIJ 1 1.0 1.4306e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.2596e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 4 1.0 9.3985e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyBegin 2 1.0 4.7684e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 4.7684e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetUp 1 1.0 1.6689e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 7.3284e-01 1.0 4.58e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 6245 > PCSetUp 1 1.0 5.5458e+03 1.0 1.25e+13 1.0 0.0e+00 0.0e+00 > 0.0e+00100100 0 0 0 100100 0 0 0 2262 > PCApply 1 1.0 7.3267e-01 1.0 4.58e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 6246 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Matrix 2 2 11501999992 0. > Vector 2 2 3761520 0. > Krylov Solver 1 1 1408 0. > Preconditioner 1 1 1184 0. > Index Set 3 3 1412088 0. > Viewer 1 0 0 0. 
> > ======================================================================================================================== > Average time to get PetscTime(): 7.15256e-08 > #PETSc Option Table entries: > -ksp_type preonly > -log_view > -pc_type lu > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 16 sizeof(PetscInt) 4 > Configure options: --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl > --with-g=1 --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi > --with-scalar-type=complex --with-clanguage=c --with-openmp > --with-debugging=0 COPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2" FOPTFLAGS="-mkl=parallel -O2 -mavx > -axCORE-AVX2 -no-prec-div -fp-model fast=2" CXXOPTFLAGS="-mkl=parallel -O2 > -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" --download-superlu_dist > --download-mumps --download-scalapack --download-metis --download-cmake > --download-parmetis --download-ptscotch > ----------------------------------------- > Libraries compiled on 2020-10-14 10:52:17 on epycfe.crc.nd.edu > Machine characteristics: > Linux-3.10.0-1160.2.1.el7.x86_64-x86_64-with-redhat-7.9-Maipo > Using PETSc directory: /afs/crc.nd.edu/user/a/akozlov/Private/petsc > Using PETSc arch: arch-linux-c-opt > ----------------------------------------- > > Using C compiler: mpicc -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2 -fopenmp > Using Fortran compiler: mpif90 -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2 -fopenmp > ----------------------------------------- > > Using include paths: -I/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/include -I/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include > -I/opt/crc/v/valgrind/3.14/ompi/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: -Wl,-rpath,/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -lpetsc > -Wl,-rpath,/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl -L/opt/crc/i/intel/19.0/mkl > -Wl,-rpath,/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib > -L/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib > -Wl,-rpath,/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 > -L/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64 > -L/opt/crc/i/intel/19.0/mkl/lib/intel64 > -Wl,-rpath,/opt/crc/i/intel/19.0/lib/intel64 > -L/opt/crc/i/intel/19.0/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib64 > -L/opt/crc/i/intel/19.0/lib64 -Wl,-rpath,/afs/ > crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin > -L/afs/ > crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64_lin > -L/opt/crc/i/intel/19.0/mkl/lib/intel64_lin > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lcmumps -ldmumps -lsmumps > -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lmkl_intel_lp64 > -lmkl_core -lmkl_intel_thread -lpthread -lptesmumps -lptscotchparmetis > -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lX11 -lparmetis > 
-lmetis -lstdc++ -ldl -lmpifort -lmpi -lmkl_intel_lp64 -lmkl_intel_thread > -lmkl_core -liomp5 -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc > -lpthread -lgcc_s -lirc_s -lrt -lquadmath -lstdc++ -ldl > ----------------------------------------- > > > (2) EXTERNAL PACKAGE MUMPS, 1 MPI PROCESS > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./caat on a arch-linux-c-opt named d24cepyc068.crc.nd.edu with 1 > processor, by akozlov Sat Oct 17 01:55:20 2020 > Using 0 OpenMP threads > Using Petsc Release Version 3.13.6, unknown > > Max Max/Min Avg Total > Time (sec): 1.075e+02 1.000 1.075e+02 > Objects: 9.000e+00 1.000 9.000e+00 > Flop: 1.959e+12 1.000 1.959e+12 1.959e+12 > Flop/sec: 1.823e+10 1.000 1.823e+10 1.823e+10 > MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flop > and VecAXPY() for complex vectors of length N > --> 8N flop > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count > %Total Avg %Total Count %Total > 0: Main Stage: 1.0747e+02 100.0% 1.9594e+12 100.0% 0.000e+00 > 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > AvgLen: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flop in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over > all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatSolve 1 1.0 3.1965e-01 1.0 1.96e+12 1.0 0.0e+00 0.0e+00 > 0.0e+00 0100 0 0 0 0100 0 0 0 6126201 > MatLUFactorSym 1 1.0 2.3141e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > MatLUFactorNum 1 1.0 1.0001e+02 1.0 1.16e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 93 0 0 0 0 93 0 0 0 0 12 > MatAssemblyBegin 1 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 1.0067e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetRowIJ 1 1.0 1.8650e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.3029e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecCopy 1 1.0 1.0943e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 4 1.0 9.2626e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyBegin 2 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 4.7684e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetUp 1 1.0 1.6689e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 3.1981e-01 1.0 1.96e+12 1.0 0.0e+00 0.0e+00 > 0.0e+00 0100 0 0 0 0100 0 0 0 6123146 > PCSetUp 1 1.0 1.0251e+02 1.0 1.16e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 95 0 0 0 0 95 0 0 0 0 11 > PCApply 1 1.0 3.1965e-01 1.0 1.96e+12 1.0 0.0e+00 0.0e+00 > 0.0e+00 0100 0 0 0 0100 0 0 0 6126096 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Matrix 2 2 59441612 0. > Vector 2 2 3761520 0. > Krylov Solver 1 1 1408 0. > Preconditioner 1 1 1184 0. > Index Set 2 2 941392 0. > Viewer 1 0 0 0. 
> > ======================================================================================================================== > Average time to get PetscTime(): 4.76837e-08 > #PETSc Option Table entries: > -ksp_type preonly > -log_view > -pc_factor_mat_solver_type mumps > -pc_type lu > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 16 sizeof(PetscInt) 4 > Configure options: --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl > --with-g=1 --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi > --with-scalar-type=complex --with-clanguage=c --with-openmp > --with-debugging=0 COPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2" FOPTFLAGS="-mkl=parallel -O2 -mavx > -axCORE-AVX2 -no-prec-div -fp-model fast=2" CXXOPTFLAGS="-mkl=parallel -O2 > -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" --download-superlu_dist > --download-mumps --download-scalapack --download-metis --download-cmake > --download-parmetis --download-ptscotch > ----------------------------------------- > Libraries compiled on 2020-10-14 10:52:17 on epycfe.crc.nd.edu > Machine characteristics: > Linux-3.10.0-1160.2.1.el7.x86_64-x86_64-with-redhat-7.9-Maipo > Using PETSc directory: /afs/crc.nd.edu/user/a/akozlov/Private/petsc > Using PETSc arch: arch-linux-c-opt > ----------------------------------------- > > Using C compiler: mpicc -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2 -fopenmp > Using Fortran compiler: mpif90 -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2 -fopenmp > ----------------------------------------- > > Using include paths: -I/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/include -I/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include > -I/opt/crc/v/valgrind/3.14/ompi/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: -Wl,-rpath,/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -lpetsc > -Wl,-rpath,/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl -L/opt/crc/i/intel/19.0/mkl > -Wl,-rpath,/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib > -L/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib > -Wl,-rpath,/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 > -L/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64 > -L/opt/crc/i/intel/19.0/mkl/lib/intel64 > -Wl,-rpath,/opt/crc/i/intel/19.0/lib/intel64 > -L/opt/crc/i/intel/19.0/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib64 > -L/opt/crc/i/intel/19.0/lib64 -Wl,-rpath,/afs/ > crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin > -L/afs/ > crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64_lin > -L/opt/crc/i/intel/19.0/mkl/lib/intel64_lin > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lcmumps -ldmumps -lsmumps > -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lmkl_intel_lp64 > -lmkl_core -lmkl_intel_thread -lpthread -lptesmumps -lptscotchparmetis > -lptscotch -lptscotcherr -lesmumps -lscotch 
-lscotcherr -lX11 -lparmetis > -lmetis -lstdc++ -ldl -lmpifort -lmpi -lmkl_intel_lp64 -lmkl_intel_thread > -lmkl_core -liomp5 -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc > -lpthread -lgcc_s -lirc_s -lrt -lquadmath -lstdc++ -ldl > ----------------------------------------- > > > (3) EXTERNAL PACKAGE MUMPS , 48 MPI PROCESSES ON A SINGLE CLUSTER NODE > WITH 8 NUMA NODES > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./caat on a arch-linux-c-opt named d24cepyc069.crc.nd.edu with 48 > processors, by akozlov Sat Oct 17 04:40:25 2020 > Using 0 OpenMP threads > Using Petsc Release Version 3.13.6, unknown > > Max Max/Min Avg Total > Time (sec): 1.415e+01 1.000 1.415e+01 > Objects: 3.000e+01 1.000 3.000e+01 > Flop: 4.855e+10 1.637 4.084e+10 1.960e+12 > Flop/sec: 3.431e+09 1.637 2.886e+09 1.385e+11 > MPI Messages: 1.180e+02 2.682 8.169e+01 3.921e+03 > MPI Message Lengths: 1.559e+05 5.589 1.238e+03 4.855e+06 > MPI Reductions: 4.000e+01 1.000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flop > and VecAXPY() for complex vectors of length N > --> 8N flop > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count > %Total Avg %Total Count %Total > 0: Main Stage: 1.4150e+01 100.0% 1.9602e+12 100.0% 3.921e+03 > 100.0% 1.238e+03 100.0% 3.100e+01 77.5% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > AvgLen: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flop in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over > all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > BuildTwoSided 5 1.0 1.0707e-02 3.3 0.00e+00 0.0 7.8e+02 4.0e+00 > 5.0e+00 0 0 20 0 12 0 0 20 0 16 0 > BuildTwoSidedF 3 1.0 8.6837e-03 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 3.0e+00 0 0 0 0 8 0 0 0 0 10 0 > MatSolve 1 1.0 6.6314e-02 1.0 4.85e+10 1.6 3.5e+03 1.2e+03 > 6.0e+00 0100 90 87 15 0100 90 87 19 29529617 > MatLUFactorSym 1 1.0 2.4322e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 4.0e+00 17 0 0 0 10 17 0 0 0 13 0 > MatLUFactorNum 1 1.0 5.8816e+00 1.0 5.08e+07 1.8 0.0e+00 0.0e+00 > 0.0e+00 42 0 0 0 0 42 0 0 0 0 332 > MatAssemblyBegin 1 1.0 7.3917e-0357.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 2 0 0 0 0 3 0 > MatAssemblyEnd 1 1.0 2.5823e-02 1.0 0.00e+00 0.0 3.8e+02 1.6e+03 > 5.0e+00 0 0 10 13 12 0 0 10 13 16 0 > MatGetRowIJ 1 1.0 3.5763e-06 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 9.2506e-05 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 4 1.0 5.3000e-0460.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyBegin 2 1.0 2.2390e-0319.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 5 0 0 0 0 6 0 > VecAssemblyEnd 2 1.0 9.7752e-06 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 2 1.0 1.6036e-0312.8 0.00e+00 0.0 5.9e+02 4.8e+03 > 1.0e+00 0 0 15 58 2 0 0 15 58 3 0 > VecScatterEnd 2 1.0 2.0087e-0338.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > SFSetGraph 2 1.0 1.5259e-05 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > SFSetUp 3 1.0 3.3023e-03 2.9 0.00e+00 0.0 1.6e+03 7.0e+02 > 2.0e+00 0 0 40 23 5 0 0 40 23 6 0 > SFBcastOpBegin 2 1.0 1.5953e-0313.7 0.00e+00 0.0 5.9e+02 4.8e+03 > 1.0e+00 0 0 15 58 2 0 0 15 58 3 0 > SFBcastOpEnd 2 1.0 2.0008e-0345.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > SFPack 2 1.0 1.4646e-03361.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > SFUnpack 2 1.0 4.1723e-0529.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetUp 1 1.0 3.0994e-06 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 6.6350e-02 1.0 4.85e+10 1.6 3.5e+03 1.2e+03 > 6.0e+00 0100 90 87 15 0100 90 87 19 29513594 > PCSetUp 1 1.0 8.4679e+00 1.0 5.08e+07 1.8 0.0e+00 0.0e+00 > 1.0e+01 60 0 0 0 25 60 0 0 0 32 230 > PCApply 1 1.0 6.6319e-02 1.0 4.85e+10 1.6 3.5e+03 1.2e+03 > 6.0e+00 0100 90 87 15 0100 90 87 19 29527282 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Matrix 4 4 1224428 0. > Vec Scatter 3 3 2400 0. > Vector 8 8 1923424 0. > Index Set 9 9 32392 0. > Star Forest Graph 3 3 3376 0. 
> Krylov Solver 1 1 1408 0. > Preconditioner 1 1 1160 0. > Viewer 1 0 0 0. > > ======================================================================================================================== > Average time to get PetscTime(): 7.15256e-08 > Average time for MPI_Barrier(): 3.48091e-06 > Average time for zero size MPI_Send(): 2.49843e-06 > #PETSc Option Table entries: > -ksp_type preonly > -log_view > -pc_factor_mat_solver_type mumps > -pc_type lu > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 16 sizeof(PetscInt) 4 > Configure options: --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl > --with-g=1 --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi > --with-scalar-type=complex --with-clanguage=c --with-openmp > --with-debugging=0 COPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2" FOPTFLAGS="-mkl=parallel -O2 -mavx > -axCORE-AVX2 -no-prec-div -fp-model fast=2" CXXOPTFLAGS="-mkl=parallel -O2 > -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" --download-superlu_dist > --download-mumps --download-scalapack --download-metis --download-cmake > --download-parmetis --download-ptscotch > ----------------------------------------- > Libraries compiled on 2020-10-14 10:52:17 on epycfe.crc.nd.edu > Machine characteristics: > Linux-3.10.0-1160.2.1.el7.x86_64-x86_64-with-redhat-7.9-Maipo > Using PETSc directory: /afs/crc.nd.edu/user/a/akozlov/Private/petsc > Using PETSc arch: arch-linux-c-opt > ----------------------------------------- > > Using C compiler: mpicc -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2 -fopenmp > Using Fortran compiler: mpif90 -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2 -fopenmp > ----------------------------------------- > > Using include paths: -I/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/include -I/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include > -I/opt/crc/v/valgrind/3.14/ompi/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: -Wl,-rpath,/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -lpetsc > -Wl,-rpath,/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl -L/opt/crc/i/intel/19.0/mkl > -Wl,-rpath,/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib > -L/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib > -Wl,-rpath,/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 > -L/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64 > -L/opt/crc/i/intel/19.0/mkl/lib/intel64 > -Wl,-rpath,/opt/crc/i/intel/19.0/lib/intel64 > -L/opt/crc/i/intel/19.0/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib64 > -L/opt/crc/i/intel/19.0/lib64 -Wl,-rpath,/afs/ > crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin > -L/afs/ > crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64_lin > -L/opt/crc/i/intel/19.0/mkl/lib/intel64_lin > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lcmumps -ldmumps -lsmumps > -lzmumps 
-lmumps_common -lpord -lscalapack -lsuperlu_dist -lmkl_intel_lp64 > -lmkl_core -lmkl_intel_thread -lpthread -lptesmumps -lptscotchparmetis > -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lX11 -lparmetis > -lmetis -lstdc++ -ldl -lmpifort -lmpi -lmkl_intel_lp64 -lmkl_intel_thread > -lmkl_core -liomp5 -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc > -lpthread -lgcc_s -lirc_s -lrt -lquadmath -lstdc++ -ldl > ----------------------------------------- > > > > On Sat, Oct 17, 2020 at 12:33 AM Matthew Knepley > wrote: > >> On Fri, Oct 16, 2020 at 11:48 PM Alexey Kozlov >> wrote: >> >>> Thank you for your advice! My sparse matrix seems to be very stiff so I >>> have decided to concentrate on the direct solvers. I have very good results >>> with MUMPS. Due to a lack of time I haven?t got a good result with >>> SuperLU_DIST and haven?t compiled PETSc with Pastix yet but I have a >>> feeling that MUMPS is the best. I have run a sequential test case with >>> built-in PETSc LU (-pc_type lu -ksp_type preonly) and MUMPs (-pc_type lu >>> -ksp_type preonly -pc_factor_mat_solver_type mumps) with default settings >>> and found that MUMPs was about 50 times faster than the built-in LU and >>> used about 3 times less RAM. Do you have any idea why it could be? >>> >> The numbers do not sound realistic, but of course we do not have your >> particular problem. In particular, the memory figure seems impossible. >> >>> My test case has about 100,000 complex equations with about 3,000,000 >>> non-zeros. PETSc was compiled with the following options: ./configure >>> --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --enable-g >>> --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi >>> --with-scalar-type=complex --with-clanguage=c --with-openmp >>> --with-debugging=0 COPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 >>> -no-prec-div -fp-model fast=2' FOPTFLAGS='-mkl=parallel -O2 -mavx >>> -axCORE-AVX2 -no-prec-div -fp-model fast=2' CXXOPTFLAGS='-mkl=parallel -O2 >>> -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' --download-superlu_dist >>> --download-mumps --download-scalapack --download-metis --download-cmake >>> --download-parmetis --download-ptscotch. >>> >>> Running MUPMS in parallel using MPI also gave me a significant gain in >>> performance (about 10 times on a single cluster node). >>> >> Again, this does not appear to make sense. The performance should be >> limited by memory bandwidth, and a single cluster node will not usually have >> 10x the bandwidth of a CPU, although it might be possible with a very old >> CPU. >> >> It would help to understand the performance if you would send the output >> of -log_view. >> >> Thanks, >> >> Matt >> >>> Could you, please, advise me whether I can adjust some options for the >>> direct solvers to improve performance? Should I try MUMPS in OpenMP mode? >>> >>> On Sat, Sep 19, 2020 at 7:40 AM Mark Adams wrote: >>> >>>> As Jed said high frequency is hard. AMG, as-is, can be adapted ( >>>> https://link.springer.com/article/10.1007/s00466-006-0047-8) with >>>> parameters. >>>> AMG for convection: use richardson/sor and not chebyshev smoothers and >>>> in smoothed aggregation (gamg) don't smooth (-pc_gamg_agg_nsmooths 0). >>>> Mark >>>> >>>> On Sat, Sep 19, 2020 at 2:11 AM Alexey Kozlov >>>> wrote: >>>> >>>>> Thanks a lot! I'll check them out. >>>>> >>>>> On Sat, Sep 19, 2020 at 1:41 AM Barry Smith wrote: >>>>> >>>>>> >>>>>> These are small enough that likely sparse direct solvers are the >>>>>> best use of your time and for general efficiency. 
>>>>>> >>>>>> PETSc supports 3 parallel direct solvers, SuperLU_DIST, MUMPs and >>>>>> Pastix. I recommend configuring PETSc for all three of them and then >>>>>> comparing them for problems of interest to you. >>>>>> >>>>>> --download-superlu_dist --download-mumps --download-pastix >>>>>> --download-scalapack (used by MUMPS) --download-metis --download-parmetis >>>>>> --download-ptscotch >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> On Sep 18, 2020, at 11:28 PM, Alexey Kozlov >>>>>> wrote: >>>>>> >>>>>> Thanks for the tips! My matrix is complex and unsymmetric. My typical >>>>>> test case has of the order of one million equations. I use a 2nd-order >>>>>> finite-difference scheme with 19-point stencil, so my typical test case >>>>>> uses several GB of RAM. >>>>>> >>>>>> On Fri, Sep 18, 2020 at 11:52 PM Jed Brown wrote: >>>>>> >>>>>>> Unfortunately, those are hard problems in which the "good" methods >>>>>>> are technical and hard to make black-box. There are "sweeping" methods >>>>>>> that solve on 2D "slabs" with PML boundary conditions, H-matrix based >>>>>>> methods, and fancy multigrid methods. Attempting to solve with STRUMPACK >>>>>>> is probably the easiest thing to try (--download-strumpack). >>>>>>> >>>>>>> >>>>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html >>>>>>> >>>>>>> Is the matrix complex symmetric? >>>>>>> >>>>>>> Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a >>>>>>> 3D problem like this if you have enough memory. I'm assuming the memory or >>>>>>> time is unacceptable and you want an iterative method with much lower setup >>>>>>> costs. >>>>>>> >>>>>>> Alexey Kozlov writes: >>>>>>> >>>>>>> > Dear all, >>>>>>> > >>>>>>> > I am solving a convected wave equation in a frequency domain. This >>>>>>> equation >>>>>>> > is a 3D Helmholtz equation with added first-order derivatives and >>>>>>> mixed >>>>>>> > derivatives, and with complex coefficients. The discretized PDE >>>>>>> results in >>>>>>> > a sparse linear system (about 10^6 equations) which is solved in >>>>>>> PETSc. I >>>>>>> > am having difficulty with the code convergence at high frequency, >>>>>>> skewed >>>>>>> > grid, and high Mach number. I suspect it may be due to the >>>>>>> preconditioner I >>>>>>> > use. I am currently using the ILU preconditioner with the number >>>>>>> of fill >>>>>>> > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of >>>>>>> the art >>>>>>> > has evolved and there are better preconditioners for Helmholtz-like >>>>>>> > problems. Could you, please, advise me on a better preconditioner? >>>>>>> > >>>>>>> > Thanks, >>>>>>> > Alexey >>>>>>> > >>>>>>> > -- >>>>>>> > Alexey V. Kozlov >>>>>>> > >>>>>>> > Research Scientist >>>>>>> > Department of Aerospace and Mechanical Engineering >>>>>>> > University of Notre Dame >>>>>>> > >>>>>>> > 117 Hessert Center >>>>>>> > Notre Dame, IN 46556-5684 >>>>>>> > Phone: (574) 631-4335 >>>>>>> > Fax: (574) 631-8355 >>>>>>> > Email: akozlov at nd.edu >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Alexey V. Kozlov >>>>>> >>>>>> Research Scientist >>>>>> Department of Aerospace and Mechanical Engineering >>>>>> University of Notre Dame >>>>>> >>>>>> 117 Hessert Center >>>>>> Notre Dame, IN 46556-5684 >>>>>> Phone: (574) 631-4335 >>>>>> Fax: (574) 631-8355 >>>>>> Email: akozlov at nd.edu >>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> Alexey V. 
Kozlov >>>>> >>>>> Research Scientist >>>>> Department of Aerospace and Mechanical Engineering >>>>> University of Notre Dame >>>>> >>>>> 117 Hessert Center >>>>> Notre Dame, IN 46556-5684 >>>>> Phone: (574) 631-4335 >>>>> Fax: (574) 631-8355 >>>>> Email: akozlov at nd.edu >>>>> >>>> >>> >>> -- >>> Alexey V. Kozlov >>> >>> Research Scientist >>> Department of Aerospace and Mechanical Engineering >>> University of Notre Dame >>> >>> 117 Hessert Center >>> Notre Dame, IN 46556-5684 >>> Phone: (574) 631-4335 >>> Fax: (574) 631-8355 >>> Email: akozlov at nd.edu >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Alexey V. Kozlov > > Research Scientist > Department of Aerospace and Mechanical Engineering > University of Notre Dame > > 117 Hessert Center > Notre Dame, IN 46556-5684 > Phone: (574) 631-4335 > Fax: (574) 631-8355 > Email: akozlov at nd.edu > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From namala2 at illinois.edu Sun Oct 18 14:55:34 2020 From: namala2 at illinois.edu (Namala, Solomon) Date: Sun, 18 Oct 2020 19:55:34 +0000 Subject: [petsc-users] Guidelines for Nested fieldsplit for domains with a hybrid mesh Message-ID: Hello, I am working to solve Stokes problem on a domain that is discretized using two different types of mesh. A part of the mesh uses fem formulation and the rest uses nodal integral method (NIM) formulation (the details of which I will skip). However, the key takeaway is that NIM formulation of stokes uses pressure Poisson formulation instead of the continuity equation while FEM formulation uses the continuity equation. They are coupled at the interface. Right now, I am building a single matrix for the entire domain and solving it using fieldsplit option in a nested fashion. The matrix structure and the unknown vector are shown below. My questions are: * Are there any basic guidelines to solve these kind of problems. * As I have mentioned I am currently using nested fieldsplit. The first split is using indices and the other split is done using detect saddle point option. is there a way to avoid using that option and doing it by combining set of indices or fields. The matrix structure is [Au_fem Bp_fem 0 0] [Cu_fem. 0 0 0] [0 0 Du_nim Ep_nim] [0 0 0 Fp_nim] the unknown vector is given by [ufem pfem unim pnim] Let me know if any additional information is needed. Thanks, Solomon. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Sun Oct 18 16:32:57 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 18 Oct 2020 16:32:57 -0500 Subject: [petsc-users] Guidelines for Nested fieldsplit for domains with a hybrid mesh In-Reply-To: References: Message-ID: <8555AB7C-7589-4299-B476-418757D68A12@petsc.dev> From src/ksp/pc/impls/fieldsplit/fieldsplit.c this is how the saddle point is detected and set into the PC if (jac->detect) { IS zerodiags,rest; PetscInt nmin,nmax; ierr = MatGetOwnershipRange(pc->mat,&nmin,&nmax);CHKERRQ(ierr); if (jac->diag_use_amat) { ierr = MatFindZeroDiagonals(pc->mat,&zerodiags);CHKERRQ(ierr); } else { ierr = MatFindZeroDiagonals(pc->pmat,&zerodiags);CHKERRQ(ierr); } ierr = ISComplement(zerodiags,nmin,nmax,&rest);CHKERRQ(ierr); ierr = PCFieldSplitSetIS(pc,"0",rest);CHKERRQ(ierr); ierr = PCFieldSplitSetIS(pc,"1",zerodiags);CHKERRQ(ierr); ierr = ISDestroy(&zerodiags);CHKERRQ(ierr); ierr = ISDestroy(&rest);CHKERRQ(ierr); In addition these two options are set PetscErrorCode PCFieldSplitSetDetectSaddlePoint(PC pc,PetscBool flg) { PC_FieldSplit *jac = (PC_FieldSplit*)pc->data; PetscErrorCode ierr; PetscFunctionBegin; jac->detect = flg; if (jac->detect) { ierr = PCFieldSplitSetType(pc,PC_COMPOSITE_SCHUR);CHKERRQ(ierr); ierr = PCFieldSplitSetSchurPre(pc,PC_FIELDSPLIT_SCHUR_PRE_SELF,NULL);CHKERRQ(ierr); } PetscFunctionReturn(0); } You can use these routines to directly manage the IS yourself in any manner you choose. Good luck Barry > On Oct 18, 2020, at 2:55 PM, Namala, Solomon wrote: > > Hello, > > I am working to solve Stokes problem on a domain that is discretized using two different types of mesh. A part of the mesh uses fem formulation and the rest uses nodal integral method (NIM) formulation (the details of which I will skip). However, the key takeaway is that NIM formulation of stokes uses pressure Poisson formulation instead of the continuity equation while FEM formulation uses the continuity equation. They are coupled at the interface. Right now, I am building a single matrix for the entire domain and solving it using fieldsplit option in a nested fashion. The matrix structure and the unknown vector are shown below. > > My questions are: > > Are there any basic guidelines to solve these kind of problems. > As I have mentioned I am currently using nested fieldsplit. The first split is using indices and the other split is done using detect saddle point option. is there a way to avoid using that option and doing it by combining set of indices or fields. > The matrix structure is > [Au_fem Bp_fem 0 0] > [Cu_fem. 0 0 0] > [0 0 Du_nim Ep_nim] > [0 0 0 Fp_nim] > > the unknown vector is given by > > [ufem pfem unim pnim] > > Let me know if any additional information is needed. > > Thanks, > Solomon. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hecbarcab at gmail.com Tue Oct 20 03:40:09 2020 From: hecbarcab at gmail.com (Héctor Barreiro Cabrera) Date: Tue, 20 Oct 2020 10:40:09 +0200 Subject: [petsc-users] Eisenstat-Walker method with GPU assembled matrices In-Reply-To: References: Message-ID: El jue., 15 oct. 2020 a las 23:32, Barry Smith () escribió: > > We still have the assumption the AIJ matrix always has a copy on the > GPU. How did you fill up the matrix on the GPU while not having its copy > on the CPU? > > My strategy here was to initialize the structure on the CPU with dummy values to have the corresponding device arrays allocated.
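In case it is useful to see it spelled out, here is a minimal sketch of that CPU-side bootstrap. The size, the tridiagonal pattern and the placeholder value 1.0 are purely illustrative (my real pattern comes from the problem topology); it is only meant to show the call sequence, not my actual code:

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscInt       i, n = 8;                 /* illustrative size */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  ierr = MatCreate(PETSC_COMM_SELF, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, n, n, n, n);CHKERRQ(ierr);
  ierr = MatSetType(A, MATSEQAIJCUSPARSE);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(A, 3, NULL);CHKERRQ(ierr);

  /* Insert placeholder values at every location of the (constant) nonzero
     pattern so that the structure, and hence the device arrays, get created */
  for (i = 0; i < n; i++) {
    PetscInt    cols[3], ncols = 0;
    PetscScalar vals[3] = {1.0, 1.0, 1.0};
    if (i > 0)     cols[ncols++] = i - 1;
    cols[ncols++] = i;
    if (i < n - 1) cols[ncols++] = i + 1;
    ierr = MatSetValues(A, 1, &i, ncols, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* From here on the values are rewritten directly on the device, inside a
     CUDA kernel, through MatCUSPARSEGetDeviceMatWrite() */

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}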
Ideally I would have initialized the structure on a kernel as well, since my intention is to keep all data on the GPU (and not hit host memory other than for debugging). But since the topology of my problem remains constant over time, this approach proved to be sufficient. I did not find any problem with my use case so far. One thing I couldn't figure out, though, is how to force PETSc to transfer the data back to host. MatView always displays the dummy values I used for initialization. Is there a function to do this? Thanks for the replies, by the way! I'm quite surprised how responsive the PETSc community is! :) Cheers, H?ctor > Barry > > When we remove this assumption we have to add a bunch more code for CPU > only things to make sure they properly get the data from the GPU. > > > On Oct 15, 2020, at 4:16 AM, H?ctor Barreiro Cabrera > wrote: > > Hello fellow PETSc users, > > Following up my previous email > , > I managed to feed the entry data to a SeqAICUSPARSE matrix through a CUDA > kernel using the new MatCUSPARSEGetDeviceMatWrite function (thanks Barry > Smith and Mark Adams!). However, I am now facing problems when trying to > use this matrix within a SNES solver with the Eisenstat-Walker method > enabled. > > According to PETSc's error log, the preconditioner is failing to invert > the matrix diagonal. Specifically it says that: > [0]PETSC ERROR: Arguments are incompatible > [0]PETSC ERROR: Zero diagonal on row 0 > [0]PETSC ERROR: Configure options PETSC_ARCH=win64_vs2019_release > --with-cc="win32fe cl" --with-cxx="win32fe cl" --with-clanguage=C++ > --with-fc=0 --with-mpi=0 --with-cuda=1 --with-cudac="win32fe nvcc" > --with-cuda-dir=~/cuda --download-f2cblaslapack=1 --with-precision=single > --with-64-bit-indices=0 --with-single-library=1 --with-endian=little > --with-debugging=0 --with-x=0 --with-windows-graphics=0 > --with-shared-libraries=1 --CUDAOPTFLAGS=-O2 > > The stack trace leads to the diagonal inversion routine: > [0]PETSC ERROR: #1 MatInvertDiagonal_SeqAIJ() line 1913 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c > [0]PETSC ERROR: #2 MatSOR_SeqAIJ() line 1944 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c > [0]PETSC ERROR: #3 MatSOR() line 4005 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\INTERF~1\matrix.c > [0]PETSC ERROR: #4 PCPreSolve_Eisenstat() line 79 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\impls\eisens\eisen.c > [0]PETSC ERROR: #5 PCPreSolve() line 1549 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\INTERF~1\precon.c > [0]PETSC ERROR: #6 KSPSolve_Private() line 686 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c > [0]PETSC ERROR: #7 KSPSolve() line 889 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c > [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\impls\ls\ls.c > [0]PETSC ERROR: #9 SNESSolve() line 4567 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\INTERF~1\snes.c > > I am 100% positive that the diagonal does not contain a zero entry, so my > suspicions are either that this operation is not supported on the GPU at > all (MatInvertDiagonal_SeqAIJ seems to access host-side memory) or that I > am missing some setting to make this work on the GPU. Is this correct? > > Thanks! > > Cheers, > H?ctor > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Oct 20 06:36:52 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 20 Oct 2020 07:36:52 -0400 Subject: [petsc-users] Eisenstat-Walker method with GPU assembled matrices In-Reply-To: References: Message-ID: On Tue, Oct 20, 2020 at 4:40 AM H?ctor Barreiro Cabrera wrote: > El jue., 15 oct. 2020 a las 23:32, Barry Smith () > escribi?: > >> >> We still have the assumption the AIJ matrix always has a copy on the >> GPU. How did you fill up the matrix on the GPU while not having its copy >> on the CPU? >> >> My strategy here was to initialize the structure on the CPU with dummy > values to have the corresponding device arrays allocated. Ideally I would > have initialized the structure on a kernel as well, since my intention is > to keep all data on the GPU (and not hit host memory other than for > debugging). But since the topology of my problem remains constant over > time, this approach proved to be sufficient. I did not find any problem > with my use case so far. > > One thing I couldn't figure out, though, is how to force PETSc to transfer > the data back to host. MatView always displays the dummy values I used for > initialization. Is there a function to do this? > Hmm, this should happen automatically, so we have missed something. How do you change the values on the device? Thanks, Matt > Thanks for the replies, by the way! I'm quite surprised how responsive the > PETSc community is! :) > > Cheers, > H?ctor > > >> Barry >> >> When we remove this assumption we have to add a bunch more code for CPU >> only things to make sure they properly get the data from the GPU. >> >> >> On Oct 15, 2020, at 4:16 AM, H?ctor Barreiro Cabrera >> wrote: >> >> Hello fellow PETSc users, >> >> Following up my previous email >> , >> I managed to feed the entry data to a SeqAICUSPARSE matrix through a CUDA >> kernel using the new MatCUSPARSEGetDeviceMatWrite function (thanks Barry >> Smith and Mark Adams!). However, I am now facing problems when trying to >> use this matrix within a SNES solver with the Eisenstat-Walker method >> enabled. >> >> According to PETSc's error log, the preconditioner is failing to invert >> the matrix diagonal. 
Specifically it says that: >> [0]PETSC ERROR: Arguments are incompatible >> [0]PETSC ERROR: Zero diagonal on row 0 >> [0]PETSC ERROR: Configure options PETSC_ARCH=win64_vs2019_release >> --with-cc="win32fe cl" --with-cxx="win32fe cl" --with-clanguage=C++ >> --with-fc=0 --with-mpi=0 --with-cuda=1 --with-cudac="win32fe nvcc" >> --with-cuda-dir=~/cuda --download-f2cblaslapack=1 --with-precision=single >> --with-64-bit-indices=0 --with-single-library=1 --with-endian=little >> --with-debugging=0 --with-x=0 --with-windows-graphics=0 >> --with-shared-libraries=1 --CUDAOPTFLAGS=-O2 >> >> The stack trace leads to the diagonal inversion routine: >> [0]PETSC ERROR: #1 MatInvertDiagonal_SeqAIJ() line 1913 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c >> [0]PETSC ERROR: #2 MatSOR_SeqAIJ() line 1944 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c >> [0]PETSC ERROR: #3 MatSOR() line 4005 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\INTERF~1\matrix.c >> [0]PETSC ERROR: #4 PCPreSolve_Eisenstat() line 79 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\impls\eisens\eisen.c >> [0]PETSC ERROR: #5 PCPreSolve() line 1549 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\INTERF~1\precon.c >> [0]PETSC ERROR: #6 KSPSolve_Private() line 686 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c >> [0]PETSC ERROR: #7 KSPSolve() line 889 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c >> [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\impls\ls\ls.c >> [0]PETSC ERROR: #9 SNESSolve() line 4567 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\INTERF~1\snes.c >> >> I am 100% positive that the diagonal does not contain a zero entry, so my >> suspicions are either that this operation is not supported on the GPU at >> all (MatInvertDiagonal_SeqAIJ seems to access host-side memory) or that I >> am missing some setting to make this work on the GPU. Is this correct? >> >> Thanks! >> >> Cheers, >> H?ctor >> >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Tue Oct 20 06:52:10 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 20 Oct 2020 14:52:10 +0300 Subject: [petsc-users] Eisenstat-Walker method with GPU assembled matrices In-Reply-To: References: Message-ID: We currently do not have a transfer to host setup for cusparse. I have a preliminary version here https://gitlab.com/petsc/petsc/-/tree/stefanozampini/feature-mataij-create-fromcoo Should be ready in a couple of days for review. Il giorno mar 20 ott 2020 alle ore 14:37 Matthew Knepley ha scritto: > On Tue, Oct 20, 2020 at 4:40 AM H?ctor Barreiro Cabrera < > hecbarcab at gmail.com> wrote: > >> El jue., 15 oct. 2020 a las 23:32, Barry Smith () >> escribi?: >> >>> >>> We still have the assumption the AIJ matrix always has a copy on the >>> GPU. How did you fill up the matrix on the GPU while not having its copy >>> on the CPU? >>> >>> My strategy here was to initialize the structure on the CPU with dummy >> values to have the corresponding device arrays allocated. 
Ideally I would >> have initialized the structure on a kernel as well, since my intention is >> to keep all data on the GPU (and not hit host memory other than for >> debugging). But since the topology of my problem remains constant over >> time, this approach proved to be sufficient. I did not find any problem >> with my use case so far. >> >> One thing I couldn't figure out, though, is how to force PETSc to >> transfer the data back to host. MatView always displays the dummy values I >> used for initialization. Is there a function to do this? >> > > Hmm, this should happen automatically, so we have missed something. How do > you change the values on the device? > > Thanks, > > Matt > > >> Thanks for the replies, by the way! I'm quite surprised how responsive >> the PETSc community is! :) >> >> Cheers, >> H?ctor >> >> >>> Barry >>> >>> When we remove this assumption we have to add a bunch more code for >>> CPU only things to make sure they properly get the data from the GPU. >>> >>> >>> On Oct 15, 2020, at 4:16 AM, H?ctor Barreiro Cabrera < >>> hecbarcab at gmail.com> wrote: >>> >>> Hello fellow PETSc users, >>> >>> Following up my previous email >>> , >>> I managed to feed the entry data to a SeqAICUSPARSE matrix through a CUDA >>> kernel using the new MatCUSPARSEGetDeviceMatWrite function (thanks Barry >>> Smith and Mark Adams!). However, I am now facing problems when trying to >>> use this matrix within a SNES solver with the Eisenstat-Walker method >>> enabled. >>> >>> According to PETSc's error log, the preconditioner is failing to invert >>> the matrix diagonal. Specifically it says that: >>> [0]PETSC ERROR: Arguments are incompatible >>> [0]PETSC ERROR: Zero diagonal on row 0 >>> [0]PETSC ERROR: Configure options PETSC_ARCH=win64_vs2019_release >>> --with-cc="win32fe cl" --with-cxx="win32fe cl" --with-clanguage=C++ >>> --with-fc=0 --with-mpi=0 --with-cuda=1 --with-cudac="win32fe nvcc" >>> --with-cuda-dir=~/cuda --download-f2cblaslapack=1 --with-precision=single >>> --with-64-bit-indices=0 --with-single-library=1 --with-endian=little >>> --with-debugging=0 --with-x=0 --with-windows-graphics=0 >>> --with-shared-libraries=1 --CUDAOPTFLAGS=-O2 >>> >>> The stack trace leads to the diagonal inversion routine: >>> [0]PETSC ERROR: #1 MatInvertDiagonal_SeqAIJ() line 1913 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c >>> [0]PETSC ERROR: #2 MatSOR_SeqAIJ() line 1944 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c >>> [0]PETSC ERROR: #3 MatSOR() line 4005 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\INTERF~1\matrix.c >>> [0]PETSC ERROR: #4 PCPreSolve_Eisenstat() line 79 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\impls\eisens\eisen.c >>> [0]PETSC ERROR: #5 PCPreSolve() line 1549 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\INTERF~1\precon.c >>> [0]PETSC ERROR: #6 KSPSolve_Private() line 686 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c >>> [0]PETSC ERROR: #7 KSPSolve() line 889 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c >>> [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\impls\ls\ls.c >>> [0]PETSC ERROR: #9 SNESSolve() line 4567 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\INTERF~1\snes.c >>> >>> I am 100% positive that the diagonal does not contain a zero entry, so >>> my suspicions are either that this operation is not supported on the GPU at >>> all (MatInvertDiagonal_SeqAIJ seems to access host-side memory) or 
that I >>> am missing some setting to make this work on the GPU. Is this correct? >>> >>> Thanks! >>> >>> Cheers, >>> H?ctor >>> >>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.huysegoms at fz-juelich.de Thu Oct 22 03:24:34 2020 From: m.huysegoms at fz-juelich.de (Marcel Huysegoms) Date: Thu, 22 Oct 2020 10:24:34 +0200 Subject: [petsc-users] MatOrdering for rectangular matrix Message-ID: <3c649ace-248f-38b9-bde9-4f0fa10bf71e@fz-juelich.de> Hi all, I'm currently implementing a Gauss-Newton approach for minimizing a non-linear cost function using PETSc4py. The (rectangular) linear systems I am trying to solve have dimensions of about (5N, N), where N is in the range of several hundred millions. Due to its size and because it's an over-determined system, I use LSQR in conjunction with a preconditioner (which operates on A^T x A, e.g. BJacobi). Depending on the ordering of the unknowns the algorithm only converges for special cases. When I use a direct LR solver (as preconditioner) it consistently converges, but consumes too much memory. I have read in the manual that the LR solver internally also applies a matrix reordering beforehand. My question would be: How can I improve the ordering of the unknowns for a rectangular matrix (in order to converge also with iterative preconditioners)? If I use MatGetOrdering(), it only works for square matrices. Is there a way to achieve this from within PETSc4py? ParMETIS seems to be a promising framework for that task. Is it possible to apply its reordering algorithm to a rectangular PETSc-matrix? I would be thankful for every bit of advice that might help. Best regards, Marcel ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Volker Rieke Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ From knepley at gmail.com Thu Oct 22 04:55:15 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 22 Oct 2020 05:55:15 -0400 Subject: [petsc-users] MatOrdering for rectangular matrix In-Reply-To: <3c649ace-248f-38b9-bde9-4f0fa10bf71e@fz-juelich.de> References: <3c649ace-248f-38b9-bde9-4f0fa10bf71e@fz-juelich.de> Message-ID: On Thu, Oct 22, 2020 at 4:24 AM Marcel Huysegoms wrote: > Hi all, > > I'm currently implementing a Gauss-Newton approach for minimizing a > non-linear cost function using PETSc4py. > The (rectangular) linear systems I am trying to solve have dimensions of > about (5N, N), where N is in the range of several hundred millions. > > Due to its size and because it's an over-determined system, I use LSQR > in conjunction with a preconditioner (which operates on A^T x A, e.g. > BJacobi). 
> Depending on the ordering of the unknowns the algorithm only converges > for special cases. When I use a direct LR solver (as preconditioner) it > consistently converges, but consumes too much memory. I have read in the > manual that the LR solver internally also applies a matrix reordering > beforehand. > > My question would be: > How can I improve the ordering of the unknowns for a rectangular matrix > (in order to converge also with iterative preconditioners)? If I use > MatGetOrdering(), it only works for square matrices. Is there a way to > achieve this from within PETSc4py? > ParMETIS seems to be a promising framework for that task. Is it possible > to apply its reordering algorithm to a rectangular PETSc-matrix? > > I would be thankful for every bit of advice that might help. > We do not have any rectangular reordering algorithms. I think your first step is to find something in the literature that you think will work. Thanks, Matt > Best regards, > Marcel > > > > ------------------------------------------------------------------------------------------------ > > ------------------------------------------------------------------------------------------------ > Forschungszentrum Juelich GmbH > 52425 Juelich > Sitz der Gesellschaft: Juelich > Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 > Vorsitzender des Aufsichtsrats: MinDir Volker Rieke > Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), > Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt > > ------------------------------------------------------------------------------------------------ > > ------------------------------------------------------------------------------------------------ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Oct 22 09:34:22 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 22 Oct 2020 09:34:22 -0500 Subject: [petsc-users] MatOrdering for rectangular matrix In-Reply-To: References: <3c649ace-248f-38b9-bde9-4f0fa10bf71e@fz-juelich.de> Message-ID: <9B822030-7E72-4C6A-9669-6AA82AFB0B95@petsc.dev> Marcel, Would you like to do the following? Compute Q A P where Q is a row permutation, P a column permutation and then apply LSQR on QAP? From the manual page: In exact arithmetic the LSQR method (with no preconditioning) is identical to the KSPCG algorithm applied to the normal equations. [Q A P]' [Q A P] = P' A' A P = P'(A'A) P the Q drops out because permutation matrices' transposes are their inverse Note that P is a small square matrix. So my conclusion is that any column permutation of A is also a symmetric permutation of A'A so you can just try using regular reorderings of A'A if you want to "concentrate" the "important" parts of A'A into your "block diagonal" preconditioner (and throw away the other parts) I don't know what it will do to the convergence. I've never had much luck generically trying to symmetrically reorder matrices to improve preconditioners but for certain situation maybe it might help. For example if the matrix is [0 1; 1 0] and you permute it you get the [1 0; 0 1] which looks better. 
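If you want to experiment with that, a rough, untested sketch of the plumbing is below (the helper name is just for illustration). Note that most of the MatGetOrdering() implementations only run on sequential matrices, and I have not checked that MatPermute() covers every parallel or rectangular case, so treat it as a small serial experiment rather than a recipe.

#include <petscmat.h>

/* Order the columns of the rectangular A by a symmetric ordering of A'A,
   keeping the rows in their natural order */
PetscErrorCode PermuteColumnsByNormalEquations(Mat A, Mat *Aperm, IS *colperm)
{
  Mat            AtA;
  IS             rperm, cperm, isrow;
  PetscInt       rstart, rend;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatTransposeMatMult(A, A, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &AtA);CHKERRQ(ierr);
  ierr = MatGetOrdering(AtA, MATORDERINGND, &rperm, &cperm);CHKERRQ(ierr);   /* or RCM, QMD, ... */
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  ierr = ISCreateStride(PetscObjectComm((PetscObject)A), rend - rstart, rstart, 1, &isrow);CHKERRQ(ierr); /* identity on the rows */
  ierr = MatPermute(A, isrow, cperm, Aperm);CHKERRQ(ierr);                   /* columns follow the A'A ordering */
  *colperm = cperm;                                                          /* keep it to undo the ordering later */
  ierr = ISDestroy(&isrow);CHKERRQ(ierr);
  ierr = ISDestroy(&rperm);CHKERRQ(ierr);
  ierr = MatDestroy(&AtA);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

You would then run LSQR on Aperm; since the unknowns are renumbered by colperm, you have to map the solution back to the original ordering at the end.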
There is this https://epubs.siam.org/doi/10.1137/S1064827599361308 but it is for non-symmetric permutations and in your case if you use a non symmetric permeation you can no longer use LSQR. Barry > On Oct 22, 2020, at 4:55 AM, Matthew Knepley wrote: > > On Thu, Oct 22, 2020 at 4:24 AM Marcel Huysegoms > wrote: > Hi all, > > I'm currently implementing a Gauss-Newton approach for minimizing a > non-linear cost function using PETSc4py. > The (rectangular) linear systems I am trying to solve have dimensions of > about (5N, N), where N is in the range of several hundred millions. > > Due to its size and because it's an over-determined system, I use LSQR > in conjunction with a preconditioner (which operates on A^T x A, e.g. > BJacobi). > Depending on the ordering of the unknowns the algorithm only converges > for special cases. When I use a direct LR solver (as preconditioner) it > consistently converges, but consumes too much memory. I have read in the > manual that the LR solver internally also applies a matrix reordering > beforehand. > > My question would be: > How can I improve the ordering of the unknowns for a rectangular matrix > (in order to converge also with iterative preconditioners)? If I use > MatGetOrdering(), it only works for square matrices. Is there a way to > achieve this from within PETSc4py? > ParMETIS seems to be a promising framework for that task. Is it possible > to apply its reordering algorithm to a rectangular PETSc-matrix? > > I would be thankful for every bit of advice that might help. > > We do not have any rectangular reordering algorithms. I think your first step is to > find something in the literature that you think will work. > > Thanks, > > Matt > > Best regards, > Marcel > > > ------------------------------------------------------------------------------------------------ > ------------------------------------------------------------------------------------------------ > Forschungszentrum Juelich GmbH > 52425 Juelich > Sitz der Gesellschaft: Juelich > Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 > Vorsitzender des Aufsichtsrats: MinDir Volker Rieke > Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), > Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt > ------------------------------------------------------------------------------------------------ > ------------------------------------------------------------------------------------------------ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.huysegoms at fz-juelich.de Thu Oct 22 09:38:58 2020 From: m.huysegoms at fz-juelich.de (Marcel Huysegoms) Date: Thu, 22 Oct 2020 16:38:58 +0200 Subject: [petsc-users] MatOrdering for rectangular matrix In-Reply-To: References: <3c649ace-248f-38b9-bde9-4f0fa10bf71e@fz-juelich.de> Message-ID: <7eda6f8e-6dc7-5953-80bc-603c4c661169@fz-juelich.de> Hi Matt, thanks for your response! I haven't studied the recent literature on reordering algorithms, but came across a talk by Tim Davis, the developer of SuiteSparse, from 2013: https://www.youtube.com/watch?v=7ph4ZQ9oEIc&t=2109s At minute 33:40 he shows the impact of different reordering libraries applied to a large least square system. 
In doing so, he demonstrates how he achieves a significant speedup when using the matrix reordering algorithm of METIS/ParMETIS (which is a multilevel nested dissection). So it seems that METIS is able to compute an effective column reordering of rectangular matrices for fill-reducing factorizations. The respective slide of the talk is also available as a screenshot under: https://www.mathworks.com/matlabcentral/answers/uploaded_files/173888/image.png (extracted from a forum post on a similar topic: https://de.mathworks.com/matlabcentral/answers/275622-large-sparse-rectangular-over-determined-equation-system-to-reorder-or-to-not-reorder) Considering that PETSc is offering a wrapper to the partitioning functionalities of ParMETIS, I am wondering, if it might be reasonable in the near future to also provide an option to use the reordering functionality of METIS (METIS_NodeND/ParMETIS_V3_NodeND) from within PETSc? That would be incredible and may be useful to many applications. I've just seen that MatGetOrdering() even provides an option for external libraries (MATORDERINGEXTERNAL). Is it maybe already possible to use the function in conjuction with ParMETIS? Best regards, Marcel Am 22.10.20 um 11:55 schrieb Matthew Knepley: > On Thu, Oct 22, 2020 at 4:24 AM Marcel Huysegoms > > wrote: > > Hi all, > > I'm currently implementing a Gauss-Newton approach for minimizing a > non-linear cost function using PETSc4py. > The (rectangular) linear systems I am trying to solve have > dimensions of > about (5N, N), where N is in the range of several hundred millions. > > Due to its size and because it's an over-determined system, I use LSQR > in conjunction with a preconditioner (which operates on A^T x A, e.g. > BJacobi). > Depending on the ordering of the unknowns the algorithm only converges > for special cases. When I use a direct LR solver (as > preconditioner) it > consistently converges, but consumes too much memory. I have read > in the > manual that the LR solver internally also applies a matrix reordering > beforehand. > > My question would be: > How can I improve the ordering of the unknowns for a rectangular > matrix > (in order to converge also with iterative preconditioners)? If I use > MatGetOrdering(), it only works for square matrices. Is there a way to > achieve this from within PETSc4py? > ParMETIS seems to be a promising framework for that task. Is it > possible > to apply its reordering algorithm to a rectangular PETSc-matrix? > > I would be thankful for every bit of advice that might help. > > > We do not have any rectangular reordering algorithms. I think your > first step is to > find something in the literature that you think will work. > > ? Thanks, > > ? ? ?Matt > > Best regards, > Marcel > > > ------------------------------------------------------------------------------------------------ > ------------------------------------------------------------------------------------------------ > Forschungszentrum Juelich GmbH > 52425 Juelich > Sitz der Gesellschaft: Juelich > Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 > Vorsitzender des Aufsichtsrats: MinDir Volker Rieke > Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), > Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. 
Harald Bolt > ------------------------------------------------------------------------------------------------ > ------------------------------------------------------------------------------------------------ > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Oct 22 11:45:35 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 22 Oct 2020 11:45:35 -0500 Subject: [petsc-users] MatOrdering for rectangular matrix In-Reply-To: <7eda6f8e-6dc7-5953-80bc-603c4c661169@fz-juelich.de> References: <3c649ace-248f-38b9-bde9-4f0fa10bf71e@fz-juelich.de> <7eda6f8e-6dc7-5953-80bc-603c4c661169@fz-juelich.de> Message-ID: Marcel, He also has SuiteSparseQR AMD, so with your interpretation this means AMD can also reorder a rectangular matrix? I think you need to dig into SuiteSparseQR or his papers to find out what he is actually reordering; I suspect it is not actually the rectangular matrix. Barry > On Oct 22, 2020, at 9:38 AM, Marcel Huysegoms wrote: > > Hi Matt, > > thanks for your response! > I haven't studied the recent literature on reordering algorithms, but came across a talk by Tim Davis, the developer of SuiteSparse, from 2013: > > https://www.youtube.com/watch?v=7ph4ZQ9oEIc&t=2109s > > At minute 33:40 he shows the impact of different reordering libraries applied to a large least square system. > In doing so, he demonstrates how he achieves a significant speedup when using the matrix reordering algorithm of METIS/ParMETIS (which is a multilevel nested dissection). So it seems that METIS is able to compute an effective column reordering of rectangular matrices for fill-reducing factorizations. The respective slide of the talk is also available as a screenshot under: > > https://www.mathworks.com/matlabcentral/answers/uploaded_files/173888/image.png > > (extracted from a forum post on a similar topic: https://de.mathworks.com/matlabcentral/answers/275622-large-sparse-rectangular-over-determined-equation-system-to-reorder-or-to-not-reorder ) > > Considering that PETSc is offering a wrapper to the partitioning functionalities of ParMETIS, I am wondering, if it might be reasonable in the near future to also provide an option to use the reordering functionality of METIS (METIS_NodeND/ParMETIS_V3_NodeND) from within PETSc? That would be incredible and may be useful to many applications. I've just seen that MatGetOrdering() even provides an option for external libraries (MATORDERINGEXTERNAL). Is it maybe already possible to use the function in conjuction with ParMETIS? > > Best regards, > Marcel > > > Am 22.10.20 um 11:55 schrieb Matthew Knepley: >> On Thu, Oct 22, 2020 at 4:24 AM Marcel Huysegoms > wrote: >> Hi all, >> >> I'm currently implementing a Gauss-Newton approach for minimizing a >> non-linear cost function using PETSc4py. >> The (rectangular) linear systems I am trying to solve have dimensions of >> about (5N, N), where N is in the range of several hundred millions. >> >> Due to its size and because it's an over-determined system, I use LSQR >> in conjunction with a preconditioner (which operates on A^T x A, e.g. >> BJacobi). >> Depending on the ordering of the unknowns the algorithm only converges >> for special cases. 
When I use a direct LR solver (as preconditioner) it >> consistently converges, but consumes too much memory. I have read in the >> manual that the LR solver internally also applies a matrix reordering >> beforehand. >> >> My question would be: >> How can I improve the ordering of the unknowns for a rectangular matrix >> (in order to converge also with iterative preconditioners)? If I use >> MatGetOrdering(), it only works for square matrices. Is there a way to >> achieve this from within PETSc4py? >> ParMETIS seems to be a promising framework for that task. Is it possible >> to apply its reordering algorithm to a rectangular PETSc-matrix? >> >> I would be thankful for every bit of advice that might help. >> >> We do not have any rectangular reordering algorithms. I think your first step is to >> find something in the literature that you think will work. >> >> Thanks, >> >> Matt >> >> Best regards, >> Marcel >> >> >> ------------------------------------------------------------------------------------------------ >> ------------------------------------------------------------------------------------------------ >> Forschungszentrum Juelich GmbH >> 52425 Juelich >> Sitz der Gesellschaft: Juelich >> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 >> Vorsitzender des Aufsichtsrats: MinDir Volker Rieke >> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), >> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt >> ------------------------------------------------------------------------------------------------ >> ------------------------------------------------------------------------------------------------ >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Thu Oct 22 14:12:51 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Thu, 22 Oct 2020 12:12:51 -0700 Subject: [petsc-users] FPE when trying to find the condition number Message-ID: Hello, I am trying to find the condition number of the A matrix for a linear system I am solving. I have used the following commands. *./a.out -ksp_monitor_singular_value -ksp_type gmres -ksp_gmres_restart 1000 -pc_type none*However, the execution comes to a halt after a few iterations with the following error. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. 
[0]PETSC ERROR: [0] LAPACKgesvd line 40 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmreig.c [0]PETSC ERROR: [0] KSPComputeExtremeSingularValues_GMRES line 22 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmreig.c [0]PETSC ERROR: [0] KSPComputeExtremeSingularValues line 59 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: [0] KSPMonitorSingularValue line 130 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/iterativ.c [0]PETSC ERROR: [0] KSPMonitor line 1765 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: [0] KSPGMRESCycle line 122 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c [0]PETSC ERROR: [0] KSPSolve_GMRES line 225 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c [0]PETSC ERROR: [0] KSPSolve line 678 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Signal received [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 [0]PETSC ERROR: ./a.out on a linux-gnu-c-debug named cg17-9.agave.rc.asu.edu by pbaikadi Thu Oct 22 12:07:11 2020 [0]PETSC ERROR: Configure options [0]PETSC ERROR: #1 User provided function() line 0 in unknown file -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 59. NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. -------------------------------------------------------------------------- Is the error because the A matrix is singular (causing the max/min to be undefined)? Please let me know. Thank you, Sincerely, Pranay. ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From Antoine.Cote3 at USherbrooke.ca Thu Oct 22 14:23:12 2020 From: Antoine.Cote3 at USherbrooke.ca (=?iso-8859-1?Q?Antoine_C=F4t=E9?=) Date: Thu, 22 Oct 2020 19:23:12 +0000 Subject: [petsc-users] Enhancing MatScale computing time Message-ID: Hi, I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse matrix Mat K. The Mat is modified repeatedly by the program, using the commands (in that order) : MatZeroEntries(K) In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) MatDiagonalScale(K, vec1, vec1) MatDiagonalSet(K, vec2, ADD_VALUES) Computing time seems high and I would like to improve it. Running tests with "-log_view" tells me that MatScale() is the bottle neck (50% of total computing time) . From manual pages, I've tried a few tweaks : * DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees of freedom per node, ... 
BAIJ can significantly enhance performance", Chapter 14.2.4 * Used MatMissingDiagonal() to confirm there is no missing diagonal entries : "If the matrix Y is missing some diagonal entries this routine can be very slow", MatDiagonalSet() manual * Tried MatSetOption() * MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly efficiency * MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" * MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" * MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during matrix assembly" According to "-log_view", assembly is fast (0% of total time), and the use of a DMDA makes me believe preallocation isn't the cause of performance issue. I would like to know how could I improve MatScale(). What are the best practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? Instead of MatDiagonalScale(), should I use another command to obtain the same result faster? Thank you very much! Antoine C?t? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Oct 22 14:23:45 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 22 Oct 2020 15:23:45 -0400 Subject: [petsc-users] FPE when trying to find the condition number In-Reply-To: References: Message-ID: On Thu, Oct 22, 2020 at 3:13 PM baikadi pranay wrote: > Hello, > > I am trying to find the condition number of the A matrix for a linear > system I am solving. I have used the following commands. > > *./a.out -ksp_monitor_singular_value -ksp_type gmres -ksp_gmres_restart > 1000 -pc_type none*However, the execution comes to a halt after a few > iterations with the following error. > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 8 FPE: Floating Point > Exception,probably divide by zero > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. 
> [0]PETSC ERROR: [0] LAPACKgesvd line 40 > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmreig.c > [0]PETSC ERROR: [0] KSPComputeExtremeSingularValues_GMRES line 22 > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmreig.c > [0]PETSC ERROR: [0] KSPComputeExtremeSingularValues line 59 > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: [0] KSPMonitorSingularValue line 130 > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/iterativ.c > [0]PETSC ERROR: [0] KSPMonitor line 1765 > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: [0] KSPGMRESCycle line 122 > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: [0] KSPSolve_GMRES line 225 > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: [0] KSPSolve line 678 > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Signal received > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 > [0]PETSC ERROR: ./a.out on a linux-gnu-c-debug named > cg17-9.agave.rc.asu.edu by pbaikadi Thu Oct 22 12:07:11 2020 > [0]PETSC ERROR: Configure options > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD > with errorcode 59. > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > Is the error because the A matrix is singular (causing the max/min to be > undefined)? Please let me know. > No. It is more likely that there is an invalid value, like an Inf or NaN. Thanks, Matt > Thank you, > Sincerely, > Pranay. > ? > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Oct 22 14:35:59 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 22 Oct 2020 15:35:59 -0400 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: References: Message-ID: On Thu, Oct 22, 2020 at 3:23 PM Antoine C?t? wrote: > Hi, > > I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse > matrix Mat K. The Mat is modified repeatedly by the program, using the > commands (in that order) : > > MatZeroEntries(K) > In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) > MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) > MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) > MatDiagonalScale(K, vec1, vec1) > MatDiagonalSet(K, vec2, ADD_VALUES) > > Computing time seems high and I would like to improve it. Running tests > with "-log_view" tells me that MatScale() is the bottle neck (50% of total > computing time) . From manual pages, I've tried a few tweaks : > > - DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees > of freedom per node, ... 
BAIJ can significantly enhance performance", > Chapter 14.2.4 > - Used MatMissingDiagonal() to confirm there is no missing diagonal > entries : "If the matrix Y is missing some diagonal entries this routine > can be very slow", MatDiagonalSet() manual > - Tried MatSetOption() > - MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly > efficiency > - MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly > processes have one less global reduction" > - MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, > assembly processes have one less global reduction" > - MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during > matrix assembly" > > According to "-log_view", assembly is fast (0% of total time), and the > use of a DMDA makes me believe preallocation isn't the cause of performance > issue. > > I would like to know how could I improve MatScale(). What are the best > practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? > Instead of MatDiagonalScale(), should I use another command to obtain the > same result faster? > Something is definitely strange. Can you please send the output of -log_view -info :mat Thanks, Matt > Thank you very much! > > Antoine C?t? > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Antoine.Cote3 at USherbrooke.ca Thu Oct 22 15:02:00 2020 From: Antoine.Cote3 at USherbrooke.ca (=?iso-8859-1?Q?Antoine_C=F4t=E9?=) Date: Thu, 22 Oct 2020 20:02:00 +0000 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: References: , Message-ID: Hi, See attached files for both outputs. Tell me if you need any clarification. It was run with a DMDA of 33x17x17 nodes (creating 32x16x16=8192 elements). With 3 dof per nodes, problem has a total of 28611 dof. Note : Stage "Stiff_Adj" is the part of the code modifying Mat K. PetscLogStagePush/Pop was used. Regards, Antoine ________________________________ De : Matthew Knepley Envoy? : 22 octobre 2020 15:35 ? : Antoine C?t? Cc : petsc-users at mcs.anl.gov Objet : Re: [petsc-users] Enhancing MatScale computing time On Thu, Oct 22, 2020 at 3:23 PM Antoine C?t? > wrote: Hi, I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse matrix Mat K. The Mat is modified repeatedly by the program, using the commands (in that order) : MatZeroEntries(K) In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) MatDiagonalScale(K, vec1, vec1) MatDiagonalSet(K, vec2, ADD_VALUES) Computing time seems high and I would like to improve it. Running tests with "-log_view" tells me that MatScale() is the bottle neck (50% of total computing time) . From manual pages, I've tried a few tweaks : * DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees of freedom per node, ... 
BAIJ can significantly enhance performance", Chapter 14.2.4 * Used MatMissingDiagonal() to confirm there is no missing diagonal entries : "If the matrix Y is missing some diagonal entries this routine can be very slow", MatDiagonalSet() manual * Tried MatSetOption() * MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly efficiency * MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" * MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" * MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during matrix assembly" According to "-log_view", assembly is fast (0% of total time), and the use of a DMDA makes me believe preallocation isn't the cause of performance issue. I would like to know how could I improve MatScale(). What are the best practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? Instead of MatDiagonalScale(), should I use another command to obtain the same result faster? Something is definitely strange. Can you please send the output of -log_view -info :mat Thanks, Matt Thank you very much! Antoine C?t? -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: LogView.out Type: application/octet-stream Size: 14200 bytes Desc: LogView.out URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mat.0 Type: application/octet-stream Size: 234549 bytes Desc: mat.0 URL: From bsmith at petsc.dev Thu Oct 22 15:09:33 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 22 Oct 2020 15:09:33 -0500 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: References: Message-ID: <8A3BDD0C-2697-4453-8A71-2A900A958862@petsc.dev> MatMult 9553 1.0 3.2824e+01 1.0 3.54e+10 1.0 0.0e+00 0.0e+00 0.0e+00 23 48 0 0 0 61 91 0 0 0 1079 MatScale 6 1.0 5.3896e-02 1.0 2.52e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 467 Though the flop rate of MatScale is not so high (467) it is taking very little (0 percent of the run time while MatMult takes 23 percent of the time). So the main cost related to the matrices is MatMult because it has a lot of operations 9553, you might think about your algorithms you are using and if there improvements. It looks like you are using some kind of multigrid and solve 6 problems with 1357 total iterations which is 200 iterations per solve. This is absolutely HUGE for multigrain, you need to tune the multigrid for you problem to bring that down to at most a couple dozen iterations per solve. Barry > On Oct 22, 2020, at 3:02 PM, Antoine C?t? wrote: > > Hi, > > See attached files for both outputs. Tell me if you need any clarification. It was run with a DMDA of 33x17x17 nodes (creating 32x16x16=8192 elements). With 3 dof per nodes, problem has a total of 28611 dof. > > Note : Stage "Stiff_Adj" is the part of the code modifying Mat K. PetscLogStagePush/Pop was used. > > Regards, > > Antoine > De : Matthew Knepley > > Envoy? : 22 octobre 2020 15:35 > ? : Antoine C?t? > > Cc : petsc-users at mcs.anl.gov > > Objet : Re: [petsc-users] Enhancing MatScale computing time > > On Thu, Oct 22, 2020 at 3:23 PM Antoine C?t? 
> wrote: > Hi, > > I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse matrix Mat K. The Mat is modified repeatedly by the program, using the commands (in that order) : > > MatZeroEntries(K) > In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) > MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) > MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) > MatDiagonalScale(K, vec1, vec1) > MatDiagonalSet(K, vec2, ADD_VALUES) > > Computing time seems high and I would like to improve it. Running tests with "-log_view" tells me that MatScale() is the bottle neck (50% of total computing time) . From manual pages, I've tried a few tweaks : > DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees of freedom per node, ... BAIJ can significantly enhance performance", Chapter 14.2.4 > Used MatMissingDiagonal() to confirm there is no missing diagonal entries : "If the matrix Y is missing some diagonal entries this routine can be very slow", MatDiagonalSet() manual > Tried MatSetOption() > MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly efficiency > MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" > MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" > MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during matrix assembly" > According to "-log_view", assembly is fast (0% of total time), and the use of a DMDA makes me believe preallocation isn't the cause of performance issue. > > I would like to know how could I improve MatScale(). What are the best practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? Instead of MatDiagonalScale(), should I use another command to obtain the same result faster? > > Something is definitely strange. Can you please send the output of > > -log_view -info :mat > > Thanks, > > Matt > > Thank you very much! > > Antoine C?t? > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Oct 22 15:13:54 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 22 Oct 2020 15:13:54 -0500 Subject: [petsc-users] FPE when trying to find the condition number In-Reply-To: References: Message-ID: <697184D9-7A67-4621-8120-CA78A0A5C18F@petsc.dev> The reference implementation of LAPACK tries a divide by zero in its setup to see if it can divide by zero and that is happening for you. Hence the PETSc code has ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr); #if !defined(PETSC_USE_COMPLEX) PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("N","N",&bn,&bn,R,&bN,realpart,&sdummy,&idummy,&sdummy,&idummy,work,&lwork,&lierr)); #else PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("N","N",&bn,&bn,R,&bN,realpart,&sdummy,&idummy,&sdummy,&idummy,work,&lwork,realpart+N,&lierr)); #endif if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD Lapack routine %d",(int)lierr); ierr = PetscFPTrapPop();CHKERRQ(ierr); which is suppose to turn off the trapping. The code that turns off the trapping is OS dependent, perhaps it does not work for you. There is a bit better code in the current release than 3.11 I recommend you first upgrade. What system are you running on? 
Barry > On Oct 22, 2020, at 2:12 PM, baikadi pranay wrote: > > Hello, > > I am trying to find the condition number of the A matrix for a linear system I am solving. I have used the following commands. > ./a.out -ksp_monitor_singular_value -ksp_type gmres -ksp_gmres_restart 1000 -pc_type none > However, the execution comes to a halt after a few iterations with the following error. > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: [0] LAPACKgesvd line 40 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmreig.c > [0]PETSC ERROR: [0] KSPComputeExtremeSingularValues_GMRES line 22 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmreig.c > [0]PETSC ERROR: [0] KSPComputeExtremeSingularValues line 59 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: [0] KSPMonitorSingularValue line 130 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/iterativ.c > [0]PETSC ERROR: [0] KSPMonitor line 1765 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: [0] KSPGMRESCycle line 122 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: [0] KSPSolve_GMRES line 225 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: [0] KSPSolve line 678 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Signal received > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 > [0]PETSC ERROR: ./a.out on a linux-gnu-c-debug named cg17-9.agave.rc.asu.edu by pbaikadi Thu Oct 22 12:07:11 2020 > [0]PETSC ERROR: Configure options > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD > with errorcode 59. > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > Is the error because the A matrix is singular (causing the max/min to be undefined)? Please let me know. > > Thank you, > Sincerely, > Pranay. > ? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Antoine.Cote3 at USherbrooke.ca Thu Oct 22 15:17:18 2020 From: Antoine.Cote3 at USherbrooke.ca (=?iso-8859-1?Q?Antoine_C=F4t=E9?=) Date: Thu, 22 Oct 2020 20:17:18 +0000 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: <8A3BDD0C-2697-4453-8A71-2A900A958862@petsc.dev> References: , <8A3BDD0C-2697-4453-8A71-2A900A958862@petsc.dev> Message-ID: Hi Sir, MatScale in "Main Stage" is indeed called 6 times for 0% run time. In stage "Stiff_Adj" though, we get : MatScale 8192 1.0 7.1185e+01 1.0 3.43e+10 1.0 0.0e+00 0.0e+00 0.0e+00 50 46 0 0 0 80 98 0 0 0 482 MatMult is indeed expensive (23% run time) and should be improved, but MatScale in "Stiff_Adj" is still taking 50% run time Thanks, Antoine ________________________________ De : Barry Smith Envoy? : 22 octobre 2020 16:09 ? : Antoine C?t? Cc : petsc-users at mcs.anl.gov Objet : Re: [petsc-users] Enhancing MatScale computing time MatMult 9553 1.0 3.2824e+01 1.0 3.54e+10 1.0 0.0e+00 0.0e+00 0.0e+00 23 48 0 0 0 61 91 0 0 0 1079 MatScale 6 1.0 5.3896e-02 1.0 2.52e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 467 Though the flop rate of MatScale is not so high (467) it is taking very little (0 percent of the run time while MatMult takes 23 percent of the time). So the main cost related to the matrices is MatMult because it has a lot of operations 9553, you might think about your algorithms you are using and if there improvements. It looks like you are using some kind of multigrid and solve 6 problems with 1357 total iterations which is 200 iterations per solve. This is absolutely HUGE for multigrain, you need to tune the multigrid for you problem to bring that down to at most a couple dozen iterations per solve. Barry On Oct 22, 2020, at 3:02 PM, Antoine C?t? > wrote: Hi, See attached files for both outputs. Tell me if you need any clarification. It was run with a DMDA of 33x17x17 nodes (creating 32x16x16=8192 elements). With 3 dof per nodes, problem has a total of 28611 dof. Note : Stage "Stiff_Adj" is the part of the code modifying Mat K. PetscLogStagePush/Pop was used. Regards, Antoine ________________________________ De : Matthew Knepley > Envoy? : 22 octobre 2020 15:35 ? : Antoine C?t? > Cc : petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] Enhancing MatScale computing time On Thu, Oct 22, 2020 at 3:23 PM Antoine C?t? > wrote: Hi, I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse matrix Mat K. The Mat is modified repeatedly by the program, using the commands (in that order) : MatZeroEntries(K) In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) MatDiagonalScale(K, vec1, vec1) MatDiagonalSet(K, vec2, ADD_VALUES) Computing time seems high and I would like to improve it. Running tests with "-log_view" tells me that MatScale() is the bottle neck (50% of total computing time) . From manual pages, I've tried a few tweaks : * DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees of freedom per node, ... 
BAIJ can significantly enhance performance", Chapter 14.2.4 * Used MatMissingDiagonal() to confirm there is no missing diagonal entries : "If the matrix Y is missing some diagonal entries this routine can be very slow", MatDiagonalSet() manual * Tried MatSetOption() * MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly efficiency * MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" * MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" * MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during matrix assembly" According to "-log_view", assembly is fast (0% of total time), and the use of a DMDA makes me believe preallocation isn't the cause of performance issue. I would like to know how could I improve MatScale(). What are the best practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? Instead of MatDiagonalScale(), should I use another command to obtain the same result faster? Something is definitely strange. Can you please send the output of -log_view -info :mat Thanks, Matt Thank you very much! Antoine C?t? -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Oct 22 15:28:28 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 22 Oct 2020 16:28:28 -0400 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: References: <8A3BDD0C-2697-4453-8A71-2A900A958862@petsc.dev> Message-ID: On Thu, Oct 22, 2020 at 4:17 PM Antoine C?t? wrote: > Hi Sir, > > MatScale in "Main Stage" is indeed called 6 times for 0% run time. In > stage "Stiff_Adj" though, we get : > > MatScale 8192 1.0 7.1185e+01 1.0 3.43e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 50 46 0 0 0 80 98 0 0 0 482 > > MatMult is indeed expensive (23% run time) and should be improved, but > MatScale in "Stiff_Adj" is still taking 50% run time > I was a little surprised that MatScale gets only 450 MFlops. However, it looks like you are running the debugging version of PETSc. Could you configure a version without debugging: $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/reconfigure-$PETSC_ARCH.py --with-debugging=0 --PETSC_ARCH=arch-master-opt and rerun the timings? Thanks, Matt > Thanks, > > Antoine > ------------------------------ > *De :* Barry Smith > *Envoy? :* 22 octobre 2020 16:09 > *? :* Antoine C?t? > *Cc :* petsc-users at mcs.anl.gov > *Objet :* Re: [petsc-users] Enhancing MatScale computing time > > > MatMult 9553 1.0 3.2824e+01 1.0 3.54e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 23 48 0 0 0 61 91 0 0 0 1079 > MatScale 6 1.0 5.3896e-02 1.0 2.52e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 467 > > Though the flop rate of MatScale is not so high (467) it is taking very > little (0 percent of the run time while MatMult takes 23 percent of the > time). > > So the main cost related to the matrices is MatMult because it has a lot > of operations 9553, you might think about your algorithms you are using and > if there > improvements. > > It looks like you are using some kind of multigrid and solve 6 problems > with 1357 total iterations which is 200 iterations per solve. 
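(That per-solve figure is just the log totals: roughly 1357 Krylov iterations over 6 KSPSolve calls, about 226 iterations per solve.) If the operator is 3D elasticity on the DMDA (three displacement dof per node) and the multigrid is algebraic, e.g. GAMG, one standard first tuning step is to attach the rigid-body near-nullspace to the matrix before solving. The fragment below is only a sketch under those assumptions; it reuses the da and K names from earlier in the thread, assumes an ierr in scope, and sets placeholder uniform coordinates only in case none have been set:

    Vec            coords;
    MatNullSpace   nearnullsp;

    ierr = DMDASetUniformCoordinates(da,0.0,1.0,0.0,1.0,0.0,1.0);CHKERRQ(ierr);  /* skip if da already carries physical coordinates */
    ierr = DMGetCoordinates(da,&coords);CHKERRQ(ierr);                           /* borrowed reference, do not destroy */
    ierr = MatNullSpaceCreateRigidBody(coords,&nearnullsp);CHKERRQ(ierr);        /* 6 rigid-body modes in 3D */
    ierr = MatSetNearNullSpace(K,nearnullsp);CHKERRQ(ierr);
    ierr = MatNullSpaceDestroy(&nearnullsp);CHKERRQ(ierr);                       /* the matrix keeps its own reference */

Whether this helps depends on what the three dof per node actually represent, but it is the usual first thing to try when an algebraic multigrid needs hundreds of iterations on a vector-valued 3D problem.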
This is > absolutely HUGE for multigrain, you need to tune the multigrid for you > problem to bring that down to at most a couple dozen iterations per solve. > > Barry > > On Oct 22, 2020, at 3:02 PM, Antoine C?t? > wrote: > > Hi, > > See attached files for both outputs. Tell me if you need any > clarification. It was run with a DMDA of 33x17x17 nodes (creating > 32x16x16=8192 elements). With 3 dof per nodes, problem has a total of 28611 > dof. > > Note : Stage "Stiff_Adj" is the part of the code modifying Mat > K. PetscLogStagePush/Pop was used. > > Regards, > > Antoine > ------------------------------ > *De :* Matthew Knepley > *Envoy? :* 22 octobre 2020 15:35 > *? :* Antoine C?t? > *Cc :* petsc-users at mcs.anl.gov > *Objet :* Re: [petsc-users] Enhancing MatScale computing time > > On Thu, Oct 22, 2020 at 3:23 PM Antoine C?t? > wrote: > > Hi, > > I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse > matrix Mat K. The Mat is modified repeatedly by the program, using the > commands (in that order) : > > MatZeroEntries(K) > In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) > MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) > MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) > MatDiagonalScale(K, vec1, vec1) > MatDiagonalSet(K, vec2, ADD_VALUES) > > Computing time seems high and I would like to improve it. Running tests > with "-log_view" tells me that MatScale() is the bottle neck (50% of total > computing time) . From manual pages, I've tried a few tweaks : > > - DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees > of freedom per node, ... BAIJ can significantly enhance performance", > Chapter 14.2.4 > - Used MatMissingDiagonal() to confirm there is no missing diagonal > entries : "If the matrix Y is missing some diagonal entries this routine > can be very slow", MatDiagonalSet() manual > - Tried MatSetOption() > - MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly > efficiency > - MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly > processes have one less global reduction" > - MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, > assembly processes have one less global reduction" > - MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during > matrix assembly" > > According to "-log_view", assembly is fast (0% of total time), and the > use of a DMDA makes me believe preallocation isn't the cause of performance > issue. > > I would like to know how could I improve MatScale(). What are the best > practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? > Instead of MatDiagonalScale(), should I use another command to obtain the > same result faster? > > > Something is definitely strange. Can you please send the output of > > -log_view -info :mat > > Thanks, > > Matt > > > Thank you very much! > > Antoine C?t? > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Thu Oct 22 16:03:11 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 22 Oct 2020 16:03:11 -0500 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: References: Message-ID: <857A52B0-B6C5-4E4F-B85C-3F69C044E477@petsc.dev> Yes, you are correct I missed that part of the run As you can see below MatScale calls only BLAS dscal() there is really no way to make that go faster. How big is the matrix. What are you doing with the matrix after you scale it? The only way to improve the time is to find some way to scale it less often. It is curious that VecScale has a much higher flop rate when it has the same code see below. Unless the matrices are tiny I would expect similar flop rates. Barry PetscErrorCode MatScale_SeqAIJ(Mat inA,PetscScalar alpha) { Mat_SeqAIJ *a = (Mat_SeqAIJ*)inA->data; PetscScalar oalpha = alpha; PetscErrorCode ierr; PetscBLASInt one = 1,bnz; PetscFunctionBegin; ierr = PetscBLASIntCast(a->nz,&bnz);CHKERRQ(ierr); PetscStackCallBLAS("BLASscal",BLASscal_(&bnz,&oalpha,a->a,&one)); ierr = PetscLogFlops(a->nz);CHKERRQ(ierr); ierr = MatSeqAIJInvalidateDiagonal(inA);CHKERRQ(ierr); #if defined(PETSC_HAVE_DEVICE) if (inA->offloadmask != PETSC_OFFLOAD_UNALLOCATED) inA->offloadmask = PETSC_OFFLOAD_CPU; #endif PetscFunctionReturn(0); } PetscErrorCode VecScale_Seq(Vec xin, PetscScalar alpha) { PetscErrorCode ierr; PetscBLASInt one = 1,bn; PetscFunctionBegin; ierr = PetscBLASIntCast(xin->map->n,&bn);CHKERRQ(ierr); if (alpha == (PetscScalar)0.0) { ierr = VecSet_Seq(xin,alpha);CHKERRQ(ierr); } else if (alpha != (PetscScalar)1.0) { PetscScalar a = alpha,*xarray; ierr = VecGetArray(xin,&xarray);CHKERRQ(ierr); PetscStackCallBLAS("BLASscal",BLASscal_(&bn,&a,xarray,&one)); ierr = VecRestoreArray(xin,&xarray);CHKERRQ(ierr); } ierr = PetscLogFlops(xin->map->n);CHKERRQ(ierr); PetscFunctionReturn(0); } > On Oct 22, 2020, at 3:02 PM, Antoine C?t? wrote: > > Hi, > > See attached files for both outputs. Tell me if you need any clarification. It was run with a DMDA of 33x17x17 nodes (creating 32x16x16=8192 elements). With 3 dof per nodes, problem has a total of 28611 dof. > > Note : Stage "Stiff_Adj" is the part of the code modifying Mat K. PetscLogStagePush/Pop was used. > > Regards, > > Antoine > De : Matthew Knepley > > Envoy? : 22 octobre 2020 15:35 > ? : Antoine C?t? > > Cc : petsc-users at mcs.anl.gov > > Objet : Re: [petsc-users] Enhancing MatScale computing time > > On Thu, Oct 22, 2020 at 3:23 PM Antoine C?t? > wrote: > Hi, > > I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse matrix Mat K. The Mat is modified repeatedly by the program, using the commands (in that order) : > > MatZeroEntries(K) > In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) > MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) > MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) > MatDiagonalScale(K, vec1, vec1) > MatDiagonalSet(K, vec2, ADD_VALUES) > > Computing time seems high and I would like to improve it. Running tests with "-log_view" tells me that MatScale() is the bottle neck (50% of total computing time) . From manual pages, I've tried a few tweaks : > DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees of freedom per node, ... 
BAIJ can significantly enhance performance", Chapter 14.2.4 > Used MatMissingDiagonal() to confirm there is no missing diagonal entries : "If the matrix Y is missing some diagonal entries this routine can be very slow", MatDiagonalSet() manual > Tried MatSetOption() > MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly efficiency > MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" > MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" > MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during matrix assembly" > According to "-log_view", assembly is fast (0% of total time), and the use of a DMDA makes me believe preallocation isn't the cause of performance issue. > > I would like to know how could I improve MatScale(). What are the best practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? Instead of MatDiagonalScale(), should I use another command to obtain the same result faster? > > Something is definitely strange. Can you please send the output of > > -log_view -info :mat > > Thanks, > > Matt > > Thank you very much! > > Antoine C?t? > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Antoine.Cote3 at USherbrooke.ca Thu Oct 22 16:08:22 2020 From: Antoine.Cote3 at USherbrooke.ca (=?iso-8859-1?Q?Antoine_C=F4t=E9?=) Date: Thu, 22 Oct 2020 21:08:22 +0000 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: References: <8A3BDD0C-2697-4453-8A71-2A900A958862@petsc.dev> , Message-ID: The new outputs are attached. The same problem was run with arch-master-opt The overall time was cut in half, but %T remain roughly the same... Thanks, Antoine ________________________________ De : Matthew Knepley Envoy? : 22 octobre 2020 16:28 ? : Antoine C?t? Cc : Barry Smith ; petsc-users at mcs.anl.gov Objet : Re: [petsc-users] Enhancing MatScale computing time On Thu, Oct 22, 2020 at 4:17 PM Antoine C?t? > wrote: Hi Sir, MatScale in "Main Stage" is indeed called 6 times for 0% run time. In stage "Stiff_Adj" though, we get : MatScale 8192 1.0 7.1185e+01 1.0 3.43e+10 1.0 0.0e+00 0.0e+00 0.0e+00 50 46 0 0 0 80 98 0 0 0 482 MatMult is indeed expensive (23% run time) and should be improved, but MatScale in "Stiff_Adj" is still taking 50% run time I was a little surprised that MatScale gets only 450 MFlops. However, it looks like you are running the debugging version of PETSc. Could you configure a version without debugging: $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/reconfigure-$PETSC_ARCH.py --with-debugging=0 --PETSC_ARCH=arch-master-opt and rerun the timings? Thanks, Matt Thanks, Antoine ________________________________ De : Barry Smith > Envoy? : 22 octobre 2020 16:09 ? : Antoine C?t? Cc : petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] Enhancing MatScale computing time MatMult 9553 1.0 3.2824e+01 1.0 3.54e+10 1.0 0.0e+00 0.0e+00 0.0e+00 23 48 0 0 0 61 91 0 0 0 1079 MatScale 6 1.0 5.3896e-02 1.0 2.52e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 467 Though the flop rate of MatScale is not so high (467) it is taking very little (0 percent of the run time while MatMult takes 23 percent of the time). 
So the main cost related to the matrices is MatMult because it has a lot of operations 9553, you might think about your algorithms you are using and if there improvements. It looks like you are using some kind of multigrid and solve 6 problems with 1357 total iterations which is 200 iterations per solve. This is absolutely HUGE for multigrain, you need to tune the multigrid for you problem to bring that down to at most a couple dozen iterations per solve. Barry On Oct 22, 2020, at 3:02 PM, Antoine C?t? > wrote: Hi, See attached files for both outputs. Tell me if you need any clarification. It was run with a DMDA of 33x17x17 nodes (creating 32x16x16=8192 elements). With 3 dof per nodes, problem has a total of 28611 dof. Note : Stage "Stiff_Adj" is the part of the code modifying Mat K. PetscLogStagePush/Pop was used. Regards, Antoine ________________________________ De : Matthew Knepley > Envoy? : 22 octobre 2020 15:35 ? : Antoine C?t? > Cc : petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] Enhancing MatScale computing time On Thu, Oct 22, 2020 at 3:23 PM Antoine C?t? > wrote: Hi, I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse matrix Mat K. The Mat is modified repeatedly by the program, using the commands (in that order) : MatZeroEntries(K) In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) MatDiagonalScale(K, vec1, vec1) MatDiagonalSet(K, vec2, ADD_VALUES) Computing time seems high and I would like to improve it. Running tests with "-log_view" tells me that MatScale() is the bottle neck (50% of total computing time) . From manual pages, I've tried a few tweaks : * DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees of freedom per node, ... BAIJ can significantly enhance performance", Chapter 14.2.4 * Used MatMissingDiagonal() to confirm there is no missing diagonal entries : "If the matrix Y is missing some diagonal entries this routine can be very slow", MatDiagonalSet() manual * Tried MatSetOption() * MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly efficiency * MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" * MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" * MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during matrix assembly" According to "-log_view", assembly is fast (0% of total time), and the use of a DMDA makes me believe preallocation isn't the cause of performance issue. I would like to know how could I improve MatScale(). What are the best practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? Instead of MatDiagonalScale(), should I use another command to obtain the same result faster? Something is definitely strange. Can you please send the output of -log_view -info :mat Thanks, Matt Thank you very much! Antoine C?t? -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: LogView.out Type: application/octet-stream Size: 12210 bytes Desc: LogView.out URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mat.0 Type: application/octet-stream Size: 234549 bytes Desc: mat.0 URL: From bsmith at petsc.dev Thu Oct 22 16:42:29 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 22 Oct 2020 16:42:29 -0500 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: References: <8A3BDD0C-2697-4453-8A71-2A900A958862@petsc.dev> Message-ID: <173AA74E-1BCB-4D19-AC25-F0B5677F044C@petsc.dev> Please apply the attached patch (you just need to run make libs in the petsc director after you apply it) and see if the VecScale flop rate changes. Barry > On Oct 22, 2020, at 4:08 PM, Antoine C?t? wrote: > > The new outputs are attached. The same problem was run with arch-master-opt > > The overall time was cut in half, but %T remain roughly the same... > > Thanks, > > Antoine > > > De : Matthew Knepley > > Envoy? : 22 octobre 2020 16:28 > ? : Antoine C?t? > > Cc : Barry Smith >; petsc-users at mcs.anl.gov > > Objet : Re: [petsc-users] Enhancing MatScale computing time > > On Thu, Oct 22, 2020 at 4:17 PM Antoine C?t? > wrote: > Hi Sir, > > MatScale in "Main Stage" is indeed called 6 times for 0% run time. In stage "Stiff_Adj" though, we get : > > MatScale 8192 1.0 7.1185e+01 1.0 3.43e+10 1.0 0.0e+00 0.0e+00 0.0e+00 50 46 0 0 0 80 98 0 0 0 482 > > MatMult is indeed expensive (23% run time) and should be improved, but MatScale in "Stiff_Adj" is still taking 50% run time > > I was a little surprised that MatScale gets only 450 MFlops. However, it looks like you are running the debugging version of PETSc. Could you configure > a version without debugging: > > $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/reconfigure-$PETSC_ARCH.py --with-debugging=0 --PETSC_ARCH=arch-master-opt > > and rerun the timings? > > Thanks, > > Matt > > Thanks, > > Antoine > De : Barry Smith > > Envoy? : 22 octobre 2020 16:09 > ? : Antoine C?t? > > Cc : petsc-users at mcs.anl.gov > > Objet : Re: [petsc-users] Enhancing MatScale computing time > > > MatMult 9553 1.0 3.2824e+01 1.0 3.54e+10 1.0 0.0e+00 0.0e+00 0.0e+00 23 48 0 0 0 61 91 0 0 0 1079 > MatScale 6 1.0 5.3896e-02 1.0 2.52e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 467 > > Though the flop rate of MatScale is not so high (467) it is taking very little (0 percent of the run time while MatMult takes 23 percent of the time). > > So the main cost related to the matrices is MatMult because it has a lot of operations 9553, you might think about your algorithms you are using and if there > improvements. > > It looks like you are using some kind of multigrid and solve 6 problems with 1357 total iterations which is 200 iterations per solve. This is absolutely HUGE for multigrain, you need to tune the multigrid for you problem to bring that down to at most a couple dozen iterations per solve. > > Barry > >> On Oct 22, 2020, at 3:02 PM, Antoine C?t? > wrote: >> >> Hi, >> >> See attached files for both outputs. Tell me if you need any clarification. It was run with a DMDA of 33x17x17 nodes (creating 32x16x16=8192 elements). With 3 dof per nodes, problem has a total of 28611 dof. >> >> Note : Stage "Stiff_Adj" is the part of the code modifying Mat K. PetscLogStagePush/Pop was used. >> >> Regards, >> >> Antoine >> De : Matthew Knepley > >> Envoy? : 22 octobre 2020 15:35 >> ? : Antoine C?t? 
> >> Cc : petsc-users at mcs.anl.gov > >> Objet : Re: [petsc-users] Enhancing MatScale computing time >> >> On Thu, Oct 22, 2020 at 3:23 PM Antoine C?t? > wrote: >> Hi, >> >> I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse matrix Mat K. The Mat is modified repeatedly by the program, using the commands (in that order) : >> >> MatZeroEntries(K) >> In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) >> MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) >> MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) >> MatDiagonalScale(K, vec1, vec1) >> MatDiagonalSet(K, vec2, ADD_VALUES) >> >> Computing time seems high and I would like to improve it. Running tests with "-log_view" tells me that MatScale() is the bottle neck (50% of total computing time) . From manual pages, I've tried a few tweaks : >> DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees of freedom per node, ... BAIJ can significantly enhance performance", Chapter 14.2.4 >> Used MatMissingDiagonal() to confirm there is no missing diagonal entries : "If the matrix Y is missing some diagonal entries this routine can be very slow", MatDiagonalSet() manual >> Tried MatSetOption() >> MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly efficiency >> MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" >> MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" >> MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during matrix assembly" >> According to "-log_view", assembly is fast (0% of total time), and the use of a DMDA makes me believe preallocation isn't the cause of performance issue. >> >> I would like to know how could I improve MatScale(). What are the best practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? Instead of MatDiagonalScale(), should I use another command to obtain the same result faster? >> >> Something is definitely strange. Can you please send the output of >> >> -log_view -info :mat >> >> Thanks, >> >> Matt >> >> Thank you very much! >> >> Antoine C?t? >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: vecscale.patch Type: application/octet-stream Size: 577 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Fri Oct 23 02:09:47 2020 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 23 Oct 2020 09:09:47 +0200 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: References: Message-ID: On Thu 22. Oct 2020 at 21:23, Antoine C?t? wrote: > Hi, > > I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse > matrix Mat K. 
The Mat is modified repeatedly by the program, using the > commands (in that order) : > > MatZeroEntries(K) > In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) > MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) > MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) > MatDiagonalScale(K, vec1, vec1) > MatDiagonalSet(K, vec2, ADD_VALUES) > Why not just assemble the entire operator you seek locally in vals? You would then avoid the calls to MatDiagonalScale and MatDiagonalSet by instead calling VecGetArrayRead on vec1 and vec2 and using the local parts of these vectors you need with vals. You probably need to scatter vec1, vec2 first before VecGetArrayRead. Thanks, Dave > Computing time seems high and I would like to improve it. Running tests > with "-log_view" tells me that MatScale() is the bottle neck (50% of total > computing time) . From manual pages, I've tried a few tweaks : > > - DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees > of freedom per node, ... BAIJ can significantly enhance performance", > Chapter 14.2.4 > - Used MatMissingDiagonal() to confirm there is no missing diagonal > entries : "If the matrix Y is missing some diagonal entries this routine > can be very slow", MatDiagonalSet() manual > - Tried MatSetOption() > - MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly > efficiency > - MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly > processes have one less global reduction" > - MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, > assembly processes have one less global reduction" > - MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during > matrix assembly" > > According to "-log_view", assembly is fast (0% of total time), and the > use of a DMDA makes me believe preallocation isn't the cause of performance > issue. > > I would like to know how could I improve MatScale(). What are the best > practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? > Instead of MatDiagonalScale(), should I use another command to obtain the > same result faster? > > Thank you very much! > > Antoine C?t? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.huysegoms at fz-juelich.de Fri Oct 23 05:05:13 2020 From: m.huysegoms at fz-juelich.de (Marcel Huysegoms) Date: Fri, 23 Oct 2020 12:05:13 +0200 Subject: [petsc-users] MatOrdering for rectangular matrix In-Reply-To: <9B822030-7E72-4C6A-9669-6AA82AFB0B95@petsc.dev> References: <3c649ace-248f-38b9-bde9-4f0fa10bf71e@fz-juelich.de> <9B822030-7E72-4C6A-9669-6AA82AFB0B95@petsc.dev> Message-ID: Hi Barry, many thanks for your explanation and suggestion!! I have a much better understanding of the problem now. For some reason, I wasn't aware that permuting A by P leads to a /symmetric/ reordering of A'A. I searched for the paper by Tim Davis that describes their reordering approach ("SuiteSparseQR: multifrontal mulithreaded rank-revealing sparse QR factorization"), and as you expected, they perform the column ordering of A by using a permutation matrix P which is obtained by an ordering of A'A. However, they are using the reordered matrix AP to perform a QR decomposition, not to use it for a preconditioner as I intend to do. All in all, I will definitely try your suggested approach that SuiteSparseQR more or less also utilizes. 
However, I have (more or less) _one remaining question_: When calculating a column reordering matrix P based on A'A and applying this matrix to A (so having AP), then its normal equation will be P'(A'A)P as you pointed out. But P has originally been computed in a way, so that (A'A)P will be diagonally dominant, not P'(A'A)P. So won't the additional effect of P' (i.e. the row reordering) compromise the diagonal structure again? I am using the KSP in the following way: ksp = PETSc.KSP().create(PETSc.COMM_WORLD) ksp.setType("lsqr") pc = ksp.getPC() pc.setType("bjacobi") ksp.setOperators(A, A'A) ksp.solve(b, x) The paper you referenced seems very intersting to me. So I wonder, if I had a good /non-symmetric/ ordering of A'A, i.e. Q(A'A)P, and would pass this matrix to setOperators() as the second argument for the preconditioner (while using AP as first argument), what is happening internally? Does BJACOBI compute a preconditioner matrix M^(-1) for Q(A'A)P and passes this M^(-1) to LSQR for applying it to AP [yielding M^(-1)AP] before performing its iterative CG-method on this preconditioned system? In that case, could I perform the computation of M^(-1) outside of ksp.solve(), so that I could apply it myself to AP _and_ b (!!), so passing M^(-1)AP and M^(-1)b to ksp.setOperators() and ksp.solve()? Maybe my question is due to one missing piece of mathematical understanding. Does the matrix for computing the preconditioning (second argument to setOperators()) have to be exactly the normal equation (A'A) of the first argument in order to mathematically make sense? I could not find any reference why this is done/works? Thank you very much in advance for taking time for this topic! I really appreciate it. Marcel Am 22.10.20 um 16:34 schrieb Barry Smith: > ? Marcel, > > ? ?Would you like to do the following? Compute > > ? ? Q A ?P where Q is a row permutation, P a column permutation and > then apply LSQR on QAP? > > ? ? From the manual page: > > In exact arithmetic the LSQR method (with no preconditioning) is > identical to the KSPCG algorithm applied to the normal equations. > > ? ?[Q A ?P]' [Q A ?P] = P' A' A P = P'(A'A) P ?the Q drops out because > ?permutation matrices' transposes are their inverse > > ?Note that P is a small square matrix. > > ? So my conclusion is that any column permutation of A is also a > symmetric permutation of A'A so you can just try using regular > reorderings of A'A if > you want to "concentrate" the "important" parts of A'A into your > "block diagonal" preconditioner (and throw away the other parts) > > ? I don't know what it will do to the convergence. I've never had much > luck generically trying to symmetrically reorder matrices to improve > preconditioners but > for certain situation maybe it might help. For example if the matrix > is ?[0 1; 1 0] and you permute it you get the [1 0; 0 1] which looks > better. > > ? There is this https://epubs.siam.org/doi/10.1137/S1064827599361308 > but it is for non-symmetric permutations and in your case if you use a > non symmetric permeation you can no longer use LSQR. > > ? Barry > > > > >> On Oct 22, 2020, at 4:55 AM, Matthew Knepley > > wrote: >> >> On Thu, Oct 22, 2020 at 4:24 AM Marcel Huysegoms >> > wrote: >> >> Hi all, >> >> I'm currently implementing a Gauss-Newton approach for minimizing a >> non-linear cost function using PETSc4py. >> The (rectangular) linear systems I am trying to solve have >> dimensions of >> about (5N, N), where N is in the range of several hundred millions. 
>> >> Due to its size and because it's an over-determined system, I use >> LSQR >> in conjunction with a preconditioner (which operates on A^T x A, e.g. >> BJacobi). >> Depending on the ordering of the unknowns the algorithm only >> converges >> for special cases. When I use a direct LR solver (as >> preconditioner) it >> consistently converges, but consumes too much memory. I have read >> in the >> manual that the LR solver internally also applies a matrix reordering >> beforehand. >> >> My question would be: >> How can I improve the ordering of the unknowns for a rectangular >> matrix >> (in order to converge also with iterative preconditioners)? If I use >> MatGetOrdering(), it only works for square matrices. Is there a >> way to >> achieve this from within PETSc4py? >> ParMETIS seems to be a promising framework for that task. Is it >> possible >> to apply its reordering algorithm to a rectangular PETSc-matrix? >> >> I would be thankful for every bit of advice that might help. >> >> >> We do not have any rectangular reordering algorithms. I think your >> first step is to >> find something in the literature that you think will work. >> >> ? Thanks, >> >> ? ? ?Matt >> >> Best regards, >> Marcel >> >> >> ------------------------------------------------------------------------------------------------ >> ------------------------------------------------------------------------------------------------ >> Forschungszentrum Juelich GmbH >> 52425 Juelich >> Sitz der Gesellschaft: Juelich >> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 >> Vorsitzender des Aufsichtsrats: MinDir Volker Rieke >> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), >> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt >> ------------------------------------------------------------------------------------------------ >> ------------------------------------------------------------------------------------------------ >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Eugenio.Aulisa at ttu.edu Fri Oct 23 09:11:50 2020 From: Eugenio.Aulisa at ttu.edu (Aulisa, Eugenio) Date: Fri, 23 Oct 2020 14:11:50 +0000 Subject: [petsc-users] reset the sparsity pattern of a matrix without destroying and recreating it In-Reply-To: References: , Message-ID: Hi I have a time dependent problem, where at each iteration the sparsity pattern of some rows of an mpi matrix keeps changing. I have an estimate at each iteration what the maximum size of these rows should be, so I can conservatively pre-allocate the matrix memory. I then assembly using the option MatSetOption(mat, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE); It runs for few iterations but then it saturates the pre-allocated memory. What happens is that at each iteration new columns are added to the changing rows, but old entries that are now zero (and not needed anymore) are not removed, and the size of the changing rows increases till it reaches the maximum allowed value. Is there any way, when at each iteration I zero the matrix to forget the previous sparsity pattern and start from fresh, without destroying and recreating the matrix? Also, if possible, is it possible to select only the rows where this should happens? i.e. 
keeping the same sparsity pattern for a set of rows and forget it for the others. Thanks, Eugenio -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Oct 23 11:51:19 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 23 Oct 2020 11:51:19 -0500 Subject: [petsc-users] reset the sparsity pattern of a matrix without destroying and recreating it In-Reply-To: References: Message-ID: <64849F0D-82BB-4D2B-B671-CB2DAA285571@petsc.dev> You should be able to call MatResetPreallocation() MatSeqAIJSetPreallocation(new data) or any other preallocations routines again and it will clean out the old material. Let us know if this does not work, Barry > On Oct 23, 2020, at 9:11 AM, Aulisa, Eugenio wrote: > > > Hi > > I have a time dependent problem, where at each iteration > the sparsity pattern of some rows of an mpi matrix keeps changing. > > I have an estimate at each iteration what the maximum size of these rows should be, > so I can conservatively pre-allocate the matrix memory. > > I then assembly using the option > MatSetOption(mat, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE); > > It runs for few iterations but then it saturates the pre-allocated memory. > > What happens is that at each iteration new columns are added to the changing rows, > but old entries that are now zero (and not needed anymore) are not removed, > and the size of the changing rows increases till it reaches the maximum allowed value. > > Is there any way, when at each iteration I zero the matrix > to forget the previous sparsity pattern and start from fresh, > without destroying and recreating the matrix? > > Also, if possible, is it possible to select only the rows where this should happens? > i.e. keeping the same sparsity pattern for a set of rows and forget it for the others. > > Thanks, > Eugenio -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Oct 23 12:02:31 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 23 Oct 2020 12:02:31 -0500 Subject: [petsc-users] MatOrdering for rectangular matrix In-Reply-To: References: <3c649ace-248f-38b9-bde9-4f0fa10bf71e@fz-juelich.de> <9B822030-7E72-4C6A-9669-6AA82AFB0B95@petsc.dev> Message-ID: > On Oct 23, 2020, at 5:05 AM, Marcel Huysegoms wrote: > > Hi Barry, > > many thanks for your explanation and suggestion!! I have a much better understanding of the problem now. > > For some reason, I wasn't aware that permuting A by P leads to a symmetric reordering of A'A. > I searched for the paper by Tim Davis that describes their reordering approach ("SuiteSparseQR: multifrontal mulithreaded rank-revealing sparse QR factorization"), and as you expected, they perform the column ordering of A by using a permutation matrix P which is obtained by an ordering of A'A. However, they are using the reordered matrix AP to perform a QR decomposition, not to use it for a preconditioner as I intend to do. > > All in all, I will definitely try your suggested approach that SuiteSparseQR more or less also utilizes. > > However, I have (more or less) one remaining question: > > When calculating a column reordering matrix P based on A'A and applying this matrix to A (so having AP), then its normal equation will be P'(A'A)P as you pointed out. But P has originally been computed in a way, so that (A'A)P will be diagonally dominant, not P'(A'A)P. So won't the additional effect of P' (i.e. the row reordering) compromise the diagonal structure again? I don't know anything about this. 
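A minimal sketch of the preallocation-reset workflow suggested in the reply above for Eugenio's time loop, written against the C API. It is an illustration only, not code from this thread: the MPIAIJ assumption, the row bound of 64 and the BuildRow() placeholder are invented, and error handling uses the usual ierr/CHKERRQ pattern.

#include <petscmat.h>

/* Reassemble A with this step's sparsity pattern, reusing the conservative
   preallocation that was set up once before the time loop.  BuildRow() and
   the bound of 64 are hypothetical placeholders. */
static PetscErrorCode ReassembleWithNewPattern(Mat A)
{
  PetscInt       row,rstart,rend,ncols;
  PetscInt       cols[64];
  PetscScalar    vals[64];
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* drop the previous step's nonzero pattern and make the preallocated space reusable */
  ierr = MatResetPreallocation(A);CHKERRQ(ierr);
  /* tolerate the occasional row that still needs an extra allocation */
  ierr = MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A,&rstart,&rend);CHKERRQ(ierr);
  for (row = rstart; row < rend; row++) {
    ncols = 0;   /* ncols = BuildRow(row,cols,vals); fill this step's row here */
    ierr  = MatSetValues(A,1,&row,ncols,cols,vals,INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Per the same reply, calling MatMPIAIJSetPreallocation() again with fresh row estimates should also work if the conservative bound itself has to change from step to step.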
My feeling was that since A'A is always symmetric one would use a symmetric reordering on it, not a one sided non-symmetric reordering. The RCM order has a reputation for bringing off diagonal arguments closer to the diagonal. Hence if you reorder with RCM and then use block Jacobi, in theory, there will be "better" blocks on the diagonal then in the original ordering. I would try that first. > > I am using the KSP in the following way: > ksp = PETSc.KSP().create(PETSc.COMM_WORLD) > ksp.setType("lsqr") > pc = ksp.getPC() > pc.setType("bjacobi") > ksp.setOperators(A, A'A) > ksp.solve(b, x) > The paper you referenced seems very intersting to me. So I wonder, if I had a good non-symmetric ordering of A'A, i.e. Q(A'A)P, and would pass this matrix to setOperators() as the second argument for the preconditioner (while using AP as first argument), what is happening internally? Does BJACOBI compute a preconditioner matrix M^(-1) for Q(A'A)P and passes this M^(-1) to LSQR for applying it to AP [yielding M^(-1)AP] before performing its iterative CG-method on this preconditioned system? In that case, could I perform the computation of M^(-1) outside of ksp.solve(), so that I could apply it myself to AP and b (!!), so passing M^(-1)AP and M^(-1)b to ksp.setOperators() and ksp.solve()? > > Maybe my question is due to one missing piece of mathematical understanding. Does the matrix for computing the preconditioning (second argument to setOperators()) have to be exactly the normal equation (A'A) of the first argument in order to mathematically make sense? I could not find any reference why this is done/works? No, you can pass any matrix you want as the "normal equation" matrix to LSQR because it only builds the preconditioner from it. The matrix-vector products that define the problem are passed as the other argument. Heuristically you want something B for A'A that is "close" to A'A in some measure. The simplest thing would be just remove some terms away from the diagonal in B. What terms to move etc is unknown to me. There are many games one can play but I don't know which ones would be good for your problem. Barry > > Thank you very much in advance for taking time for this topic! I really appreciate it. > > Marcel > > > Am 22.10.20 um 16:34 schrieb Barry Smith: >> Marcel, >> >> Would you like to do the following? Compute >> >> Q A P where Q is a row permutation, P a column permutation and then apply LSQR on QAP? >> >> >> From the manual page: >> >> In exact arithmetic the LSQR method (with no preconditioning) is identical to the KSPCG algorithm applied to the normal equations. >> >> [Q A P]' [Q A P] = P' A' A P = P'(A'A) P the Q drops out because permutation matrices' transposes are their inverse >> >> Note that P is a small square matrix. >> >> So my conclusion is that any column permutation of A is also a symmetric permutation of A'A so you can just try using regular reorderings of A'A if >> you want to "concentrate" the "important" parts of A'A into your "block diagonal" preconditioner (and throw away the other parts) >> >> I don't know what it will do to the convergence. I've never had much luck generically trying to symmetrically reorder matrices to improve preconditioners but >> for certain situation maybe it might help. For example if the matrix is [0 1; 1 0] and you permute it you get the [1 0; 0 1] which looks better. 
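A compact sketch of the reordering recipe discussed above (an RCM ordering of A'A applied symmetrically to A'A and as a column permutation of A, then LSQR with block Jacobi built from the permuted normal equations). It is written against the C API and kept sequential for clarity; the choice of RCM, the MatPermute() step and all variable names are assumptions rather than anything prescribed in this thread, and in a large parallel code the column permutation would more likely be applied by renumbering the unknowns while assembling A.

#include <petscksp.h>

/* Sequential sketch: form A'A, reorder it with RCM, permute the columns of A
   with the same index set, and solve with LSQR preconditioned by block Jacobi
   built from the permuted normal equations.  A, b and x are assumed to exist. */
static PetscErrorCode SolveReorderedLSQR(Mat A,Vec b,Vec x)
{
  Mat            AtA,AtAperm,Aperm;
  IS             perm,cperm,rowid;
  KSP            ksp;
  PC             pc;
  PetscInt       m;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatGetSize(A,&m,NULL);CHKERRQ(ierr);
  ierr = MatTransposeMatMult(A,A,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&AtA);CHKERRQ(ierr);
  ierr = MatGetOrdering(AtA,MATORDERINGRCM,&perm,&cperm);CHKERRQ(ierr); /* symmetric case, use perm for rows and columns */
  ierr = MatPermute(AtA,perm,perm,&AtAperm);CHKERRQ(ierr);              /* P'(A'A)P */
  ierr = ISCreateStride(PETSC_COMM_SELF,m,0,1,&rowid);CHKERRQ(ierr);    /* identity row permutation for A */
  ierr = MatPermute(A,rowid,perm,&Aperm);CHKERRQ(ierr);                 /* AP: same permutation applied to the columns */

  ierr = KSPCreate(PETSC_COMM_SELF,&ksp);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPLSQR);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCBJACOBI);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,Aperm,AtAperm);CHKERRQ(ierr);  /* rectangular Amat, square Pmat, as in the thread */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  /* x is now in the permuted column ordering; map it back to the original
     unknown numbering (for example with VecPermute()) before using it. */

  ierr = ISDestroy(&rowid);CHKERRQ(ierr);
  ierr = ISDestroy(&perm);CHKERRQ(ierr);
  ierr = ISDestroy(&cperm);CHKERRQ(ierr);
  ierr = MatDestroy(&Aperm);CHKERRQ(ierr);
  ierr = MatDestroy(&AtAperm);CHKERRQ(ierr);
  ierr = MatDestroy(&AtA);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Since b lives in the row space and the row permutation here is the identity, only the solution has to be mapped back, not the right-hand side.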
>> >> There is this https://epubs.siam.org/doi/10.1137/S1064827599361308 but it is for non-symmetric permutations and in your case if you use a non symmetric permeation you can no longer use LSQR. >> >> Barry >> >> >> >> >>> On Oct 22, 2020, at 4:55 AM, Matthew Knepley > wrote: >>> >>> On Thu, Oct 22, 2020 at 4:24 AM Marcel Huysegoms > wrote: >>> Hi all, >>> >>> I'm currently implementing a Gauss-Newton approach for minimizing a >>> non-linear cost function using PETSc4py. >>> The (rectangular) linear systems I am trying to solve have dimensions of >>> about (5N, N), where N is in the range of several hundred millions. >>> >>> Due to its size and because it's an over-determined system, I use LSQR >>> in conjunction with a preconditioner (which operates on A^T x A, e.g. >>> BJacobi). >>> Depending on the ordering of the unknowns the algorithm only converges >>> for special cases. When I use a direct LR solver (as preconditioner) it >>> consistently converges, but consumes too much memory. I have read in the >>> manual that the LR solver internally also applies a matrix reordering >>> beforehand. >>> >>> My question would be: >>> How can I improve the ordering of the unknowns for a rectangular matrix >>> (in order to converge also with iterative preconditioners)? If I use >>> MatGetOrdering(), it only works for square matrices. Is there a way to >>> achieve this from within PETSc4py? >>> ParMETIS seems to be a promising framework for that task. Is it possible >>> to apply its reordering algorithm to a rectangular PETSc-matrix? >>> >>> I would be thankful for every bit of advice that might help. >>> >>> We do not have any rectangular reordering algorithms. I think your first step is to >>> find something in the literature that you think will work. >>> >>> Thanks, >>> >>> Matt >>> >>> Best regards, >>> Marcel >>> >>> >>> ------------------------------------------------------------------------------------------------ >>> ------------------------------------------------------------------------------------------------ >>> Forschungszentrum Juelich GmbH >>> 52425 Juelich >>> Sitz der Gesellschaft: Juelich >>> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 >>> Vorsitzender des Aufsichtsrats: MinDir Volker Rieke >>> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), >>> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt >>> ------------------------------------------------------------------------------------------------ >>> ------------------------------------------------------------------------------------------------ >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bui at calcreek.com Fri Oct 23 19:21:32 2020 From: bui at calcreek.com (Thuc Bui) Date: Fri, 23 Oct 2020 17:21:32 -0700 Subject: [petsc-users] Blas undefined references when build an app linking to a shared library that is linked to a static Petsc library Message-ID: <014f01d6a99b$a0b99430$e22cbc90$@calcreek.com> Dear Petsc Users, I hope someone out there has already encountered the same linking problem and already figured this out, or has some idea how to resolve this issue. I have google searched but haven't found any solution. 
I successfully ported Petsc 3.13.5 both as shared and static libraries in both Windows 10 (Visual Studio 2015, no Fortran) and Ubuntu 20.4 (gcc, g++. gfortran 9.3), and successfully run "make check" in all. I am also able to build my own shared libraries linked to either shared or static Petsc library in both platforms. I was also able to link and execute my application to either of these shared libraries in Windows. Unfortunately, I could only get my app to link and execute with the shared library linked to the shared Petsc library in Ubuntu, but not with the static Petsc library. On Ubuntu, below is how I build Petsc static library, which produces three libraries: libpetsc.a, libfblas.a and libflapack.a ./configure CFLAGS="-fPIC" CXXFLAGS="-fPIC" FFLAGS="-fPIC" -with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openmp --with-debugging=0 --download-fblaslapack --with-mpi=0 --with-shared-libraries=0 Below is the make output on how I build my shared library, cPoissons.so linking to the static Petsc. Please note that I have to use --allow-multiple-definition to override the redefinition errors in lapack and blas. I also use --whole-archive to make sure the shared library has all the required information from lapack and blas. gcc -fPIC -c -o gcc/matrixUtil.o matrixUtil.c -I/home/bbwannabe/Documents/Petsc/latest/include -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include gcc -fPIC -c -o gcc/PetscSolver.o PetscSolver.c -I/home/bbwannabe/Documents/Petsc/latest/include -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include gcc -fPIC -c -o gcc/LinearSystemSolver.o LinearSystemSolver.c -I/home/bbwannabe/Documents/Petsc/latest/include -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include gcc -fPIC -c -o gcc/cPoisson.o cPoisson.c -I/home/bbwannabe/Documents/Petsc/latest/include -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include gcc -fPIC -fopenmp -shared -o gcc/cPoissons.so gcc/matrixUtil.o gcc/PetscSolver.o gcc/LinearSystemSolver.o gcc/cPoisson.o -L/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/lib -Wl,--allow-multiple-definition -Wl,--whole-archive -lpetsc -lflapack -lfblas -Wl,--no-whole-archive However, when I build my app linking to the above shared library cPoissons.so. gfortran complains about undefined references, which seem to be from blas as shown below from the output of make. Has anyone seen this kind of linking problem before? Many thanks for your help. Thuc Bui Senior R&D Engineer Calabazas Creek Research, Inc. 
(650) 948-5361 (Office) gfortran -fPIC -o sPoisson3D Poisson3D.f -L/home/bbwannabe/Documents/Nemesis/cPoisson/gcc -l:cPoissons.so /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgbmv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgbmv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_chemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_csymv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgbmv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zsymv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_csymv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_ssymv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_ssymv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dsymv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zhemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgbmv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_chemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgbmv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgbmv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgbmv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zhemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zsymv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dsymv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgbmv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgemv_x_' collect2: error: ld returned 1 exit status make: *** [makefile:6: sPoisson3D] Error 1 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Oct 23 19:36:36 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 23 Oct 2020 19:36:36 -0500 Subject: Re: [petsc-users] Blas undefined references when build an app linking to a shared library that is linked to a static Petsc library In-Reply-To: <014f01d6a99b$a0b99430$e22cbc90$@calcreek.com> References: <014f01d6a99b$a0b99430$e22cbc90$@calcreek.com> Message-ID: <4074A631-42A8-4E5B-9616-7DD94045308A@petsc.dev> I found http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=2&t=1730 but I do not understand it. I am pretty sure PETSc does not provide blas_sgemv_x in any way, even through its packages like fblaslapack, so I would focus on figuring out where those symbols are coming from. Normal compilers won't need them. Do you use them? Good luck Barry > On Oct 23, 2020, at 7:21 PM, Thuc Bui wrote: > Dear Petsc Users, > > I hope someone out there has already encountered the same linking problem and already figured this out, or has some idea how to resolve this issue. I have google searched but haven't found any solution. > > I successfully ported Petsc 3.13.5 both as shared and static libraries in both Windows 10 (Visual Studio 2015, no Fortran) and Ubuntu 20.4 (gcc, g++. gfortran 9.3), and successfully run "make check" in all. I am also able to build my own shared libraries linked to either shared or static Petsc library in both platforms. I was also able to link and execute my application to either of these shared libraries in Windows. Unfortunately, I could only get my app to link and execute with the shared library linked to the shared Petsc library in Ubuntu, but not with the static Petsc library. > > On Ubuntu, below is how I build Petsc static library, which produces three libraries: libpetsc.a, libfblas.a and libflapack.a > > ./configure CFLAGS="-fPIC" CXXFLAGS="-fPIC" FFLAGS="-fPIC" -with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openmp --with-debugging=0 --download-fblaslapack --with-mpi=0 --with-shared-libraries=0 > > Below is the make output on how I build my shared library, cPoissons.so linking to the static Petsc. Please note that I have to use --allow-multiple-definition to override the redefinition errors in lapack and blas. I also use --whole-archive to make sure the shared library has all the required information from lapack and blas.
> > gcc -fPIC -c -o gcc/matrixUtil.o matrixUtil.c > -I/home/bbwannabe/Documents/Petsc/latest/include > -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private > -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include > gcc -fPIC -c -o gcc/PetscSolver.o PetscSolver.c > -I/home/bbwannabe/Documents/Petsc/latest/include > -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private > -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include > gcc -fPIC -c -o gcc/LinearSystemSolver.o LinearSystemSolver.c > -I/home/bbwannabe/Documents/Petsc/latest/include > -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private > -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include > gcc -fPIC -c -o gcc/cPoisson.o cPoisson.c > -I/home/bbwannabe/Documents/Petsc/latest/include > -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private > -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include > gcc -fPIC -fopenmp -shared -o gcc/cPoissons.so gcc/matrixUtil.o gcc/PetscSolver.o gcc/LinearSystemSolver.o gcc/cPoisson.o > -L/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/lib > -Wl,--allow-multiple-definition > -Wl,--whole-archive -lpetsc -lflapack -lfblas -Wl,--no-whole-archive > > However, when I build my app linking to the above shared library cPoissons.so. gfortran complains about undefined references, which seem to be from blas as shown below from the output of make. Has anyone seen this kind of linking problem before? > > Many thanks for your help. > Thuc Bui > Senior R&D Engineer > Calabazas Creek Research, Inc. > (650) 948-5361 (Office) > > gfortran -fPIC -o sPoisson3D Poisson3D.f -L/home/bbwannabe/Documents/Nemesis/cPoisson/gcc -l:cPoissons.so > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgemv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgemv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgbmv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgbmv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_chemv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_csymv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgbmv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgemv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zsymv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_csymv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_ssymv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_ssymv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgemv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dsymv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zhemv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to 
`blas_zgbmv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgemv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_chemv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgemv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgemv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgbmv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgbmv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgbmv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zhemv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zsymv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dsymv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgbmv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgemv_x_' > collect2: error: ld returned 1 exit status > make: *** [makefile:6: sPoisson3D] Error 1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bui at calcreek.com Sat Oct 24 22:50:19 2020 From: bui at calcreek.com (Thuc Bui) Date: Sat, 24 Oct 2020 20:50:19 -0700 Subject: [petsc-users] Blas undefined references when build an app linking to a shared library that is linked to a static Petsc library In-Reply-To: <4074A631-42A8-4E5B-9616-7DD94045308A@petsc.dev> References: <014f01d6a99b$a0b99430$e22cbc90$@calcreek.com> <4074A631-42A8-4E5B-9616-7DD94045308A@petsc.dev> Message-ID: <00d601d6aa81$f5def0e0$e19cd2a0$@calcreek.com> Hi Barry, Thank you very much for getting back to me, and for the link to lapack forum. I really appreciate your taking the time to look this up. I found the same site earlier and believed the undefined references were related to xblas, but didn?t see how they were able to pollute the libfblas and libflapack libraries. Anyhow, I found a work around! Since I was able to compiled, linked and executed a window app using static Petsc with f2cblas and f2clapack built by Visual Studio 2015, I did the same for Ubuntu build, configure static Petsc without gfotran and use libf2cblas and libf2clapack. All the linking issues went away! Problem is solved! Many thanks again, Thuc From: Barry Smith [mailto:bsmith at petsc.dev] Sent: Friday, October 23, 2020 5:37 PM To: Thuc Bui Cc: petsc-users Subject: Re: [petsc-users] Blas undefined references when build an app linking to a shared library that is linked to a static Petsc library I found http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=2 &t=1730 but I do not understand it. I am pretty sure PETSc is not providing in anyway even through its packages like fblaslapack blas_sgemv_x anyway I would focus on figuring out where those are coming from. Normal compilers won't need them. Do you use them? Good luck Barry On Oct 23, 2020, at 7:21 PM, Thuc Bui wrote: Dear Petsc Users, I hope someone out there has already encountered the same linking problem and already figured this out, or has some idea how to resolve this issue. 
I have google searched but haven?t found any solution. I successfully ported Petsc 3.13.5 both as shared and static libraries in both Windows 10 (Visual Studio 2015, no Fortran) and Ubuntu 20.4 (gcc, g++. gfortran 9.3), and successfully run ?make check? in all. I am also able to build my own shared libraries linked to either shared or static Petsc library in both platforms. I was also able to link and execute my application to either of these shared libraries in Windows. Unfortunately, I could only get my app to link and execute with the shared library linked to the shared Petsc library in Ubuntu, but not with the static Petsc library. On Ubuntu, below is how I build Petsc static library, which produces three libraries: libpetsc.a, libfblas.a and libflapack.a ./configure CFLAGS="-fPIC" CXXFLAGS="-fPIC" FFLAGS="-fPIC" -with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openmp --with-debugging=0 --download-fblaslapack --with-mpi=0 --with-shared-libraries=0 Below is the make output on how I build my shared library, cPoissons.so linking to the static Petsc. Please note that I have to use --allow-multiple-definition to override the redefinition errors in lapack and blas. I also use --whole-archive to make sure the shared library has all the required information from lapack and blas. gcc -fPIC -c -o gcc/matrixUtil.o matrixUtil.c -I/home/bbwannabe/Documents/Petsc/latest/include -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include gcc -fPIC -c -o gcc/PetscSolver.o PetscSolver.c -I/home/bbwannabe/Documents/Petsc/latest/include -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include gcc -fPIC -c -o gcc/LinearSystemSolver.o LinearSystemSolver.c -I/home/bbwannabe/Documents/Petsc/latest/include -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include gcc -fPIC -c -o gcc/cPoisson.o cPoisson.c -I/home/bbwannabe/Documents/Petsc/latest/include -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include gcc -fPIC -fopenmp -shared -o gcc/cPoissons.so gcc/matrixUtil.o gcc/PetscSolver.o gcc/LinearSystemSolver.o gcc/cPoisson.o -L/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/lib -Wl,--allow-multiple-definition -Wl,--whole-archive -lpetsc -lflapack -lfblas -Wl,--no-whole-archive However, when I build my app linking to the above shared library cPoissons.so. gfortran complains about undefined references, which seem to be from blas as shown below from the output of make. Has anyone seen this kind of linking problem before? Many thanks for your help. Thuc Bui Senior R&D Engineer Calabazas Creek Research, Inc. 
(650) 948-5361 (Office) gfortran -fPIC -o sPoisson3D Poisson3D.f -L/home/bbwannabe/Documents/Nemesis/cPoisson/gcc -l:cPoissons.so /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgbmv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgbmv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_chemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_csymv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgbmv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zsymv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_csymv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_ssymv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_ssymv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dsymv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zhemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgbmv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_chemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgbmv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgbmv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgbmv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zhemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zsymv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dsymv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgbmv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgemv_x_' collect2: error: ld returned 1 exit status make: *** [makefile:6: sPoisson3D] Error 1 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sam.guo at cd-adapco.com Mon Oct 26 14:12:22 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 26 Oct 2020 12:12:22 -0700 Subject: [petsc-users] change lib names Message-ID: Dear PETSc team, I like to change petsc lib name to petsc_real or petsc_complex to distinguish real vs complex version. Simply copy of libpetsc to libpetsc_real does not help. I need to update PETSc makefile to recompile but I have troubles to figure out where PETSc makefile decides the lib name. Thanks, Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Oct 26 14:19:51 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 26 Oct 2020 13:19:51 -0600 Subject: [petsc-users] change lib names In-Reply-To: References: Message-ID: <875z6wpzfs.fsf@jedbrown.org> See libpetsc_shared and the following 2-3 lines in gmakefile. Sam Guo writes: > Dear PETSc team, > I like to change petsc lib name to petsc_real or petsc_complex to > distinguish real vs complex version. Simply copy of libpetsc to > libpetsc_real does not help. I need to update PETSc makefile to recompile > but I have troubles to figure out where PETSc makefile decides the lib > name. > > Thanks, > Sam From hzhang at mcs.anl.gov Mon Oct 26 14:25:14 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Mon, 26 Oct 2020 19:25:14 +0000 Subject: [petsc-users] change lib names In-Reply-To: References: Message-ID: Sam, You can build petsc with different PETSC_ARCH under same PETSC_DIR, e.g., define PETSC_ARCH = arch_real or arch_complex to build different petsc libraries. Simply switch to different PETSC_ARCH when you use them. See https://www.mcs.anl.gov/petsc/documentation/installation.html#compilers Hong ________________________________ From: petsc-users on behalf of Sam Guo Sent: Monday, October 26, 2020 2:12 PM To: PETSc Subject: [petsc-users] change lib names Dear PETSc team, I like to change petsc lib name to petsc_real or petsc_complex to distinguish real vs complex version. Simply copy of libpetsc to libpetsc_real does not help. I need to update PETSc makefile to recompile but I have troubles to figure out where PETSc makefile decides the lib name. Thanks, Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Oct 26 14:28:47 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 26 Oct 2020 12:28:47 -0700 Subject: [petsc-users] change lib names In-Reply-To: References: Message-ID: Hi Zhang Hong, I know I can have different PETSC_ARCH but my application will dynamically load either real or complex version on fly and I need different lib names. Thanks, Sam On Mon, Oct 26, 2020 at 12:25 PM Zhang, Hong wrote: > Sam, > You can build petsc with different PETSC_ARCH under same PETSC_DIR, e.g., > define PETSC_ARCH = arch_real or arch_complex to build different petsc > libraries. Simply switch to different PETSC_ARCH when you use them. See > https://www.mcs.anl.gov/petsc/documentation/installation.html#compilers > > Hong > > ------------------------------ > *From:* petsc-users on behalf of Sam > Guo > *Sent:* Monday, October 26, 2020 2:12 PM > *To:* PETSc > *Subject:* [petsc-users] change lib names > > Dear PETSc team, > I like to change petsc lib name to petsc_real or petsc_complex to > distinguish real vs complex version. Simply copy of libpetsc to > libpetsc_real does not help. I need to update PETSc makefile to recompile > but I have troubles to figure out where PETSc makefile decides the lib > name. 
> > Thanks, > Sam > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Oct 26 14:29:08 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 26 Oct 2020 12:29:08 -0700 Subject: [petsc-users] change lib names In-Reply-To: <875z6wpzfs.fsf@jedbrown.org> References: <875z6wpzfs.fsf@jedbrown.org> Message-ID: Thanks, Jed. I'll give it a try/ On Mon, Oct 26, 2020 at 12:20 PM Jed Brown wrote: > See libpetsc_shared and the following 2-3 lines in gmakefile. > > Sam Guo writes: > > > Dear PETSc team, > > I like to change petsc lib name to petsc_real or petsc_complex to > > distinguish real vs complex version. Simply copy of libpetsc to > > libpetsc_real does not help. I need to update PETSc makefile to recompile > > but I have troubles to figure out where PETSc makefile decides the lib > > name. > > > > Thanks, > > Sam > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbuerkle at web.de Tue Oct 27 08:46:17 2020 From: mbuerkle at web.de (Marius Buerkle) Date: Tue, 27 Oct 2020 14:46:17 +0100 Subject: [petsc-users] superlu_dist segfault Message-ID: An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Tue Oct 27 09:11:15 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Tue, 27 Oct 2020 14:11:15 +0000 Subject: [petsc-users] superlu_dist segfault In-Reply-To: References: Message-ID: Marius, It fails at the line 1075 in file /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c if ( !(lsum = (doublecomplex*)SUPERLU_MALLOC(sizelsum*num_thread * sizeof(doublecomplex)))) ABORT("Malloc fails for lsum[]."); We do not know what it means. You may use a debugger to check the values of the variables involved. I'm cc'ing Sherry (superlu_dist developer), or you may send us a stand-alone short code that reproduce the error. We can help on its investigation. Hong ________________________________ From: petsc-users on behalf of Marius Buerkle Sent: Tuesday, October 27, 2020 8:46 AM To: petsc-users at mcs.anl.gov Subject: [petsc-users] superlu_dist segfault Hi, When using MatMatSolve with superlu_dist I get a segmentation fault: Malloc fails for lsum[]. at line 1075 in file /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c The matrix size is not particular big and I am using the petsc release branch and superlu_dist is v6.3.0 I think. Best, Marius -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Tue Oct 27 12:41:02 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 27 Oct 2020 10:41:02 -0700 Subject: [petsc-users] change lib names In-Reply-To: References: <875z6wpzfs.fsf@jedbrown.org> Message-ID: Hi Jed, On windows, changing those lines allows me to link petsc with my application but failed at loading the library. I can only load the petsc lib by using libpetsc.dll and petsc.lib (instead of libpetsc_real.dll and petsc_real.lib). Thanks, Sam On Mon, Oct 26, 2020 at 12:29 PM Sam Guo wrote: > Thanks, Jed. I'll give it a try/ > > On Mon, Oct 26, 2020 at 12:20 PM Jed Brown wrote: > >> See libpetsc_shared and the following 2-3 lines in gmakefile. >> >> Sam Guo writes: >> >> > Dear PETSc team, >> > I like to change petsc lib name to petsc_real or petsc_complex to >> > distinguish real vs complex version. Simply copy of libpetsc to >> > libpetsc_real does not help. 
I need to update PETSc makefile to >> recompile >> > but I have troubles to figure out where PETSc makefile decides the lib >> > name. >> > >> > Thanks, >> > Sam >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Oct 27 12:53:25 2020 From: jed at jedbrown.org (Jed Brown) Date: Tue, 27 Oct 2020 11:53:25 -0600 Subject: [petsc-users] change lib names In-Reply-To: References: <875z6wpzfs.fsf@jedbrown.org> Message-ID: <87sg9zo8ru.fsf@jedbrown.org> I don't know details of Windows linking, but presume this is something like the library soname on Linux, which is set by libpetsc_soname. ``make V=1` should help with debugging -- you'll be able to see what is being passed to the linker. Sam Guo writes: > Hi Jed, > On windows, changing those lines allows me to link petsc with my > application but failed at loading the library. I can only load the petsc > lib by using libpetsc.dll and petsc.lib (instead of libpetsc_real.dll and > petsc_real.lib). > > Thanks, > Sam > > On Mon, Oct 26, 2020 at 12:29 PM Sam Guo wrote: > >> Thanks, Jed. I'll give it a try/ >> >> On Mon, Oct 26, 2020 at 12:20 PM Jed Brown wrote: >> >>> See libpetsc_shared and the following 2-3 lines in gmakefile. >>> >>> Sam Guo writes: >>> >>> > Dear PETSc team, >>> > I like to change petsc lib name to petsc_real or petsc_complex to >>> > distinguish real vs complex version. Simply copy of libpetsc to >>> > libpetsc_real does not help. I need to update PETSc makefile to >>> recompile >>> > but I have troubles to figure out where PETSc makefile decides the lib >>> > name. >>> > >>> > Thanks, >>> > Sam >>> >> From sblondel at utk.edu Tue Oct 27 14:09:23 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Tue, 27 Oct 2020 19:09:23 +0000 Subject: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep Message-ID: Hi, I am currently using TSSetEventHandler in my code to detect a random event where the solution vector gets modified during the event. Ideally, after the event happens I want the solver to use a much smaller timestep using TSSetPostEventIntervalStep. However, when I use TSSetPostEventIntervalStep the solver doesn't use the set value. I managed to reproduce the behavior by modifying ex40.c as attached. I think the issue is related to the fact that the fvalue is not technically "approaching" 0 with a random event, it is more of a step function instead. Do you have any recommendation on how to implement the behavior I'm looking for? Let me know if I can provide additional information. Best, Sophie -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex40.c Type: text/x-csrc Size: 12871 bytes Desc: ex40.c URL: From knepley at gmail.com Tue Oct 27 14:34:40 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 27 Oct 2020 15:34:40 -0400 Subject: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep In-Reply-To: References: Message-ID: On Tue, Oct 27, 2020 at 3:09 PM Blondel, Sophie via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > I am currently using TSSetEventHandler in my code to detect a random event > where the solution vector gets modified during the event. Ideally, after > the event happens I want the solver to use a much smaller timestep using > TSSetPostEventIntervalStep. However, when I use TSSetPostEventIntervalStep > the solver doesn't use the set value. 
I managed to reproduce the behavior > by modifying ex40.c as attached. > I stepped through ex40, and it does indeed change the timestep to 0.001. Can you be more specific, perhaps with monitors, about what you think is wrong? Thanks, Matt > I think the issue is related to the fact that the fvalue is not > technically "approaching" 0 with a random event, it is more of a step > function instead. Do you have any recommendation on how to implement the > behavior I'm looking for? Let me know if I can provide additional > information. > > Best, > > Sophie > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sblondel at utk.edu Tue Oct 27 15:02:06 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Tue, 27 Oct 2020 20:02:06 +0000 Subject: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep In-Reply-To: References: , Message-ID: Hi Matt, With the ex40 I attached in my previous email here is what I get printed on screen when running "./ex40 -ts_monitor -ts_event_monitor": 0 TS dt 0.1 time 0. 1 TS dt 0.5 time 0.1 2 TS dt 0.5 time 0.6 3 TS dt 0.5 time 1.1 4 TS dt 0.5 time 1.6 5 TS dt 0.5 time 2.1 6 TS dt 0.5 time 2.6 7 TS dt 0.5 time 3.1 8 TS dt 0.5 time 3.6 9 TS dt 0.5 time 4.1 10 TS dt 0.5 time 4.6 11 TS dt 0.5 time 5.1 12 TS dt 0.5 time 5.6 13 TS dt 0.5 time 6.1 14 TS dt 0.5 time 6.6 15 TS dt 0.5 time 7.1 TSEvent: Event 0 zero crossing at time 7.6 located in 0 iterations Ball hit the ground at t = 7.60 seconds 16 TS dt 0.5 time 7.6 17 TS dt 0.5 time 8.1 18 TS dt 0.5 time 8.6 19 TS dt 0.5 time 9.1 20 TS dt 0.5 time 9.6 21 TS dt 0.5 time 10.1 22 TS dt 0.5 time 10.6 23 TS dt 0.5 time 11.1 24 TS dt 0.5 time 11.6 25 TS dt 0.5 time 12.1 26 TS dt 0.5 time 12.6 27 TS dt 0.5 time 13.1 28 TS dt 0.5 time 13.6 29 TS dt 0.5 time 14.1 30 TS dt 0.5 time 14.6 31 TS dt 0.5 time 15.1 32 TS dt 0.5 time 15.6 33 TS dt 0.5 time 16.1 34 TS dt 0.5 time 16.6 35 TS dt 0.5 time 17.1 36 TS dt 0.5 time 17.6 37 TS dt 0.5 time 18.1 38 TS dt 0.5 time 18.6 39 TS dt 0.5 time 19.1 40 TS dt 0.5 time 19.6 41 TS dt 0.5 time 20.1 42 TS dt 0.5 time 20.6 43 TS dt 0.5 time 21.1 44 TS dt 0.5 time 21.6 45 TS dt 0.5 time 22.1 46 TS dt 0.5 time 22.6 47 TS dt 0.5 time 23.1 48 TS dt 0.5 time 23.6 49 TS dt 0.5 time 24.1 50 TS dt 0.5 time 24.6 51 TS dt 0.5 time 25.1 TSEvent: Event 0 zero crossing at time 25.6 located in 0 iterations Ball hit the ground at t = 25.60 seconds 52 TS dt 0.5 time 25.6 53 TS dt 0.5 time 26.1 54 TS dt 0.5 time 26.6 55 TS dt 0.5 time 27.1 56 TS dt 0.5 time 27.6 57 TS dt 0.5 time 28.1 58 TS dt 0.5 time 28.6 59 TS dt 0.5 time 29.1 60 TS dt 0.5 time 29.6 61 TS dt 0.5 time 30.1 0 TS dt 0.1 time 0. 
1 TS dt 0.5 time 0.1 2 TS dt 0.5 time 0.6 3 TS dt 0.5 time 1.1 4 TS dt 0.5 time 1.6 5 TS dt 0.5 time 2.1 6 TS dt 0.5 time 2.6 7 TS dt 0.5 time 3.1 8 TS dt 0.5 time 3.6 9 TS dt 0.5 time 4.1 10 TS dt 0.5 time 4.6 11 TS dt 0.5 time 5.1 12 TS dt 0.5 time 5.6 13 TS dt 0.5 time 6.1 14 TS dt 0.5 time 6.6 15 TS dt 0.5 time 7.1 16 TS dt 0.5 time 7.6 17 TS dt 0.5 time 8.1 18 TS dt 0.5 time 8.6 19 TS dt 0.5 time 9.1 20 TS dt 0.5 time 9.6 21 TS dt 0.5 time 10.1 22 TS dt 0.5 time 10.6 23 TS dt 0.5 time 11.1 24 TS dt 0.5 time 11.6 25 TS dt 0.5 time 12.1 26 TS dt 0.5 time 12.6 TSEvent: Event 0 zero crossing at time 13.1 located in 0 iterations Ball hit the ground at t = 13.10 seconds 27 TS dt 0.5 time 13.1 28 TS dt 0.5 time 13.6 29 TS dt 0.5 time 14.1 30 TS dt 0.5 time 14.6 31 TS dt 0.5 time 15.1 32 TS dt 0.5 time 15.6 33 TS dt 0.5 time 16.1 34 TS dt 0.5 time 16.6 35 TS dt 0.5 time 17.1 36 TS dt 0.5 time 17.6 37 TS dt 0.5 time 18.1 38 TS dt 0.5 time 18.6 39 TS dt 0.5 time 19.1 40 TS dt 0.5 time 19.6 41 TS dt 0.5 time 20.1 42 TS dt 0.5 time 20.6 43 TS dt 0.5 time 21.1 44 TS dt 0.5 time 21.6 45 TS dt 0.5 time 22.1 46 TS dt 0.5 time 22.6 47 TS dt 0.5 time 23.1 TSEvent: Event 0 zero crossing at time 23.6 located in 0 iterations Ball hit the ground at t = 23.60 seconds 48 TS dt 0.5 time 23.6 49 TS dt 0.5 time 24.1 50 TS dt 0.5 time 24.6 51 TS dt 0.5 time 25.1 52 TS dt 0.5 time 25.6 53 TS dt 0.5 time 26.1 TSEvent: Event 0 zero crossing at time 26.6 located in 0 iterations Ball hit the ground at t = 26.60 seconds 54 TS dt 0.5 time 26.6 55 TS dt 0.5 time 27.1 56 TS dt 0.5 time 27.6 57 TS dt 0.5 time 28.1 58 TS dt 0.5 time 28.6 59 TS dt 0.5 time 29.1 60 TS dt 0.5 time 29.6 61 TS dt 0. time 30.1 I don't see the 0.001 timestep here, do you get a different behavior? Thank you, Sophie ________________________________ From: Matthew Knepley Sent: Tuesday, October 27, 2020 15:34 To: Blondel, Sophie Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Subject: Re: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep [External Email] On Tue, Oct 27, 2020 at 3:09 PM Blondel, Sophie via petsc-users > wrote: Hi, I am currently using TSSetEventHandler in my code to detect a random event where the solution vector gets modified during the event. Ideally, after the event happens I want the solver to use a much smaller timestep using TSSetPostEventIntervalStep. However, when I use TSSetPostEventIntervalStep the solver doesn't use the set value. I managed to reproduce the behavior by modifying ex40.c as attached. I stepped through ex40, and it does indeed change the timestep to 0.001. Can you be more specific, perhaps with monitors, about what you think is wrong? Thanks, Matt I think the issue is related to the fact that the fvalue is not technically "approaching" 0 with a random event, it is more of a step function instead. Do you have any recommendation on how to implement the behavior I'm looking for? Let me know if I can provide additional information. Best, Sophie -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Tue Oct 27 15:24:36 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 27 Oct 2020 15:24:36 -0500 Subject: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep In-Reply-To: References: Message-ID: <52CD65C8-DB69-4799-ACC7-0B2E5C32FE54@petsc.dev> I'm sorry the code is still fundamentally broken, I know I promised a long time ago to fix it all up but it is actually pretty hard to get right. It detects the zero by finding a small value when it should detect it by find a small region where it changes sign but surprising it is so hardwired to the size test that fixing it and testing the new code has been very difficult to me. My branch is barry/2019-08-18/fix-tsevent-posteventdt Barry > On Oct 27, 2020, at 3:02 PM, Blondel, Sophie via petsc-users wrote: > > Hi Matt, > > With the ex40 I attached in my previous email here is what I get printed on screen when running "./ex40 -ts_monitor -ts_event_monitor": > 0 TS dt 0.1 time 0. > 1 TS dt 0.5 time 0.1 > 2 TS dt 0.5 time 0.6 > 3 TS dt 0.5 time 1.1 > 4 TS dt 0.5 time 1.6 > 5 TS dt 0.5 time 2.1 > 6 TS dt 0.5 time 2.6 > 7 TS dt 0.5 time 3.1 > 8 TS dt 0.5 time 3.6 > 9 TS dt 0.5 time 4.1 > 10 TS dt 0.5 time 4.6 > 11 TS dt 0.5 time 5.1 > 12 TS dt 0.5 time 5.6 > 13 TS dt 0.5 time 6.1 > 14 TS dt 0.5 time 6.6 > 15 TS dt 0.5 time 7.1 > TSEvent: Event 0 zero crossing at time 7.6 located in 0 iterations > Ball hit the ground at t = 7.60 seconds > 16 TS dt 0.5 time 7.6 > 17 TS dt 0.5 time 8.1 > 18 TS dt 0.5 time 8.6 > 19 TS dt 0.5 time 9.1 > 20 TS dt 0.5 time 9.6 > 21 TS dt 0.5 time 10.1 > 22 TS dt 0.5 time 10.6 > 23 TS dt 0.5 time 11.1 > 24 TS dt 0.5 time 11.6 > 25 TS dt 0.5 time 12.1 > 26 TS dt 0.5 time 12.6 > 27 TS dt 0.5 time 13.1 > 28 TS dt 0.5 time 13.6 > 29 TS dt 0.5 time 14.1 > 30 TS dt 0.5 time 14.6 > 31 TS dt 0.5 time 15.1 > 32 TS dt 0.5 time 15.6 > 33 TS dt 0.5 time 16.1 > 34 TS dt 0.5 time 16.6 > 35 TS dt 0.5 time 17.1 > 36 TS dt 0.5 time 17.6 > 37 TS dt 0.5 time 18.1 > 38 TS dt 0.5 time 18.6 > 39 TS dt 0.5 time 19.1 > 40 TS dt 0.5 time 19.6 > 41 TS dt 0.5 time 20.1 > 42 TS dt 0.5 time 20.6 > 43 TS dt 0.5 time 21.1 > 44 TS dt 0.5 time 21.6 > 45 TS dt 0.5 time 22.1 > 46 TS dt 0.5 time 22.6 > 47 TS dt 0.5 time 23.1 > 48 TS dt 0.5 time 23.6 > 49 TS dt 0.5 time 24.1 > 50 TS dt 0.5 time 24.6 > 51 TS dt 0.5 time 25.1 > TSEvent: Event 0 zero crossing at time 25.6 located in 0 iterations > Ball hit the ground at t = 25.60 seconds > 52 TS dt 0.5 time 25.6 > 53 TS dt 0.5 time 26.1 > 54 TS dt 0.5 time 26.6 > 55 TS dt 0.5 time 27.1 > 56 TS dt 0.5 time 27.6 > 57 TS dt 0.5 time 28.1 > 58 TS dt 0.5 time 28.6 > 59 TS dt 0.5 time 29.1 > 60 TS dt 0.5 time 29.6 > 61 TS dt 0.5 time 30.1 > 0 TS dt 0.1 time 0. 
> 1 TS dt 0.5 time 0.1 > 2 TS dt 0.5 time 0.6 > 3 TS dt 0.5 time 1.1 > 4 TS dt 0.5 time 1.6 > 5 TS dt 0.5 time 2.1 > 6 TS dt 0.5 time 2.6 > 7 TS dt 0.5 time 3.1 > 8 TS dt 0.5 time 3.6 > 9 TS dt 0.5 time 4.1 > 10 TS dt 0.5 time 4.6 > 11 TS dt 0.5 time 5.1 > 12 TS dt 0.5 time 5.6 > 13 TS dt 0.5 time 6.1 > 14 TS dt 0.5 time 6.6 > 15 TS dt 0.5 time 7.1 > 16 TS dt 0.5 time 7.6 > 17 TS dt 0.5 time 8.1 > 18 TS dt 0.5 time 8.6 > 19 TS dt 0.5 time 9.1 > 20 TS dt 0.5 time 9.6 > 21 TS dt 0.5 time 10.1 > 22 TS dt 0.5 time 10.6 > 23 TS dt 0.5 time 11.1 > 24 TS dt 0.5 time 11.6 > 25 TS dt 0.5 time 12.1 > 26 TS dt 0.5 time 12.6 > TSEvent: Event 0 zero crossing at time 13.1 located in 0 iterations > Ball hit the ground at t = 13.10 seconds > 27 TS dt 0.5 time 13.1 > 28 TS dt 0.5 time 13.6 > 29 TS dt 0.5 time 14.1 > 30 TS dt 0.5 time 14.6 > 31 TS dt 0.5 time 15.1 > 32 TS dt 0.5 time 15.6 > 33 TS dt 0.5 time 16.1 > 34 TS dt 0.5 time 16.6 > 35 TS dt 0.5 time 17.1 > 36 TS dt 0.5 time 17.6 > 37 TS dt 0.5 time 18.1 > 38 TS dt 0.5 time 18.6 > 39 TS dt 0.5 time 19.1 > 40 TS dt 0.5 time 19.6 > 41 TS dt 0.5 time 20.1 > 42 TS dt 0.5 time 20.6 > 43 TS dt 0.5 time 21.1 > 44 TS dt 0.5 time 21.6 > 45 TS dt 0.5 time 22.1 > 46 TS dt 0.5 time 22.6 > 47 TS dt 0.5 time 23.1 > TSEvent: Event 0 zero crossing at time 23.6 located in 0 iterations > Ball hit the ground at t = 23.60 seconds > 48 TS dt 0.5 time 23.6 > 49 TS dt 0.5 time 24.1 > 50 TS dt 0.5 time 24.6 > 51 TS dt 0.5 time 25.1 > 52 TS dt 0.5 time 25.6 > 53 TS dt 0.5 time 26.1 > TSEvent: Event 0 zero crossing at time 26.6 located in 0 iterations > Ball hit the ground at t = 26.60 seconds > 54 TS dt 0.5 time 26.6 > 55 TS dt 0.5 time 27.1 > 56 TS dt 0.5 time 27.6 > 57 TS dt 0.5 time 28.1 > 58 TS dt 0.5 time 28.6 > 59 TS dt 0.5 time 29.1 > 60 TS dt 0.5 time 29.6 > 61 TS dt 0. time 30.1 > > I don't see the 0.001 timestep here, do you get a different behavior? > > Thank you, > > Sophie > From: Matthew Knepley > Sent: Tuesday, October 27, 2020 15:34 > To: Blondel, Sophie > Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net > Subject: Re: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep > > [External Email] > On Tue, Oct 27, 2020 at 3:09 PM Blondel, Sophie via petsc-users > wrote: > Hi, > > I am currently using TSSetEventHandler in my code to detect a random event where the solution vector gets modified during the event. Ideally, after the event happens I want the solver to use a much smaller timestep using TSSetPostEventIntervalStep. However, when I use TSSetPostEventIntervalStep the solver doesn't use the set value. I managed to reproduce the behavior by modifying ex40.c as attached. > > I stepped through ex40, and it does indeed change the timestep to 0.001. Can you be more specific, perhaps with monitors, about what you think is wrong? > > Thanks, > > Matt > > I think the issue is related to the fact that the fvalue is not technically "approaching" 0 with a random event, it is more of a step function instead. Do you have any recommendation on how to implement the behavior I'm looking for? Let me know if I can provide additional information. > > Best, > > Sophie > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
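Barry's diagnosis above is that the current event code keys on the indicator becoming small rather than on a clean sign change, which interacts badly with the step-function indicator Sophie describes. One way to give the handler a genuine zero crossing is to return the time remaining until the next scheduled random event. The sketch below is not from the thread: AppCtx, t_next_event and the 1.e-3 post-event step are invented for illustration, and whether this makes TSSetPostEventIntervalStep() behave as desired on a given PETSc version is exactly what remains open here.

#include <petscts.h>

/* Illustrative indicator that changes sign at a randomly scheduled event
   time instead of jumping like a step function. */
typedef struct {
  PetscReal t_next_event;   /* time of the next scheduled random event */
} AppCtx;

static PetscErrorCode Indicator(TS ts,PetscReal t,Vec U,PetscScalar *fvalue,void *ctx)
{
  AppCtx *app = (AppCtx*)ctx;

  PetscFunctionBeginUser;
  fvalue[0] = app->t_next_event - t;   /* positive before the event, negative after */
  PetscFunctionReturn(0);
}

static PetscErrorCode SetupEvent(TS ts,AppCtx *app)
{
  PetscInt       direction[1] = {-1};          /* catch the + to - crossing   */
  PetscBool      terminate[1] = {PETSC_FALSE}; /* keep integrating afterwards */
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* a post-event callback that modifies the solution (sketched further below
     in this thread) would be passed in place of NULL */
  ierr = TSSetEventHandler(ts,1,direction,terminate,Indicator,NULL,app);CHKERRQ(ierr);
  ierr = TSSetPostEventIntervalStep(ts,1.e-3);CHKERRQ(ierr);   /* dt to take right after an event */
  PetscFunctionReturn(0);
}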
URL: From sblondel at utk.edu Tue Oct 27 15:35:11 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Tue, 27 Oct 2020 20:35:11 +0000 Subject: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep In-Reply-To: <52CD65C8-DB69-4799-ACC7-0B2E5C32FE54@petsc.dev> References: , <52CD65C8-DB69-4799-ACC7-0B2E5C32FE54@petsc.dev> Message-ID: Hi Barry, The code had a different behavior at one point (using the initial timestep after an event) : I still use commit f0e947c45e099a328e78b13737aa9bc4c143ca79 when I really need the time step to get really small after an event. I don't know if it can help with the current code. Best, Sophie ________________________________ From: Barry Smith Sent: Tuesday, October 27, 2020 16:24 To: Blondel, Sophie Cc: Matthew Knepley ; petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Subject: Re: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep I'm sorry the code is still fundamentally broken, I know I promised a long time ago to fix it all up but it is actually pretty hard to get right. It detects the zero by finding a small value when it should detect it by find a small region where it changes sign but surprising it is so hardwired to the size test that fixing it and testing the new code has been very difficult to me. My branch is barry/2019-08-18/fix-tsevent-posteventdt Barry On Oct 27, 2020, at 3:02 PM, Blondel, Sophie via petsc-users > wrote: Hi Matt, With the ex40 I attached in my previous email here is what I get printed on screen when running "./ex40 -ts_monitor -ts_event_monitor": 0 TS dt 0.1 time 0. 1 TS dt 0.5 time 0.1 2 TS dt 0.5 time 0.6 3 TS dt 0.5 time 1.1 4 TS dt 0.5 time 1.6 5 TS dt 0.5 time 2.1 6 TS dt 0.5 time 2.6 7 TS dt 0.5 time 3.1 8 TS dt 0.5 time 3.6 9 TS dt 0.5 time 4.1 10 TS dt 0.5 time 4.6 11 TS dt 0.5 time 5.1 12 TS dt 0.5 time 5.6 13 TS dt 0.5 time 6.1 14 TS dt 0.5 time 6.6 15 TS dt 0.5 time 7.1 TSEvent: Event 0 zero crossing at time 7.6 located in 0 iterations Ball hit the ground at t = 7.60 seconds 16 TS dt 0.5 time 7.6 17 TS dt 0.5 time 8.1 18 TS dt 0.5 time 8.6 19 TS dt 0.5 time 9.1 20 TS dt 0.5 time 9.6 21 TS dt 0.5 time 10.1 22 TS dt 0.5 time 10.6 23 TS dt 0.5 time 11.1 24 TS dt 0.5 time 11.6 25 TS dt 0.5 time 12.1 26 TS dt 0.5 time 12.6 27 TS dt 0.5 time 13.1 28 TS dt 0.5 time 13.6 29 TS dt 0.5 time 14.1 30 TS dt 0.5 time 14.6 31 TS dt 0.5 time 15.1 32 TS dt 0.5 time 15.6 33 TS dt 0.5 time 16.1 34 TS dt 0.5 time 16.6 35 TS dt 0.5 time 17.1 36 TS dt 0.5 time 17.6 37 TS dt 0.5 time 18.1 38 TS dt 0.5 time 18.6 39 TS dt 0.5 time 19.1 40 TS dt 0.5 time 19.6 41 TS dt 0.5 time 20.1 42 TS dt 0.5 time 20.6 43 TS dt 0.5 time 21.1 44 TS dt 0.5 time 21.6 45 TS dt 0.5 time 22.1 46 TS dt 0.5 time 22.6 47 TS dt 0.5 time 23.1 48 TS dt 0.5 time 23.6 49 TS dt 0.5 time 24.1 50 TS dt 0.5 time 24.6 51 TS dt 0.5 time 25.1 TSEvent: Event 0 zero crossing at time 25.6 located in 0 iterations Ball hit the ground at t = 25.60 seconds 52 TS dt 0.5 time 25.6 53 TS dt 0.5 time 26.1 54 TS dt 0.5 time 26.6 55 TS dt 0.5 time 27.1 56 TS dt 0.5 time 27.6 57 TS dt 0.5 time 28.1 58 TS dt 0.5 time 28.6 59 TS dt 0.5 time 29.1 60 TS dt 0.5 time 29.6 61 TS dt 0.5 time 30.1 0 TS dt 0.1 time 0. 
1 TS dt 0.5 time 0.1 2 TS dt 0.5 time 0.6 3 TS dt 0.5 time 1.1 4 TS dt 0.5 time 1.6 5 TS dt 0.5 time 2.1 6 TS dt 0.5 time 2.6 7 TS dt 0.5 time 3.1 8 TS dt 0.5 time 3.6 9 TS dt 0.5 time 4.1 10 TS dt 0.5 time 4.6 11 TS dt 0.5 time 5.1 12 TS dt 0.5 time 5.6 13 TS dt 0.5 time 6.1 14 TS dt 0.5 time 6.6 15 TS dt 0.5 time 7.1 16 TS dt 0.5 time 7.6 17 TS dt 0.5 time 8.1 18 TS dt 0.5 time 8.6 19 TS dt 0.5 time 9.1 20 TS dt 0.5 time 9.6 21 TS dt 0.5 time 10.1 22 TS dt 0.5 time 10.6 23 TS dt 0.5 time 11.1 24 TS dt 0.5 time 11.6 25 TS dt 0.5 time 12.1 26 TS dt 0.5 time 12.6 TSEvent: Event 0 zero crossing at time 13.1 located in 0 iterations Ball hit the ground at t = 13.10 seconds 27 TS dt 0.5 time 13.1 28 TS dt 0.5 time 13.6 29 TS dt 0.5 time 14.1 30 TS dt 0.5 time 14.6 31 TS dt 0.5 time 15.1 32 TS dt 0.5 time 15.6 33 TS dt 0.5 time 16.1 34 TS dt 0.5 time 16.6 35 TS dt 0.5 time 17.1 36 TS dt 0.5 time 17.6 37 TS dt 0.5 time 18.1 38 TS dt 0.5 time 18.6 39 TS dt 0.5 time 19.1 40 TS dt 0.5 time 19.6 41 TS dt 0.5 time 20.1 42 TS dt 0.5 time 20.6 43 TS dt 0.5 time 21.1 44 TS dt 0.5 time 21.6 45 TS dt 0.5 time 22.1 46 TS dt 0.5 time 22.6 47 TS dt 0.5 time 23.1 TSEvent: Event 0 zero crossing at time 23.6 located in 0 iterations Ball hit the ground at t = 23.60 seconds 48 TS dt 0.5 time 23.6 49 TS dt 0.5 time 24.1 50 TS dt 0.5 time 24.6 51 TS dt 0.5 time 25.1 52 TS dt 0.5 time 25.6 53 TS dt 0.5 time 26.1 TSEvent: Event 0 zero crossing at time 26.6 located in 0 iterations Ball hit the ground at t = 26.60 seconds 54 TS dt 0.5 time 26.6 55 TS dt 0.5 time 27.1 56 TS dt 0.5 time 27.6 57 TS dt 0.5 time 28.1 58 TS dt 0.5 time 28.6 59 TS dt 0.5 time 29.1 60 TS dt 0.5 time 29.6 61 TS dt 0. time 30.1 I don't see the 0.001 timestep here, do you get a different behavior? Thank you, Sophie ________________________________ From: Matthew Knepley > Sent: Tuesday, October 27, 2020 15:34 To: Blondel, Sophie > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Subject: Re: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep [External Email] On Tue, Oct 27, 2020 at 3:09 PM Blondel, Sophie via petsc-users > wrote: Hi, I am currently using TSSetEventHandler in my code to detect a random event where the solution vector gets modified during the event. Ideally, after the event happens I want the solver to use a much smaller timestep using TSSetPostEventIntervalStep. However, when I use TSSetPostEventIntervalStep the solver doesn't use the set value. I managed to reproduce the behavior by modifying ex40.c as attached. I stepped through ex40, and it does indeed change the timestep to 0.001. Can you be more specific, perhaps with monitors, about what you think is wrong? Thanks, Matt I think the issue is related to the fact that the fvalue is not technically "approaching" 0 with a random event, it is more of a step function instead. Do you have any recommendation on how to implement the behavior I'm looking for? Let me know if I can provide additional information. Best, Sophie -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
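
A minimal stand-alone sketch of the pattern being discussed here, for readers without the attachment: it is not Sophie's modified ex40.c, and it assumes a trivial ODE du/dt = -u, a deterministic stand-in for the random event (the indicator jumps from +1 to -1 at t = 5 instead of smoothly approaching zero), and an illustrative post-event step of 0.001 set with TSSetPostEventIntervalStep. Whether that small step is actually taken after the event is exactly the behaviour under debate in this thread.

#include <petscts.h>

/* du/dt = -u, just to have something to integrate */
static PetscErrorCode RHSFunction(TS ts, PetscReal t, Vec U, Vec F, void *ctx)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = VecCopy(U, F);CHKERRQ(ierr);
  ierr = VecScale(F, -1.0);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Indicator behaving like a step function (the situation described above):
   it jumps from +1 to -1 when the event condition becomes true, rather than
   smoothly approaching zero. Here the "event" is simply t >= 5. */
static PetscErrorCode EventIndicator(TS ts, PetscReal t, Vec U, PetscScalar *fvalue, void *ctx)
{
  PetscFunctionBeginUser;
  fvalue[0] = (t < 5.0) ? 1.0 : -1.0;
  PetscFunctionReturn(0);
}

/* Called once the event has been located; the solution could be modified here */
static PetscErrorCode PostEvent(TS ts, PetscInt nevents, PetscInt event_list[], PetscReal t, Vec U, PetscBool forwardsolve, void *ctx)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = PetscPrintf(PETSC_COMM_WORLD, "Event fired at t = %g\n", (double)t);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

int main(int argc, char **argv)
{
  TS             ts;
  Vec            u;
  PetscInt       direction = -1;      /* indicator goes from positive to negative */
  PetscBool      terminate = PETSC_FALSE;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = VecCreate(PETSC_COMM_WORLD, &u);CHKERRQ(ierr);
  ierr = VecSetSizes(u, PETSC_DECIDE, 1);CHKERRQ(ierr);
  ierr = VecSetFromOptions(u);CHKERRQ(ierr);
  ierr = VecSet(u, 1.0);CHKERRQ(ierr);

  ierr = TSCreate(PETSC_COMM_WORLD, &ts);CHKERRQ(ierr);
  ierr = TSSetProblemType(ts, TS_NONLINEAR);CHKERRQ(ierr);
  ierr = TSSetRHSFunction(ts, NULL, RHSFunction, NULL);CHKERRQ(ierr);
  ierr = TSSetTimeStep(ts, 0.1);CHKERRQ(ierr);
  ierr = TSSetMaxTime(ts, 10.0);CHKERRQ(ierr);
  ierr = TSSetExactFinalTime(ts, TS_EXACTFINALTIME_MATCHSTEP);CHKERRQ(ierr);

  ierr = TSSetEventHandler(ts, 1, &direction, &terminate, EventIndicator, PostEvent, NULL);CHKERRQ(ierr);
  ierr = TSSetPostEventIntervalStep(ts, 0.001);CHKERRQ(ierr); /* the small step expected right after the event */

  ierr = TSSetFromOptions(ts);CHKERRQ(ierr);
  ierr = TSSolve(ts, u);CHKERRQ(ierr);

  ierr = TSDestroy(&ts);CHKERRQ(ierr);
  ierr = VecDestroy(&u);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Running a sketch like this with -ts_monitor -ts_event_monitor is the same diagnostic used in the monitor output quoted in this thread.
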
URL: From knepley at gmail.com Tue Oct 27 15:41:15 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 27 Oct 2020 16:41:15 -0400 Subject: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep In-Reply-To: <52CD65C8-DB69-4799-ACC7-0B2E5C32FE54@petsc.dev> References: <52CD65C8-DB69-4799-ACC7-0B2E5C32FE54@petsc.dev> Message-ID: On Tue, Oct 27, 2020 at 4:24 PM Barry Smith wrote: > > I'm sorry the code is still fundamentally broken, I know I promised a > long time ago to fix it all up but it is actually pretty hard to get right. > > It detects the zero by finding a small value when it should detect it by > find a small region where it changes sign but surprising it is so hardwired > to the size test that fixing it and testing the new code has been very > difficult to me. My branch is barry/2019-08-18/fix-tsevent-posteventdt > Barry, I do not see this branch on gitlab. Can you give a URL? Thanks, Matt > Barry > > > > On Oct 27, 2020, at 3:02 PM, Blondel, Sophie via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi Matt, > > With the ex40 I attached in my previous email here is what I get printed > on screen when running "./ex40 -ts_monitor -ts_event_monitor": > 0 TS dt 0.1 time 0. > 1 TS dt 0.5 time 0.1 > 2 TS dt 0.5 time 0.6 > 3 TS dt 0.5 time 1.1 > 4 TS dt 0.5 time 1.6 > 5 TS dt 0.5 time 2.1 > 6 TS dt 0.5 time 2.6 > 7 TS dt 0.5 time 3.1 > 8 TS dt 0.5 time 3.6 > 9 TS dt 0.5 time 4.1 > 10 TS dt 0.5 time 4.6 > 11 TS dt 0.5 time 5.1 > 12 TS dt 0.5 time 5.6 > 13 TS dt 0.5 time 6.1 > 14 TS dt 0.5 time 6.6 > 15 TS dt 0.5 time 7.1 > TSEvent: Event 0 zero crossing at time 7.6 located in 0 iterations > Ball hit the ground at t = 7.60 seconds > 16 TS dt 0.5 time 7.6 > 17 TS dt 0.5 time 8.1 > 18 TS dt 0.5 time 8.6 > 19 TS dt 0.5 time 9.1 > 20 TS dt 0.5 time 9.6 > 21 TS dt 0.5 time 10.1 > 22 TS dt 0.5 time 10.6 > 23 TS dt 0.5 time 11.1 > 24 TS dt 0.5 time 11.6 > 25 TS dt 0.5 time 12.1 > 26 TS dt 0.5 time 12.6 > 27 TS dt 0.5 time 13.1 > 28 TS dt 0.5 time 13.6 > 29 TS dt 0.5 time 14.1 > 30 TS dt 0.5 time 14.6 > 31 TS dt 0.5 time 15.1 > 32 TS dt 0.5 time 15.6 > 33 TS dt 0.5 time 16.1 > 34 TS dt 0.5 time 16.6 > 35 TS dt 0.5 time 17.1 > 36 TS dt 0.5 time 17.6 > 37 TS dt 0.5 time 18.1 > 38 TS dt 0.5 time 18.6 > 39 TS dt 0.5 time 19.1 > 40 TS dt 0.5 time 19.6 > 41 TS dt 0.5 time 20.1 > 42 TS dt 0.5 time 20.6 > 43 TS dt 0.5 time 21.1 > 44 TS dt 0.5 time 21.6 > 45 TS dt 0.5 time 22.1 > 46 TS dt 0.5 time 22.6 > 47 TS dt 0.5 time 23.1 > 48 TS dt 0.5 time 23.6 > 49 TS dt 0.5 time 24.1 > 50 TS dt 0.5 time 24.6 > 51 TS dt 0.5 time 25.1 > TSEvent: Event 0 zero crossing at time 25.6 located in 0 iterations > Ball hit the ground at t = 25.60 seconds > 52 TS dt 0.5 time 25.6 > 53 TS dt 0.5 time 26.1 > 54 TS dt 0.5 time 26.6 > 55 TS dt 0.5 time 27.1 > 56 TS dt 0.5 time 27.6 > 57 TS dt 0.5 time 28.1 > 58 TS dt 0.5 time 28.6 > 59 TS dt 0.5 time 29.1 > 60 TS dt 0.5 time 29.6 > 61 TS dt 0.5 time 30.1 > 0 TS dt 0.1 time 0. 
> 1 TS dt 0.5 time 0.1 > 2 TS dt 0.5 time 0.6 > 3 TS dt 0.5 time 1.1 > 4 TS dt 0.5 time 1.6 > 5 TS dt 0.5 time 2.1 > 6 TS dt 0.5 time 2.6 > 7 TS dt 0.5 time 3.1 > 8 TS dt 0.5 time 3.6 > 9 TS dt 0.5 time 4.1 > 10 TS dt 0.5 time 4.6 > 11 TS dt 0.5 time 5.1 > 12 TS dt 0.5 time 5.6 > 13 TS dt 0.5 time 6.1 > 14 TS dt 0.5 time 6.6 > 15 TS dt 0.5 time 7.1 > 16 TS dt 0.5 time 7.6 > 17 TS dt 0.5 time 8.1 > 18 TS dt 0.5 time 8.6 > 19 TS dt 0.5 time 9.1 > 20 TS dt 0.5 time 9.6 > 21 TS dt 0.5 time 10.1 > 22 TS dt 0.5 time 10.6 > 23 TS dt 0.5 time 11.1 > 24 TS dt 0.5 time 11.6 > 25 TS dt 0.5 time 12.1 > 26 TS dt 0.5 time 12.6 > TSEvent: Event 0 zero crossing at time 13.1 located in 0 iterations > Ball hit the ground at t = 13.10 seconds > 27 TS dt 0.5 time 13.1 > 28 TS dt 0.5 time 13.6 > 29 TS dt 0.5 time 14.1 > 30 TS dt 0.5 time 14.6 > 31 TS dt 0.5 time 15.1 > 32 TS dt 0.5 time 15.6 > 33 TS dt 0.5 time 16.1 > 34 TS dt 0.5 time 16.6 > 35 TS dt 0.5 time 17.1 > 36 TS dt 0.5 time 17.6 > 37 TS dt 0.5 time 18.1 > 38 TS dt 0.5 time 18.6 > 39 TS dt 0.5 time 19.1 > 40 TS dt 0.5 time 19.6 > 41 TS dt 0.5 time 20.1 > 42 TS dt 0.5 time 20.6 > 43 TS dt 0.5 time 21.1 > 44 TS dt 0.5 time 21.6 > 45 TS dt 0.5 time 22.1 > 46 TS dt 0.5 time 22.6 > 47 TS dt 0.5 time 23.1 > TSEvent: Event 0 zero crossing at time 23.6 located in 0 iterations > Ball hit the ground at t = 23.60 seconds > 48 TS dt 0.5 time 23.6 > 49 TS dt 0.5 time 24.1 > 50 TS dt 0.5 time 24.6 > 51 TS dt 0.5 time 25.1 > 52 TS dt 0.5 time 25.6 > 53 TS dt 0.5 time 26.1 > TSEvent: Event 0 zero crossing at time 26.6 located in 0 iterations > Ball hit the ground at t = 26.60 seconds > 54 TS dt 0.5 time 26.6 > 55 TS dt 0.5 time 27.1 > 56 TS dt 0.5 time 27.6 > 57 TS dt 0.5 time 28.1 > 58 TS dt 0.5 time 28.6 > 59 TS dt 0.5 time 29.1 > 60 TS dt 0.5 time 29.6 > 61 TS dt 0. time 30.1 > > I don't see the 0.001 timestep here, do you get a different behavior? > > Thank you, > > Sophie > ------------------------------ > *From:* Matthew Knepley > *Sent:* Tuesday, October 27, 2020 15:34 > *To:* Blondel, Sophie > *Cc:* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net> > *Subject:* Re: [petsc-users] TSSetEventHandler and > TSSetPostEventIntervalStep > > > *[External Email]* > On Tue, Oct 27, 2020 at 3:09 PM Blondel, Sophie via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi, > > I am currently using TSSetEventHandler in my code to detect a random event > where the solution vector gets modified during the event. Ideally, after > the event happens I want the solver to use a much smaller timestep using > TSSetPostEventIntervalStep. However, when I use TSSetPostEventIntervalStep > the solver doesn't use the set value. I managed to reproduce the behavior > by modifying ex40.c as attached. > > > I stepped through ex40, and it does indeed change the timestep to 0.001. > Can you be more specific, perhaps with monitors, about what you think is > wrong? > > Thanks, > > Matt > > > I think the issue is related to the fact that the fvalue is not > technically "approaching" 0 with a random event, it is more of a step > function instead. Do you have any recommendation on how to implement the > behavior I'm looking for? Let me know if I can provide additional > information. > > Best, > > Sophie > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Oct 27 16:01:29 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 27 Oct 2020 16:01:29 -0500 Subject: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep In-Reply-To: References: <52CD65C8-DB69-4799-ACC7-0B2E5C32FE54@petsc.dev> Message-ID: Pushed > On Oct 27, 2020, at 3:41 PM, Matthew Knepley wrote: > > On Tue, Oct 27, 2020 at 4:24 PM Barry Smith > wrote: > > I'm sorry the code is still fundamentally broken, I know I promised a long time ago to fix it all up but it is actually pretty hard to get right. > > It detects the zero by finding a small value when it should detect it by find a small region where it changes sign but surprising it is so hardwired > to the size test that fixing it and testing the new code has been very difficult to me. My branch is barry/2019-08-18/fix-tsevent-posteventdt > > Barry, I do not see this branch on gitlab. Can you give a URL? > > Thanks, > > Matt > > Barry > > > >> On Oct 27, 2020, at 3:02 PM, Blondel, Sophie via petsc-users > wrote: >> >> Hi Matt, >> >> With the ex40 I attached in my previous email here is what I get printed on screen when running "./ex40 -ts_monitor -ts_event_monitor": >> 0 TS dt 0.1 time 0. >> 1 TS dt 0.5 time 0.1 >> 2 TS dt 0.5 time 0.6 >> 3 TS dt 0.5 time 1.1 >> 4 TS dt 0.5 time 1.6 >> 5 TS dt 0.5 time 2.1 >> 6 TS dt 0.5 time 2.6 >> 7 TS dt 0.5 time 3.1 >> 8 TS dt 0.5 time 3.6 >> 9 TS dt 0.5 time 4.1 >> 10 TS dt 0.5 time 4.6 >> 11 TS dt 0.5 time 5.1 >> 12 TS dt 0.5 time 5.6 >> 13 TS dt 0.5 time 6.1 >> 14 TS dt 0.5 time 6.6 >> 15 TS dt 0.5 time 7.1 >> TSEvent: Event 0 zero crossing at time 7.6 located in 0 iterations >> Ball hit the ground at t = 7.60 seconds >> 16 TS dt 0.5 time 7.6 >> 17 TS dt 0.5 time 8.1 >> 18 TS dt 0.5 time 8.6 >> 19 TS dt 0.5 time 9.1 >> 20 TS dt 0.5 time 9.6 >> 21 TS dt 0.5 time 10.1 >> 22 TS dt 0.5 time 10.6 >> 23 TS dt 0.5 time 11.1 >> 24 TS dt 0.5 time 11.6 >> 25 TS dt 0.5 time 12.1 >> 26 TS dt 0.5 time 12.6 >> 27 TS dt 0.5 time 13.1 >> 28 TS dt 0.5 time 13.6 >> 29 TS dt 0.5 time 14.1 >> 30 TS dt 0.5 time 14.6 >> 31 TS dt 0.5 time 15.1 >> 32 TS dt 0.5 time 15.6 >> 33 TS dt 0.5 time 16.1 >> 34 TS dt 0.5 time 16.6 >> 35 TS dt 0.5 time 17.1 >> 36 TS dt 0.5 time 17.6 >> 37 TS dt 0.5 time 18.1 >> 38 TS dt 0.5 time 18.6 >> 39 TS dt 0.5 time 19.1 >> 40 TS dt 0.5 time 19.6 >> 41 TS dt 0.5 time 20.1 >> 42 TS dt 0.5 time 20.6 >> 43 TS dt 0.5 time 21.1 >> 44 TS dt 0.5 time 21.6 >> 45 TS dt 0.5 time 22.1 >> 46 TS dt 0.5 time 22.6 >> 47 TS dt 0.5 time 23.1 >> 48 TS dt 0.5 time 23.6 >> 49 TS dt 0.5 time 24.1 >> 50 TS dt 0.5 time 24.6 >> 51 TS dt 0.5 time 25.1 >> TSEvent: Event 0 zero crossing at time 25.6 located in 0 iterations >> Ball hit the ground at t = 25.60 seconds >> 52 TS dt 0.5 time 25.6 >> 53 TS dt 0.5 time 26.1 >> 54 TS dt 0.5 time 26.6 >> 55 TS dt 0.5 time 27.1 >> 56 TS dt 0.5 time 27.6 >> 57 TS dt 0.5 time 28.1 >> 58 TS dt 0.5 time 28.6 >> 59 TS dt 0.5 time 29.1 >> 60 TS dt 0.5 time 29.6 >> 61 TS dt 0.5 time 30.1 >> 0 TS dt 0.1 time 0. 
>> 1 TS dt 0.5 time 0.1 >> 2 TS dt 0.5 time 0.6 >> 3 TS dt 0.5 time 1.1 >> 4 TS dt 0.5 time 1.6 >> 5 TS dt 0.5 time 2.1 >> 6 TS dt 0.5 time 2.6 >> 7 TS dt 0.5 time 3.1 >> 8 TS dt 0.5 time 3.6 >> 9 TS dt 0.5 time 4.1 >> 10 TS dt 0.5 time 4.6 >> 11 TS dt 0.5 time 5.1 >> 12 TS dt 0.5 time 5.6 >> 13 TS dt 0.5 time 6.1 >> 14 TS dt 0.5 time 6.6 >> 15 TS dt 0.5 time 7.1 >> 16 TS dt 0.5 time 7.6 >> 17 TS dt 0.5 time 8.1 >> 18 TS dt 0.5 time 8.6 >> 19 TS dt 0.5 time 9.1 >> 20 TS dt 0.5 time 9.6 >> 21 TS dt 0.5 time 10.1 >> 22 TS dt 0.5 time 10.6 >> 23 TS dt 0.5 time 11.1 >> 24 TS dt 0.5 time 11.6 >> 25 TS dt 0.5 time 12.1 >> 26 TS dt 0.5 time 12.6 >> TSEvent: Event 0 zero crossing at time 13.1 located in 0 iterations >> Ball hit the ground at t = 13.10 seconds >> 27 TS dt 0.5 time 13.1 >> 28 TS dt 0.5 time 13.6 >> 29 TS dt 0.5 time 14.1 >> 30 TS dt 0.5 time 14.6 >> 31 TS dt 0.5 time 15.1 >> 32 TS dt 0.5 time 15.6 >> 33 TS dt 0.5 time 16.1 >> 34 TS dt 0.5 time 16.6 >> 35 TS dt 0.5 time 17.1 >> 36 TS dt 0.5 time 17.6 >> 37 TS dt 0.5 time 18.1 >> 38 TS dt 0.5 time 18.6 >> 39 TS dt 0.5 time 19.1 >> 40 TS dt 0.5 time 19.6 >> 41 TS dt 0.5 time 20.1 >> 42 TS dt 0.5 time 20.6 >> 43 TS dt 0.5 time 21.1 >> 44 TS dt 0.5 time 21.6 >> 45 TS dt 0.5 time 22.1 >> 46 TS dt 0.5 time 22.6 >> 47 TS dt 0.5 time 23.1 >> TSEvent: Event 0 zero crossing at time 23.6 located in 0 iterations >> Ball hit the ground at t = 23.60 seconds >> 48 TS dt 0.5 time 23.6 >> 49 TS dt 0.5 time 24.1 >> 50 TS dt 0.5 time 24.6 >> 51 TS dt 0.5 time 25.1 >> 52 TS dt 0.5 time 25.6 >> 53 TS dt 0.5 time 26.1 >> TSEvent: Event 0 zero crossing at time 26.6 located in 0 iterations >> Ball hit the ground at t = 26.60 seconds >> 54 TS dt 0.5 time 26.6 >> 55 TS dt 0.5 time 27.1 >> 56 TS dt 0.5 time 27.6 >> 57 TS dt 0.5 time 28.1 >> 58 TS dt 0.5 time 28.6 >> 59 TS dt 0.5 time 29.1 >> 60 TS dt 0.5 time 29.6 >> 61 TS dt 0. time 30.1 >> >> I don't see the 0.001 timestep here, do you get a different behavior? >> >> Thank you, >> >> Sophie >> From: Matthew Knepley > >> Sent: Tuesday, October 27, 2020 15:34 >> To: Blondel, Sophie > >> Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >> Subject: Re: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep >> >> [External Email] >> >> On Tue, Oct 27, 2020 at 3:09 PM Blondel, Sophie via petsc-users > wrote: >> Hi, >> >> I am currently using TSSetEventHandler in my code to detect a random event where the solution vector gets modified during the event. Ideally, after the event happens I want the solver to use a much smaller timestep using TSSetPostEventIntervalStep. However, when I use TSSetPostEventIntervalStep the solver doesn't use the set value. I managed to reproduce the behavior by modifying ex40.c as attached. >> >> I stepped through ex40, and it does indeed change the timestep to 0.001. Can you be more specific, perhaps with monitors, about what you think is wrong? >> >> Thanks, >> >> Matt >> >> I think the issue is related to the fact that the fvalue is not technically "approaching" 0 with a random event, it is more of a step function instead. Do you have any recommendation on how to implement the behavior I'm looking for? Let me know if I can provide additional information. >> >> Best, >> >> Sophie >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbuerkle at web.de Tue Oct 27 22:01:40 2020 From: mbuerkle at web.de (Marius Buerkle) Date: Wed, 28 Oct 2020 04:01:40 +0100 Subject: [petsc-users] superlu_dist segfault In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From jonathan.guyer at nist.gov Wed Oct 28 11:35:19 2020 From: jonathan.guyer at nist.gov (Guyer, Jonathan E. Dr. (Fed)) Date: Wed, 28 Oct 2020 16:35:19 +0000 Subject: [petsc-users] Vexing deadlock situation with petsc4py Message-ID: <1A10892F-64FF-47AC-B733-9040F75AD09A@nist.gov> We use petsc4py as a solver suite in our [FiPy](https://www.ctcms.nist.gov/fipy) Python-based PDE solver package. Some time back, I refactored some of the code and provoked a deadlock situation in our test suite. I have been tearing what remains of my hair out trying to isolate things and am at a loss. I?ve gone through the refactoring line-by-line and I just don?t think I?ve changed anything substantive, just how the code is organized. I have posted a branch that exhibits the issue at https://github.com/usnistgov/fipy/pull/761 I explain in greater detail in that ?pull request? how to reproduce, but in short, after a substantial number of our tests run, the code either deadlocks or raises exceptions: On processor 0 in matrix.setUp() specifically in [0] PetscSplitOwnership() line 93 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/utils/psplit.c and on other processors a few lines earlier in matrix.create(comm) specifically in [1] PetscCommDuplicate() line 126 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/objects/tagm.c The circumstances that lead to this failure are really fragile and it seems likely due to some memory corruption. Particularly likely given that I can make the failure go away by removing seemingly irrelevant things like >>> from scipy.stats.mstats import argstoarray Note that when I run the full test suite after taking out this scipy import, the same problem just arises elsewhere without any obvious similar import trigger. Running with `-malloc_debug true` doesn?t illuminate anything. I?ve run with `-info` and `-log_trace` and don?t see any obvious issues, but there?s a ton of output. I have tried reducing things to a minimal reproducible example, but unfortunately things remain way too complicated and idiosyncratic to FiPy. I?m grateful for any help anybody can offer despite the mess that I?m offering. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.guyer at nist.gov Wed Oct 28 12:08:21 2020 From: jonathan.guyer at nist.gov (Guyer, Jonathan E. Dr. (Fed)) Date: Wed, 28 Oct 2020 17:08:21 +0000 Subject: [petsc-users] Vexing deadlock situation with petsc4py In-Reply-To: <1A10892F-64FF-47AC-B733-9040F75AD09A@nist.gov> References: <1A10892F-64FF-47AC-B733-9040F75AD09A@nist.gov> Message-ID: <3E1C3DF7-1184-4E33-8701-941963737B05@nist.gov> I should note that I?m running with a --with-debugging build that I?ve [forked from conda-forge/petsc-feedstock](https://github.com/guyer/petsc-feedstock/), but it doesn?t highlight any problems. 
When I -start_in_debugger, I drop into lldb[*], but there are no symbols. The last assembler I knew was for the 6502 and I haven?t known that for a looooong time. How can I get symbols included in my build? If I drop into the [i]pdb Python debugger, the problem goes away. [*] I?m running on a Mac, but the same deadlock happens on our linux builds On Oct 28, 2020, at 12:35 PM, Guyer, Jonathan E. Dr. (Fed) via petsc-users > wrote: We use petsc4py as a solver suite in our [FiPy](https://www.ctcms.nist.gov/fipy) Python-based PDE solver package. Some time back, I refactored some of the code and provoked a deadlock situation in our test suite. I have been tearing what remains of my hair out trying to isolate things and am at a loss. I?ve gone through the refactoring line-by-line and I just don?t think I?ve changed anything substantive, just how the code is organized. I have posted a branch that exhibits the issue at https://github.com/usnistgov/fipy/pull/761 I explain in greater detail in that ?pull request? how to reproduce, but in short, after a substantial number of our tests run, the code either deadlocks or raises exceptions: On processor 0 in matrix.setUp() specifically in [0] PetscSplitOwnership() line 93 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/utils/psplit.c and on other processors a few lines earlier in matrix.create(comm) specifically in [1] PetscCommDuplicate() line 126 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/objects/tagm.c The circumstances that lead to this failure are really fragile and it seems likely due to some memory corruption. Particularly likely given that I can make the failure go away by removing seemingly irrelevant things like >>> from scipy.stats.mstats import argstoarray Note that when I run the full test suite after taking out this scipy import, the same problem just arises elsewhere without any obvious similar import trigger. Running with `-malloc_debug true` doesn?t illuminate anything. I?ve run with `-info` and `-log_trace` and don?t see any obvious issues, but there?s a ton of output. I have tried reducing things to a minimal reproducible example, but unfortunately things remain way too complicated and idiosyncratic to FiPy. I?m grateful for any help anybody can offer despite the mess that I?m offering. -------------- next part -------------- An HTML attachment was scrubbed... URL: From wence at gmx.li Wed Oct 28 12:21:02 2020 From: wence at gmx.li (Lawrence Mitchell) Date: Wed, 28 Oct 2020 17:21:02 +0000 Subject: [petsc-users] Vexing deadlock situation with petsc4py In-Reply-To: <1A10892F-64FF-47AC-B733-9040F75AD09A@nist.gov> References: <1A10892F-64FF-47AC-B733-9040F75AD09A@nist.gov> Message-ID: <6D258122-9563-4EAD-9ED4-6EA3A3177481@gmx.li> > On 28 Oct 2020, at 16:35, Guyer, Jonathan E. Dr. (Fed) via petsc-users wrote: > > We use petsc4py as a solver suite in our [FiPy](https://www.ctcms.nist.gov/fipy) Python-based PDE solver package. Some time back, I refactored some of the code and provoked a deadlock situation in our test suite. I have been tearing what remains of my hair out trying to isolate things and am at a loss. I?ve gone through the refactoring line-by-line and I just don?t think I?ve changed anything substantive, just how the code is organized. > > I have posted a branch that exhibits the issue at https://github.com/usnistgov/fipy/pull/761 > > I explain in greater detail in that ?pull request? 
how to reproduce, but in short, after a substantial number of our tests run, the code either deadlocks or raises exceptions: > > On processor 0 in > > matrix.setUp() > > specifically in > > [0] PetscSplitOwnership() line 93 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/utils/psplit.c > > and on other processors a few lines earlier in > > matrix.create(comm) > > specifically in > > [1] PetscCommDuplicate() line 126 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/objects/tagm.c > > > The circumstances that lead to this failure are really fragile and it seems likely due to some memory corruption. Particularly likely given that I can make the failure go away by removing seemingly irrelevant things like > > >>> from scipy.stats.mstats import argstoarray > > Note that when I run the full test suite after taking out this scipy import, the same problem just arises elsewhere without any obvious similar import trigger. > > Running with `-malloc_debug true` doesn?t illuminate anything. > > I?ve run with `-info` and `-log_trace` and don?t see any obvious issues, but there?s a ton of output. > > > > I have tried reducing things to a minimal reproducible example, but unfortunately things remain way too complicated and idiosyncratic to FiPy. I?m grateful for any help anybody can offer despite the mess that I?m offering. My crystal ball guess is the following: PETSc objects have collective destroy semantics. When using petsc4py, XXX.destroy() is called on an object when its Python refcount drops to zero, or when it is collected by the generational garbage collector. In the absence of reference-cycles, all allocated objects will be collected by the refcounting part of the collector. This is (unless you do something funky like hold more references on one process than another) deterministic, and if you do normal SPMD programming, you'll call XXX.destroy() in the same order on the same objects on all processes. If you have reference cycles, then the refcounting part of the collector will not collect these objects. Now you are at the mercy of the generational collector. This is definitely not deterministic. If different Python processes do different things (for example, rank 0 might open files) then when the generational collector runs is no longer in sync across processes. A consequence is that you now might have rank 0 collect XXX then YYY, whereas rank 1 might collect YYY then XXX => deadlock. You can test this hypothesis by turning off the garbage collector in your test that provokes the failure: import gc gc.disable() ... If this turns out to be the case, I don't think there's a good solution here. You can audit your code base and ensure that objects that hold PETSc objects never participate in reference cycles. This is fragile. Another option, is to explicitly require that the user of the API call XXX.destroy() on all your objects (and then PETSc objects). This is the decision taken for mpi4py: you are responsible for freeing any objects that you create. That is, your API becomes more like the C API with x = Foo(...) # holds some petsc object XX ... # use x x.destroy() # calls XX.destroy() you could make this more pythonic by wrapping this pattern in contextmanagers: with Foo(...) as x: ... Thanks, Lawrence From jonathan.guyer at nist.gov Wed Oct 28 12:32:33 2020 From: jonathan.guyer at nist.gov (Guyer, Jonathan E. Dr. 
(Fed)) Date: Wed, 28 Oct 2020 17:32:33 +0000 Subject: [petsc-users] Vexing deadlock situation with petsc4py In-Reply-To: <6D258122-9563-4EAD-9ED4-6EA3A3177481@gmx.li> References: <1A10892F-64FF-47AC-B733-9040F75AD09A@nist.gov> <6D258122-9563-4EAD-9ED4-6EA3A3177481@gmx.li> Message-ID: <6DE14276-6194-456A-B558-496DE4C67E27@nist.gov> That?s very helpful, thanks! Adding `gc.collect()` to the beginning of the offending test does indeed resolve that particular problem. I?ve not been systematic about calling XXX.destroy(), thinking garbage collection was sufficient, so I need to get to work on that. > On Oct 28, 2020, at 1:21 PM, Lawrence Mitchell wrote: > > >> On 28 Oct 2020, at 16:35, Guyer, Jonathan E. Dr. (Fed) via petsc-users wrote: >> >> We use petsc4py as a solver suite in our [FiPy](https://www.ctcms.nist.gov/fipy) Python-based PDE solver package. Some time back, I refactored some of the code and provoked a deadlock situation in our test suite. I have been tearing what remains of my hair out trying to isolate things and am at a loss. I?ve gone through the refactoring line-by-line and I just don?t think I?ve changed anything substantive, just how the code is organized. >> >> I have posted a branch that exhibits the issue at https://github.com/usnistgov/fipy/pull/761 >> >> I explain in greater detail in that ?pull request? how to reproduce, but in short, after a substantial number of our tests run, the code either deadlocks or raises exceptions: >> >> On processor 0 in >> >> matrix.setUp() >> >> specifically in >> >> [0] PetscSplitOwnership() line 93 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/utils/psplit.c >> >> and on other processors a few lines earlier in >> >> matrix.create(comm) >> >> specifically in >> >> [1] PetscCommDuplicate() line 126 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/objects/tagm.c >> >> >> The circumstances that lead to this failure are really fragile and it seems likely due to some memory corruption. Particularly likely given that I can make the failure go away by removing seemingly irrelevant things like >> >>>>> from scipy.stats.mstats import argstoarray >> >> Note that when I run the full test suite after taking out this scipy import, the same problem just arises elsewhere without any obvious similar import trigger. >> >> Running with `-malloc_debug true` doesn?t illuminate anything. >> >> I?ve run with `-info` and `-log_trace` and don?t see any obvious issues, but there?s a ton of output. >> >> >> >> I have tried reducing things to a minimal reproducible example, but unfortunately things remain way too complicated and idiosyncratic to FiPy. I?m grateful for any help anybody can offer despite the mess that I?m offering. > > My crystal ball guess is the following: > > PETSc objects have collective destroy semantics. > > When using petsc4py, XXX.destroy() is called on an object when its Python refcount drops to zero, or when it is collected by the generational garbage collector. > > In the absence of reference-cycles, all allocated objects will be collected by the refcounting part of the collector. This is (unless you do something funky like hold more references on one process than another) deterministic, and if you do normal SPMD programming, you'll call XXX.destroy() in the same order on the same objects on all processes. > > If you have reference cycles, then the refcounting part of the collector will not collect these objects. Now you are at the mercy of the generational collector. 
This is definitely not deterministic. If different Python processes do different things (for example, rank 0 might open files) then when the generational collector runs is no longer in sync across processes. > > A consequence is that you now might have rank 0 collect XXX then YYY, whereas rank 1 might collect YYY then XXX => deadlock. > > You can test this hypothesis by turning off the garbage collector in your test that provokes the failure: > > import gc > gc.disable() > ... > > If this turns out to be the case, I don't think there's a good solution here. You can audit your code base and ensure that objects that hold PETSc objects never participate in reference cycles. This is fragile. > > Another option, is to explicitly require that the user of the API call XXX.destroy() on all your objects (and then PETSc objects). This is the decision taken for mpi4py: you are responsible for freeing any objects that you create. > > That is, your API becomes more like the C API with > > x = Foo(...) # holds some petsc object XX > ... # use x > x.destroy() # calls XX.destroy() > > you could make this more pythonic by wrapping this pattern in contextmanagers: > > with Foo(...) as x: > ... > > > Thanks, > > Lawrence From jonathan.guyer at nist.gov Wed Oct 28 12:33:13 2020 From: jonathan.guyer at nist.gov (Guyer, Jonathan E. Dr. (Fed)) Date: Wed, 28 Oct 2020 17:33:13 +0000 Subject: [petsc-users] Vexing deadlock situation with petsc4py In-Reply-To: <6DE14276-6194-456A-B558-496DE4C67E27@nist.gov> References: <1A10892F-64FF-47AC-B733-9040F75AD09A@nist.gov> <6D258122-9563-4EAD-9ED4-6EA3A3177481@gmx.li> <6DE14276-6194-456A-B558-496DE4C67E27@nist.gov> Message-ID: <144D75DD-1BD8-4C97-8399-C528CD6FE400@nist.gov> *gc.disable() > On Oct 28, 2020, at 1:32 PM, Jonathan Guyer wrote: > > That?s very helpful, thanks! > > Adding `gc.collect()` to the beginning of the offending test does indeed resolve that particular problem. > > I?ve not been systematic about calling XXX.destroy(), thinking garbage collection was sufficient, so I need to get to work on that. > >> On Oct 28, 2020, at 1:21 PM, Lawrence Mitchell wrote: >> >> >>> On 28 Oct 2020, at 16:35, Guyer, Jonathan E. Dr. (Fed) via petsc-users wrote: >>> >>> We use petsc4py as a solver suite in our [FiPy](https://www.ctcms.nist.gov/fipy) Python-based PDE solver package. Some time back, I refactored some of the code and provoked a deadlock situation in our test suite. I have been tearing what remains of my hair out trying to isolate things and am at a loss. I?ve gone through the refactoring line-by-line and I just don?t think I?ve changed anything substantive, just how the code is organized. >>> >>> I have posted a branch that exhibits the issue at https://github.com/usnistgov/fipy/pull/761 >>> >>> I explain in greater detail in that ?pull request? how to reproduce, but in short, after a substantial number of our tests run, the code either deadlocks or raises exceptions: >>> >>> On processor 0 in >>> >>> matrix.setUp() >>> >>> specifically in >>> >>> [0] PetscSplitOwnership() line 93 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/utils/psplit.c >>> >>> and on other processors a few lines earlier in >>> >>> matrix.create(comm) >>> >>> specifically in >>> >>> [1] PetscCommDuplicate() line 126 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/objects/tagm.c >>> >>> >>> The circumstances that lead to this failure are really fragile and it seems likely due to some memory corruption. 
Particularly likely given that I can make the failure go away by removing seemingly irrelevant things like >>> >>>>>> from scipy.stats.mstats import argstoarray >>> >>> Note that when I run the full test suite after taking out this scipy import, the same problem just arises elsewhere without any obvious similar import trigger. >>> >>> Running with `-malloc_debug true` doesn?t illuminate anything. >>> >>> I?ve run with `-info` and `-log_trace` and don?t see any obvious issues, but there?s a ton of output. >>> >>> >>> >>> I have tried reducing things to a minimal reproducible example, but unfortunately things remain way too complicated and idiosyncratic to FiPy. I?m grateful for any help anybody can offer despite the mess that I?m offering. >> >> My crystal ball guess is the following: >> >> PETSc objects have collective destroy semantics. >> >> When using petsc4py, XXX.destroy() is called on an object when its Python refcount drops to zero, or when it is collected by the generational garbage collector. >> >> In the absence of reference-cycles, all allocated objects will be collected by the refcounting part of the collector. This is (unless you do something funky like hold more references on one process than another) deterministic, and if you do normal SPMD programming, you'll call XXX.destroy() in the same order on the same objects on all processes. >> >> If you have reference cycles, then the refcounting part of the collector will not collect these objects. Now you are at the mercy of the generational collector. This is definitely not deterministic. If different Python processes do different things (for example, rank 0 might open files) then when the generational collector runs is no longer in sync across processes. >> >> A consequence is that you now might have rank 0 collect XXX then YYY, whereas rank 1 might collect YYY then XXX => deadlock. >> >> You can test this hypothesis by turning off the garbage collector in your test that provokes the failure: >> >> import gc >> gc.disable() >> ... >> >> If this turns out to be the case, I don't think there's a good solution here. You can audit your code base and ensure that objects that hold PETSc objects never participate in reference cycles. This is fragile. >> >> Another option, is to explicitly require that the user of the API call XXX.destroy() on all your objects (and then PETSc objects). This is the decision taken for mpi4py: you are responsible for freeing any objects that you create. >> >> That is, your API becomes more like the C API with >> >> x = Foo(...) # holds some petsc object XX >> ... # use x >> x.destroy() # calls XX.destroy() >> >> you could make this more pythonic by wrapping this pattern in contextmanagers: >> >> with Foo(...) as x: >> ... >> >> >> Thanks, >> >> Lawrence > From sajidsyed2021 at u.northwestern.edu Wed Oct 28 14:12:47 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Wed, 28 Oct 2020 14:12:47 -0500 Subject: [petsc-users] Regarding changes in the 3.14 release Message-ID: Hi PETSc-developers, I have a few questions regarding changes to PETSc between version 3.13.5 and current master. I?m trying to run an application that worked with no issues with version 3.13.5 but isn?t working with the current master. [1] To assemble a matrix in this application I loop over all rows and have multiple calls to MatSetValuesStencil with INSERT_VALUES as the addv argument for all except one call which has ADD_VALUES. Final assembly is called after this loop. 
With PETSc-3.13.5 this ran with no errors but with PETSc-master I get : Object is in wrong state [0]PETSC ERROR: Cannot mix add values and insert values This is fixed by having a flush assembly in between two stages where the first stage has two loops with INSERT_VALUES and the second stage has a loop with ADD_VALUES. Did this change result from a bugfix or are users now expected to no longer mix add and insert values within the same loop ? [2] To prevent re-building the preconditioner at all TSSteps, I had the command line argument -snes_lag_preconditioner -1. This did the job in 3.13.5 but with the current master I get the following error : Cannot set the lag to -1 from the command line since the preconditioner must be built as least once, perhaps you mean -2 I can however run the application without this option. If this is a breaking change, what is the new option to prevent re-building the preconditioner ? [3] Finally, I?m used the latest development version of MPICH for building both 3.13.5 and petsc-master and I get these warnings at exit : [WARNING] yaksa: 2 leaked handles .... (repeated N number of times where N is number of mpi ranks) Can this be safely neglected ? Let me know if sharing either the application code and/or logs would be helpful and I can share either. Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 28 14:31:08 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 28 Oct 2020 15:31:08 -0400 Subject: [petsc-users] Regarding changes in the 3.14 release In-Reply-To: References: Message-ID: On Wed, Oct 28, 2020 at 3:13 PM Sajid Ali wrote: > Hi PETSc-developers, > > I have a few questions regarding changes to PETSc between version 3.13.5 > and current master. I?m trying to run an application that worked with no > issues with version 3.13.5 but isn?t working with the current master. > > [1] To assemble a matrix in this application I loop over all rows and have > multiple calls to MatSetValuesStencil with INSERT_VALUES as the addv > argument for all except one call which has ADD_VALUES. Final assembly is > called after this loop. With PETSc-3.13.5 this ran with no errors but with > PETSc-master I get : > > Object is in wrong state > [0]PETSC ERROR: Cannot mix add values and insert values > > This is fixed by having a flush assembly in between two stages where the > first stage has two loops with INSERT_VALUES and the second stage has a > loop with ADD_VALUES. > > Did this change result from a bugfix or are users now expected to no > longer mix add and insert values within the same loop ? > We never checked before. You were never supposed to do that. It can break. > [2] To prevent re-building the preconditioner at all TSSteps, I had the > command line argument -snes_lag_preconditioner -1. This did the job in > 3.13.5 but with the current master I get the following error : > > Cannot set the lag to -1 from the command line since the preconditioner must be built as least once, perhaps you mean -2 > > I can however run the application without this option. If this is a > breaking change, what is the new option to prevent re-building the > preconditioner ? > -1 means never build, but you have not built the preconditioner. Thus you probably want -2 which means build once and then not again. 
> [3] Finally, I?m used the latest development version of MPICH for building > both 3.13.5 and petsc-master and I get these warnings at exit : > > [WARNING] yaksa: 2 leaked handles > .... (repeated N number of times where N is number of mpi ranks) > > Can this be safely neglected ? > I don't know. Thanks, Matt > Let me know if sharing either the application code and/or logs would be > helpful and I can share either. > > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From salazardetro1 at llnl.gov Wed Oct 28 16:54:04 2020 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Wed, 28 Oct 2020 21:54:04 +0000 Subject: [petsc-users] TSAdjoint and adaptive time stepping Message-ID: Hello, I saw in the TSAdjoint paper that adjoints for adaptive time stepping schemes are supported. Given that these schemes usually involve nondifferentiable functions to pick the time step, are the sensitivities also nondifferentiable at certain points? Does one need to be careful when using adjoints with adaptive time steps? Thanks Miguel Miguel A. Salazar de Troya Postdoctoral Researcher, Lawrence Livermore National Laboratory B141 Rm: 1085-5 Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Wed Oct 28 19:56:33 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Wed, 28 Oct 2020 19:56:33 -0500 Subject: [petsc-users] Regarding changes in the 3.14 release In-Reply-To: References: Message-ID: Hi Matt, Thanks for the clarification. The documentation for SNESSetLagPreconditioner states "If -1 is used before the very first nonlinear solve the preconditioner is still built because there is no previous preconditioner to use" which was true prior to 3.14, is this statement no longer valid ? What is the difference between having -snes_lag_preconditioner -2 and having -snes_lag_preconditioner_persists true ? PS : The man pages for SNESSetLagJacobianPersists should perhaps not state the lag preconditioner options database keys and vice versa for clarity. Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 28 20:08:39 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 28 Oct 2020 21:08:39 -0400 Subject: [petsc-users] Regarding changes in the 3.14 release In-Reply-To: References: Message-ID: On Wed, Oct 28, 2020 at 8:57 PM Sajid Ali wrote: > Hi Matt, > > Thanks for the clarification. The documentation > > for SNESSetLagPreconditioner states "If -1 is used before the very first > nonlinear solve the preconditioner is still built because there is no > previous preconditioner to use" which was true prior to 3.14, is this > statement no longer valid ? > Sounds like it is not. Barry? > What is the difference between having -snes_lag_preconditioner -2 and > having -snes_lag_preconditioner_persists true ? > Persists applies to multiple solves, whereas -2 only applies to the current one. 
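
A short sketch of the corresponding API calls, assuming SNESSetLagPreconditioner() and SNESSetLagPreconditionerPersists() as the C-level counterparts of the two options named above; in a TS application the SNES would come from TSGetSNES() rather than SNESCreate(). The comments paraphrase Matt's description in this thread, not the manual pages.

#include <petscsnes.h>

int main(int argc, char **argv)
{
  SNES           snes;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = SNESCreate(PETSC_COMM_WORLD, &snes);CHKERRQ(ierr);

  /* Equivalent to -snes_lag_preconditioner -2: build the preconditioner once,
     then do not rebuild it again during the current solve. */
  ierr = SNESSetLagPreconditioner(snes, -2);CHKERRQ(ierr);

  /* Equivalent to -snes_lag_preconditioner_persists true: keep the lag across
     multiple SNESSolve() calls (e.g. successive TS steps) instead of resetting
     it at each solve. */
  ierr = SNESSetLagPreconditionerPersists(snes, PETSC_TRUE);CHKERRQ(ierr);

  ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);
  /* ... SNESSetFunction()/SNESSetJacobian() and SNESSolve() would go here ... */

  ierr = SNESDestroy(&snes);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

From the options database the same combination is -snes_lag_preconditioner -2 -snes_lag_preconditioner_persists true, as discussed above.
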
Thanks, Matt > PS : The man pages for SNESSetLagJacobianPersists should perhaps not > state the lag preconditioner options database keys and vice versa for > clarity. > > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Wed Oct 28 20:10:38 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Thu, 29 Oct 2020 01:10:38 +0000 Subject: [petsc-users] superlu_dist segfault In-Reply-To: References: , Message-ID: Marius, I tested your code with petsc-release on my mac laptop using np=2 cores. I first tested a small matrix data file successfully. Then I switch to your data file and run out of memory, likely due to the dense matrices B and X. I got an error "Your system has run out of application memory" from my laptop. The sparse matrix A has size 42549 by 42549. Your code creates dense matrices B and X with the same size -- a huge memory requirement! By replacing B and X with size 42549 by nrhs (nrhs =< 4000), I had the code run well with np=2. Note the error message you got [23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range The modified code I used is attached. Hong ________________________________ From: Marius Buerkle Sent: Tuesday, October 27, 2020 10:01 PM To: Zhang, Hong Cc: petsc-users at mcs.anl.gov ; Sherry Li Subject: Aw: Re: [petsc-users] superlu_dist segfault Hi, I recompiled PETSC with debug option, now I get a seg fault at a different position [23]PETSC ERROR: ------------------------------------------------------------------------ [23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [23]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [23]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [23]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [23]PETSC ERROR: likely location of problem given in stack below [23]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [23]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [23]PETSC ERROR: INSTEAD the line number of the start of the function [23]PETSC ERROR: is given. [23]PETSC ERROR: [23] SuperLU_DIST:pzgssvx line 242 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c [23]PETSC ERROR: [23] MatMatSolve_SuperLU_DIST line 211 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c [23]PETSC ERROR: [23] MatMatSolve line 3466 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/interface/matrix.c [23]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [23]PETSC ERROR: Signal received I made a small reproducer. The matrix is a bit too big so I cannot attach it directly to the email, but I put it in the cloud https://1drv.ms/u/s!AqZsng1oUcKzjYxGMGHojLRG09Sf1A?e=7uHnmw Best, Marius Gesendet: Dienstag, 27. 
Oktober 2020 um 23:11 Uhr Von: "Zhang, Hong" An: "Marius Buerkle" , "petsc-users at mcs.anl.gov" , "Sherry Li" Betreff: Re: [petsc-users] superlu_dist segfault Marius, It fails at the line 1075 in file /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c if ( !(lsum = (doublecomplex*)SUPERLU_MALLOC(sizelsum*num_thread * sizeof(doublecomplex)))) ABORT("Malloc fails for lsum[]."); We do not know what it means. You may use a debugger to check the values of the variables involved. I'm cc'ing Sherry (superlu_dist developer), or you may send us a stand-alone short code that reproduce the error. We can help on its investigation. Hong ________________________________ From: petsc-users on behalf of Marius Buerkle Sent: Tuesday, October 27, 2020 8:46 AM To: petsc-users at mcs.anl.gov Subject: [petsc-users] superlu_dist segfault Hi, When using MatMatSolve with superlu_dist I get a segmentation fault: Malloc fails for lsum[]. at line 1075 in file /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c The matrix size is not particular big and I am using the petsc release branch and superlu_dist is v6.3.0 I think. Best, Marius -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: superlu_test.c URL: From hongzhang at anl.gov Wed Oct 28 22:36:15 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Thu, 29 Oct 2020 03:36:15 +0000 Subject: [petsc-users] TSAdjoint and adaptive time stepping In-Reply-To: References: Message-ID: <4CC86760-34FD-4AE5-B936-05E51B71820A@anl.gov> I think it depends on the functional for which the sensitivities are calculated. For most cases, the objective functional should not be sensitive to the step sizes when a converged solution is achieved. What the adapter does is just to choose a step size so that the solution is accurate within certain tolerances. Of course, if the adapter is not doing a good job (e.g. choosing a step size that leads to instability), not only the sensitivities are influenced but also the solution is inaccurate. Hong (Mr.) On Oct 28, 2020, at 4:54 PM, Salazar De Troya, Miguel via petsc-users > wrote: Hello, I saw in the TSAdjoint paper that adjoints for adaptive time stepping schemes are supported. Given that these schemes usually involve nondifferentiable functions to pick the time step, are the sensitivities also nondifferentiable at certain points? Does one need to be careful when using adjoints with adaptive time steps? Thanks Miguel Miguel A. Salazar de Troya Postdoctoral Researcher, Lawrence Livermore National Laboratory B141 Rm: 1085-5 Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbuerkle at web.de Wed Oct 28 23:43:41 2020 From: mbuerkle at web.de (Marius Buerkle) Date: Thu, 29 Oct 2020 05:43:41 +0100 Subject: [petsc-users] superlu_dist segfault In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From xsli at lbl.gov Thu Oct 29 00:14:24 2020 From: xsli at lbl.gov (Xiaoye S. Li) Date: Wed, 28 Oct 2020 22:14:24 -0700 Subject: [petsc-users] superlu_dist segfault In-Reply-To: References: Message-ID: Hong: thanks for the diagnosis! Marius: how many OpenMP threads are you using per MPI task? 
In an earlier email, you mentioned the allocation failure at the following line: if ( !(lsum = (doublecomplex*) SUPERLU_MALLOC(sizelsum*num_thread * sizeof(doublecomplex)))) ABORT("Malloc fails for lsum[]."); this is in the solve phase. I think when we do some OpenMP optimization, we allowed several data structures to grow with OpenMP threads. You can try to use 1 thread. The RHS and X memories are easy to compute. However, in order to gauge how much memory is used in the factorization, can you print out the number of nonzeros in the L and U factors? What ordering option are you using? The sparse matrix A looks pretty small. The code can also print out the working storage used during factorization. I am not sure how this printing can be turned on through PETSc. Sherry On Wed, Oct 28, 2020 at 9:43 PM Marius Buerkle wrote: > Thanks for the swift reply. > > I also realized if I reduce the number of RHS then it works. But I am > running the code on a cluster with 256GB ram / node. One dense matrix > would be around ~30 Gb so 60 Gb, which is large but does exceed the > memory of even one node and I also get the seg fault if I run it on several > nodes. Moreover, it works well with MUMPS and MKL_CPARDISO solver. The > maxium memory used when using MUMPS is around 150 Gb during the solver > phase but for SuperLU_dist it crashed even before reaching the solver > phase. Could there be such a large difference in memory usage between > SuperLu_dist and MUMPS ? > > > > best, > > marius > > *Gesendet:* Donnerstag, 29. Oktober 2020 um 10:10 Uhr > *Von:* "Zhang, Hong" > *An:* "Marius Buerkle" > *Cc:* "petsc-users at mcs.anl.gov" , "Sherry Li" < > xiaoye at nersc.gov> > *Betreff:* Re: Re: [petsc-users] superlu_dist segfault > Marius, > I tested your code with petsc-release on my mac laptop using np=2 cores. I > first tested a small matrix data file successfully. Then I switch to your > data file and run out of memory, likely due to the dense matrices B and X. > I got an error "Your system has run out of application memory" from my > laptop. > > The sparse matrix A has size 42549 by 42549. Your code creates dense > matrices B and X with the same size -- a huge memory requirement! > By replacing B and X with size 42549 by nrhs (nrhs =< 4000), I had the > code run well with np=2. Note the error message you got > [23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > > The modified code I used is attached. 
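
Since the modified superlu_test.c is only attached to the message and not reproduced in the archive, here is a self-contained sketch of the shape change Hong describes: the sparse A stays n by n, while the dense B and X are created n by nrhs. The small tridiagonal A, n = 100 and nrhs = 4 are placeholders, and the factor/solve sequence assumes a PETSc build configured with SuperLU_DIST and follows the usual MatGetFactor()/MatLUFactorSymbolic()/MatLUFactorNumeric()/MatMatSolve() path.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A, F, B, X;
  IS             perm, iperm;
  MatFactorInfo  info;
  PetscInt       i, j, rstart, rend, n = 100, nrhs = 4;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Sparse n x n matrix (a stand-in for the 42549 x 42549 matrix in the reproducer) */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 3, NULL, 2, NULL, &A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
    if (i > 0)     {ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    if (i < n - 1) {ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* Dense right-hand sides: n x nrhs, not n x n */
  ierr = MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, nrhs, NULL, &B);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(B, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    for (j = 0; j < nrhs; j++) {ierr = MatSetValue(B, i, j, 1.0, INSERT_VALUES);CHKERRQ(ierr);}
  }
  ierr = MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatDuplicate(B, MAT_DO_NOT_COPY_VALUES, &X);CHKERRQ(ierr);

  /* LU factorization with SuperLU_DIST, then one multi-RHS solve */
  ierr = MatGetOrdering(A, MATORDERINGNATURAL, &perm, &iperm);CHKERRQ(ierr);
  ierr = MatGetFactor(A, MATSOLVERSUPERLU_DIST, MAT_FACTOR_LU, &F);CHKERRQ(ierr);
  ierr = MatFactorInfoInitialize(&info);CHKERRQ(ierr);
  ierr = MatLUFactorSymbolic(F, A, perm, iperm, &info);CHKERRQ(ierr);
  ierr = MatLUFactorNumeric(F, A, &info);CHKERRQ(ierr);
  ierr = MatMatSolve(F, B, X);CHKERRQ(ierr);

  ierr = ISDestroy(&perm);CHKERRQ(ierr);
  ierr = ISDestroy(&iperm);CHKERRQ(ierr);
  ierr = MatDestroy(&F);CHKERRQ(ierr);
  ierr = MatDestroy(&X);CHKERRQ(ierr);
  ierr = MatDestroy(&B);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

If all 42549 right-hand sides are really needed, they can presumably be processed in column blocks of a few thousand, reusing the same factor F across repeated MatMatSolve() calls.
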
> Hong > > ------------------------------ > *From:* Marius Buerkle > *Sent:* Tuesday, October 27, 2020 10:01 PM > *To:* Zhang, Hong > *Cc:* petsc-users at mcs.anl.gov ; Sherry Li < > xiaoye at nersc.gov> > *Subject:* Aw: Re: [petsc-users] superlu_dist segfault > > Hi, > > I recompiled PETSC with debug option, now I get a seg fault at a different > position > > [23]PETSC ERROR: > ------------------------------------------------------------------------ > [23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [23]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [23]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [23]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [23]PETSC ERROR: likely location of problem given in stack below > [23]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [23]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [23]PETSC ERROR: INSTEAD the line number of the start of the function > [23]PETSC ERROR: is given. > [23]PETSC ERROR: [23] SuperLU_DIST:pzgssvx line 242 > /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c > [23]PETSC ERROR: [23] MatMatSolve_SuperLU_DIST line 211 > /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c > [23]PETSC ERROR: [23] MatMatSolve line 3466 > /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/interface/matrix.c > [23]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [23]PETSC ERROR: Signal received > > I made a small reproducer. The matrix is a bit too big so I cannot attach > it directly to the email, but I put it in the cloud > https://1drv.ms/u/s!AqZsng1oUcKzjYxGMGHojLRG09Sf1A?e=7uHnmw > > Best, > Marius > > > *Gesendet:* Dienstag, 27. Oktober 2020 um 23:11 Uhr > *Von:* "Zhang, Hong" > *An:* "Marius Buerkle" , "petsc-users at mcs.anl.gov" < > petsc-users at mcs.anl.gov>, "Sherry Li" > *Betreff:* Re: [petsc-users] superlu_dist segfault > Marius, > It fails at the line 1075 in file > /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c > if ( !(lsum = (doublecomplex*)SUPERLU_MALLOC(sizelsum*num_thread * > sizeof(doublecomplex)))) ABORT("Malloc fails for lsum[]."); > > We do not know what it means. You may use a debugger to check the values > of the variables involved. > I'm cc'ing Sherry (superlu_dist developer), or you may send us a > stand-alone short code that reproduce the error. We can help on its > investigation. > Hong > > > ------------------------------ > *From:* petsc-users on behalf of Marius > Buerkle > *Sent:* Tuesday, October 27, 2020 8:46 AM > *To:* petsc-users at mcs.anl.gov > *Subject:* [petsc-users] superlu_dist segfault > > Hi, > > When using MatMatSolve with superlu_dist I get a segmentation fault: > > Malloc fails for lsum[]. at line 1075 in file > /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c > > The matrix size is not particular big and I am using the petsc release > branch and superlu_dist is v6.3.0 I think. > > Best, > Marius > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mbuerkle at web.de Thu Oct 29 01:04:47 2020 From: mbuerkle at web.de (Marius Buerkle) Date: Thu, 29 Oct 2020 07:04:47 +0100 Subject: [petsc-users] superlu_dist segfault In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From salazardetro1 at llnl.gov Thu Oct 29 11:10:56 2020 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Thu, 29 Oct 2020 16:10:56 +0000 Subject: [petsc-users] TSAdjoint and adaptive time stepping In-Reply-To: <4CC86760-34FD-4AE5-B936-05E51B71820A@anl.gov> References: <4CC86760-34FD-4AE5-B936-05E51B71820A@anl.gov> Message-ID: <0FF12D70-5AA1-4E8F-AE22-D238DFBF6048@llnl.gov> Does this mean that the adjoint method doesn?t take into account the step adapter? Meaning that the adapter is not differentiated with respect to its dependencies (one of them being the solution at each time step). I can imagine that a discrete adjoint method with a step controller should be differentiating the step controller as well. Thanks Miguel From: "Zhang, Hong" Date: Wednesday, October 28, 2020 at 8:36 PM To: "Salazar De Troya, Miguel" Cc: "Guyer, Jonathan E. Dr. (Fed) via petsc-users" Subject: Re: [petsc-users] TSAdjoint and adaptive time stepping I think it depends on the functional for which the sensitivities are calculated. For most cases, the objective functional should not be sensitive to the step sizes when a converged solution is achieved. What the adapter does is just to choose a step size so that the solution is accurate within certain tolerances. Of course, if the adapter is not doing a good job (e.g. choosing a step size that leads to instability), not only the sensitivities are influenced but also the solution is inaccurate. Hong (Mr.) On Oct 28, 2020, at 4:54 PM, Salazar De Troya, Miguel via petsc-users > wrote: Hello, I saw in the TSAdjoint paper that adjoints for adaptive time stepping schemes are supported. Given that these schemes usually involve nondifferentiable functions to pick the time step, are the sensitivities also nondifferentiable at certain points? Does one need to be careful when using adjoints with adaptive time steps? Thanks Miguel Miguel A. Salazar de Troya Postdoctoral Researcher, Lawrence Livermore National Laboratory B141 Rm: 1085-5 Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsu at eoas.ubc.ca Thu Oct 29 14:01:10 2020 From: dsu at eoas.ubc.ca (Su,D.S. Danyang) Date: Thu, 29 Oct 2020 19:01:10 +0000 Subject: [petsc-users] Quite different behaviours of PETSc solver on different clusters Message-ID: <99FC758F-929C-4388-B09E-DC11FCB0004A@eoas.ubc.ca> Dear PETSc users, This is a question bother me for some time. I have the same code running on different clusters and both clusters have good speedup. However, I noticed some thing quite strange. On one cluster, the solver is quite stable in computing time while on another cluster, the solver is unstable in computing time. As shown in the figure below, the local calculation almost has no communication and the computing time in this part is quite stable. However, PETSc solver on Cluster B jumps quite a lot and the performance is not as good as Cluster A, even though the local calculation is a little better on Cluster B. There are some difference on hardware and PETSc configuration and optimization. Cluster A uses OpenMPI + GCC compiler and Cluster B uses MPICH + GCC compiler. The number of processors used is 128 on Cluster A and 120 on Cluster B. 
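A generic way to pin down where the extra solver time goes is to wrap the two phases in PETSc log stages and compare the -log_view output from the two clusters: the per-stage report of time, message counts, and MPI reductions usually shows whether the difference is in local flops or in communication (for example VecNorm/VecDot reductions, which are sensitive to network latency). The sketch below is only a minimal illustration in C, not the code used to produce the figures; the stage names and the placeholder comments stand in for the application's own assembly and solve routines.

  #include <petscsys.h>

  int main(int argc, char **argv)
  {
    PetscLogStage  stageLocal, stageSolve;
    PetscInt       step;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
    ierr = PetscLogStageRegister("Local calculation", &stageLocal);CHKERRQ(ierr);
    ierr = PetscLogStageRegister("PETSc solver", &stageSolve);CHKERRQ(ierr);

    for (step = 0; step < 10; step++) {
      ierr = PetscLogStagePush(stageLocal);CHKERRQ(ierr);
      /* ... local assembly / physics computations for this time step ... */
      ierr = PetscLogStagePop();CHKERRQ(ierr);

      ierr = PetscLogStagePush(stageSolve);CHKERRQ(ierr);
      /* ... KSPSolve()/SNESSolve() for this time step ... */
      ierr = PetscLogStagePop();CHKERRQ(ierr);
    }
    ierr = PetscFinalize();  /* run with -log_view to get per-stage timing,   */
    return ierr;             /* message counts, and reduction (Allreduce) time */
  }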
I also tested different number of processors but the problem is the same. Does anyone have any idea which part might cause this problem? [cid:image001.png at 01D6ADEB.30817A80] [cid:image002.png at 01D6ADEB.30817A80] Thanks and regards, Danyang -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 103885 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 107143 bytes Desc: image002.png URL: From knepley at gmail.com Thu Oct 29 20:05:53 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 29 Oct 2020 21:05:53 -0400 Subject: [petsc-users] Quite different behaviours of PETSc solver on different clusters In-Reply-To: <99FC758F-929C-4388-B09E-DC11FCB0004A@eoas.ubc.ca> References: <99FC758F-929C-4388-B09E-DC11FCB0004A@eoas.ubc.ca> Message-ID: On Thu, Oct 29, 2020 at 3:04 PM Su,D.S. Danyang wrote: > Dear PETSc users, > > > > This is a question bother me for some time. I have the same code running > on different clusters and both clusters have good speedup. However, I > noticed some thing quite strange. On one cluster, the solver is quite > stable in computing time while on another cluster, the solver is unstable > in computing time. As shown in the figure below, the local calculation > almost has no communication and the computing time in this part is quite > stable. However, PETSc solver on Cluster B jumps quite a lot and the > performance is not as good as Cluster A, even though the local calculation > is a little better on Cluster B. There are some difference on hardware and > PETSc configuration and optimization. Cluster A uses OpenMPI + GCC compiler > and Cluster B uses MPICH + GCC compiler. The number of processors used is > 128 on Cluster A and 120 on Cluster B. I also tested different number of > processors but the problem is the same. Does anyone have any idea which > part might cause this problem? > First question: Does the solver take more iterates when the time bumps up? Thanks, Matt > > > > > Thanks and regards, > > > > Danyang > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 103885 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 107143 bytes Desc: not available URL: From dsu at eoas.ubc.ca Thu Oct 29 20:17:38 2020 From: dsu at eoas.ubc.ca (Danyang Su) Date: Thu, 29 Oct 2020 18:17:38 -0700 Subject: [petsc-users] Quite different behaviours of PETSc solver on different clusters In-Reply-To: References: <99FC758F-929C-4388-B09E-DC11FCB0004A@eoas.ubc.ca> Message-ID: <18AD9C93-7FFC-46DB-BB38-21E2644CA550@eoas.ubc.ca> Hi Matt, No, interations from both linear and nonlinear solvers are similar. The system administrator doubt that the latency in mpich makes the difference. We will test a petsc version with OpenMPI on that cluster to check if it makes difference. Thanks, Danyang On October 29, 2020 6:05:53 p.m. PDT, Matthew Knepley wrote: >On Thu, Oct 29, 2020 at 3:04 PM Su,D.S. 
Danyang >wrote: > >> Dear PETSc users, >> >> >> >> This is a question bother me for some time. I have the same code >running >> on different clusters and both clusters have good speedup. However, I >> noticed some thing quite strange. On one cluster, the solver is quite >> stable in computing time while on another cluster, the solver is >unstable >> in computing time. As shown in the figure below, the local >calculation >> almost has no communication and the computing time in this part is >quite >> stable. However, PETSc solver on Cluster B jumps quite a lot and the >> performance is not as good as Cluster A, even though the local >calculation >> is a little better on Cluster B. There are some difference on >hardware and >> PETSc configuration and optimization. Cluster A uses OpenMPI + GCC >compiler >> and Cluster B uses MPICH + GCC compiler. The number of processors >used is >> 128 on Cluster A and 120 on Cluster B. I also tested different number >of >> processors but the problem is the same. Does anyone have any idea >which >> part might cause this problem? >> > >First question: Does the solver take more iterates when the time bumps >up? > > Thanks, > > Matt > > >> >> >> >> >> Thanks and regards, >> >> >> >> Danyang >> >> >> > > >-- >What most experimenters take for granted before they begin their >experiments is infinitely more interesting than any results to which >their >experiments lead. >-- Norbert Wiener > >https://www.cse.buffalo.edu/~knepley/ > -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From elbueler at alaska.edu Thu Oct 29 20:28:49 2020 From: elbueler at alaska.edu (Ed Bueler) Date: Thu, 29 Oct 2020 17:28:49 -0800 Subject: [petsc-users] new book introducing PETSc for PDEs Message-ID: All -- SIAM Press just published my new book "PETSc for Partial Differential Equations: Numerical Solutions in C and Python": https://my.siam.org/Store/Product/viewproduct/?ProductId=32850137 The book is available both as a paperback and an e-book with working links. A SIAM member discount is available, of course. This book is a genuine introduction which does not assume you have used PETSc before, and which should make sense even if your differential equations knowledge is basic. The prerequisites are a bit of programming in C and a bit of numerical linear algebra, roughly like the main ideas of Trefethen and Bau, but even that is reviewed and summarized. I've made an effort to introduce discretizations from the beginning, especially finite differences and elements. The book is based on a collection of example programs at https://github.com/bueler/p4pdes. Most of these codes call PETSc directly through the C API, but the last two chapters have Python codes using UFL and Firedrake. Nonetheless the book contains ideas, mathematical and computational; it complements, but does not replace, the PETSc User's Manual and the tutorial examples in the PETSc source. Concepts are explained and illustrated, with sufficient context to facilitate further development. Performance (optimality) and parallel scalability are the primary goals, so preconditioners including multigrid are central threads, and run-time solver options are explored in both the text and the exercises. Here is the place to appreciate the usual PETSc suspects for their comments on drafts, and help in writing this book: Barry, Jed, Matt, Dave, Rich, Lois, Patrick, Mark, Satish, David K., and many others. 
Also let me say that SIAM Press has nothing but professionals who are nice to work with too; send them your book idea! Ed -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Oct 30 14:07:54 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 30 Oct 2020 14:07:54 -0500 Subject: [petsc-users] superlu_dist segfault In-Reply-To: References: Message-ID: <998AF097-67A5-45DA-9E6F-4201A4BEDDFE@petsc.dev> Have you run it yet with valgrind, good be memory corruption earlier that causes a later crash, crashes that occur at different places for the same run are almost always due to memory corruption. If valgrind is clean you can run with -on_error_attach_debugger and if the X forwarding is set up it will open a debugger on the crashing process and you can type bt to see exactly where it is crashing, at what line number and code line. Barry > On Oct 29, 2020, at 1:04 AM, Marius Buerkle wrote: > > Hi Sherry, > > I used only 1 OpenMP thread and I also recompiled PETSC in debug mode with OpenMP turned off. But did not help. > > Here is the output I can get from SuperLu during the PETSC run > Nonzeros in L 29519630 > Nonzeros in U 29519630 > nonzeros in L+U 58996711 > nonzeros in LSUB 4509612 > ** Memory Usage ********************************** > ** NUMfact space (MB): (sum-of-all-processes) > L\U : 952.18 | Total : 1980.60 > ** Total highmark (MB): > Sum-of-all : 12401.85 | Avg : 387.56 | Max : 387.56 > ************************************************** > ************************************************** > **** Time (seconds) **** > EQUIL time 0.06 > ROWPERM time 1.03 > COLPERM time 1.01 > SYMBFACT time 0.45 > DISTRIBUTE time 0.33 > FACTOR time 0.90 > Factor flops 2.225916e+11 Mflops 247438.62 > SOLVE time 0.000 > ************************************************** > > I tried all available ordering options for Colperm (NATURAL,MMD_AT_PLUS_A,MMD_ATA,METIS_AT_PLUS_A), save for parmetis which always crashes. For Rowperm I used NOROWPERM, LargeDiag_MC64. All gives the same seg. fault. > > > Gesendet: Donnerstag, 29. Oktober 2020 um 14:14 Uhr > Von: "Xiaoye S. Li" > An: "Marius Buerkle" > Cc: "Zhang, Hong" , "petsc-users at mcs.anl.gov" , "Sherry Li" > Betreff: Re: Re: Re: [petsc-users] superlu_dist segfault > Hong: thanks for the diagnosis! > > Marius: how many OpenMP threads are you using per MPI task? > In an earlier email, you mentioned the allocation failure at the following line: > if ( !(lsum = (doublecomplex*) SUPERLU_MALLOC(sizelsum*num_thread * sizeof(doublecomplex)))) ABORT("Malloc fails for lsum[]."); > > this is in the solve phase. I think when we do some OpenMP optimization, we allowed several data structures to grow with OpenMP threads. You can try to use 1 thread. > > The RHS and X memories are easy to compute. However, in order to gauge how much memory is used in the factorization, can you print out the number of nonzeros in the L and U factors? What ordering option are you using? The sparse matrix A looks pretty small. > > The code can also print out the working storage used during factorization. I am not sure how this printing can be turned on through PETSc. > > Sherry > > On Wed, Oct 28, 2020 at 9:43 PM Marius Buerkle > wrote: > Thanks for the swift reply. > > I also realized if I reduce the number of RHS then it works. But I am running the code on a cluster with 256GB ram / node. 
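On the valgrind and debugger suggestions above: the usual way to run a parallel PETSc job under valgrind, following the PETSc FAQ page cited in the error message, looks roughly like the lines below, where the launcher, the process count, and the my_app executable name are placeholders to adapt to your system.

  mpiexec -n 8 valgrind --tool=memcheck -q --num-callers=20 \
          --log-file=valgrind.log.%p ./my_app -malloc off [your usual options]

  mpiexec -n 8 ./my_app -on_error_attach_debugger [your usual options]

The first form writes one valgrind log per MPI rank (-malloc off disables PETSc's own memory tracking so valgrind sees the raw allocations); the second, with X forwarding available, opens a debugger on the faulting rank, where typing bt prints the backtrace with the exact crashing line.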
One dense matrix would be around ~30 Gb so 60 Gb, which is large but does exceed the memory of even one node and I also get the seg fault if I run it on several nodes. Moreover, it works well with MUMPS and MKL_CPARDISO solver. The maxium memory used when using MUMPS is around 150 Gb during the solver phase but for SuperLU_dist it crashed even before reaching the solver phase. Could there be such a large difference in memory usage between SuperLu_dist and MUMPS ? > > > best, > > marius > > > Gesendet: Donnerstag, 29. Oktober 2020 um 10:10 Uhr > Von: "Zhang, Hong" > > An: "Marius Buerkle" > > Cc: "petsc-users at mcs.anl.gov " >, "Sherry Li" > > Betreff: Re: Re: [petsc-users] superlu_dist segfault > Marius, > I tested your code with petsc-release on my mac laptop using np=2 cores. I first tested a small matrix data file successfully. Then I switch to your data file and run out of memory, likely due to the dense matrices B and X. I got an error "Your system has run out of application memory" from my laptop. > > The sparse matrix A has size 42549 by 42549. Your code creates dense matrices B and X with the same size -- a huge memory requirement! > By replacing B and X with size 42549 by nrhs (nrhs =< 4000), I had the code run well with np=2. Note the error message you got > [23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > > The modified code I used is attached. > Hong > > From: Marius Buerkle > > Sent: Tuesday, October 27, 2020 10:01 PM > To: Zhang, Hong > > Cc: petsc-users at mcs.anl.gov >; Sherry Li > > Subject: Aw: Re: [petsc-users] superlu_dist segfault > > Hi, > > I recompiled PETSC with debug option, now I get a seg fault at a different position > > [23]PETSC ERROR: ------------------------------------------------------------------------ > [23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > [23]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [23]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [23]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > [23]PETSC ERROR: likely location of problem given in stack below > [23]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > [23]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [23]PETSC ERROR: INSTEAD the line number of the start of the function > [23]PETSC ERROR: is given. > [23]PETSC ERROR: [23] SuperLU_DIST:pzgssvx line 242 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c > [23]PETSC ERROR: [23] MatMatSolve_SuperLU_DIST line 211 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c > [23]PETSC ERROR: [23] MatMatSolve line 3466 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/interface/matrix.c > [23]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [23]PETSC ERROR: Signal received > > I made a small reproducer. The matrix is a bit too big so I cannot attach it directly to the email, but I put it in the cloud > https://1drv.ms/u/s!AqZsng1oUcKzjYxGMGHojLRG09Sf1A?e=7uHnmw > > Best, > Marius > > > Gesendet: Dienstag, 27. 
Oktober 2020 um 23:11 Uhr > Von: "Zhang, Hong" > > An: "Marius Buerkle" >, "petsc-users at mcs.anl.gov " >, "Sherry Li" > > Betreff: Re: [petsc-users] superlu_dist segfault > Marius, > It fails at the line 1075 in file /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c > if ( !(lsum = (doublecomplex*)SUPERLU_MALLOC(sizelsum*num_thread * sizeof(doublecomplex)))) ABORT("Malloc fails for lsum[]."); > > We do not know what it means. You may use a debugger to check the values of the variables involved. > I'm cc'ing Sherry (superlu_dist developer), or you may send us a stand-alone short code that reproduce the error. We can help on its investigation. > Hong > > > From: petsc-users > on behalf of Marius Buerkle > > Sent: Tuesday, October 27, 2020 8:46 AM > To: petsc-users at mcs.anl.gov > > Subject: [petsc-users] superlu_dist segfault > > Hi, > > When using MatMatSolve with superlu_dist I get a segmentation fault: > > Malloc fails for lsum[]. at line 1075 in file /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c > > The matrix size is not particular big and I am using the petsc release branch and superlu_dist is v6.3.0 I think. > > Best, > Marius -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Oct 30 14:56:29 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 30 Oct 2020 14:56:29 -0500 Subject: [petsc-users] Regarding changes in the 3.14 release In-Reply-To: References: Message-ID: <7B95D0D0-C162-423E-A465-A663AB6728D1@petsc.dev> > On Oct 28, 2020, at 7:56 PM, Sajid Ali wrote: > > Hi Matt, > > Thanks for the clarification. The documentation for SNESSetLagPreconditioner states "If -1 is used before the very first nonlinear solve the preconditioner is still built because there is no previous preconditioner to use" which was true prior to 3.14, is this statement no longer valid ? This looks like outdated information. We may have been less picky at one point. Will remove. > > What is the difference between having -snes_lag_preconditioner -2 and having -snes_lag_preconditioner_persists true ? -2 -1 persist through more nonlinear solves but if the number is positive a new preconditioner will be built for each zero iteration of the Newton solve. persists means that the recompute (say every 2 iterations) is done across all the solves not each individually. Say the lag is 2. And 2 newton steps are done in the first iteration then. without persistence iter 0 total its 0 of first solve compute preconditioning iter 1 1 do not 2 2 do it 0 3 do 1 4 do not with persistence iter 0 0 e compute preconditioning iter 1 1 do not 2 2 do it 0 3 do not 1 4 do so with persistence it does the mod over the second column the total iterations without persistence it does over the local iteration (the normal way). Barry > > PS : The man pages for SNESSetLagJacobianPersists should perhaps not state the lag preconditioner options database keys and vice versa for clarity. > > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Fri Oct 30 22:11:52 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Fri, 30 Oct 2020 20:11:52 -0700 Subject: [petsc-users] Problem with SNESsetFunction() Message-ID: Hello, I have a couple of questions regarding SNESSetFunction usage, when programming in Fortran90. 
1) I have the following usage paradigm. call SNESSetFunction(snes,f_non,FormFunction,0,ierr) subroutine FormFunction(snes,x,r,dummy,ierr) In the FormFunction subroutine, the function values are stored in the vector r. I see that these values are formed correctly. But when I use FormFunction in SNESSetFunction(), the values are not getting populated into f_non and all of the values in f_non are zero. Should the name of the variable used to store the function value be same in SNESSetFunction and FormFunction? And should I be calling the SNESComputeFunction() after calling SNESSetFunction()? 2) In the subroutine FormFunction, should the vector objects created be destroyed before ending the subroutine? Please let me know if you need any further information. Thank you in advance. Best regards, Pranay. ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Oct 31 09:14:02 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 31 Oct 2020 09:14:02 -0500 Subject: [petsc-users] Problem with SNESsetFunction() In-Reply-To: References: Message-ID: <18F9C792-0B04-4393-98A2-96A551D5A082@petsc.dev> > On Oct 30, 2020, at 10:11 PM, baikadi pranay wrote: > > Hello, > I have a couple of questions regarding SNESSetFunction usage, when programming in Fortran90. > > 1) I have the following usage paradigm. > call SNESSetFunction(snes,f_non,FormFunction,0,ierr) > subroutine FormFunction(snes,x,r,dummy,ierr) > In the FormFunction subroutine, the function values are stored in the vector r. I see that these values are formed correctly. But when I use FormFunction in SNESSetFunction(), the values are not getting populated into f_non and all of the values in f_non are zero. > Should the name of the variable used to store the function value be same in SNESSetFunction and FormFunction? It does not need to be the same, they are just the variables in each function > And should I be calling the SNESComputeFunction() after calling SNESSetFunction()? No, that is a developer function called in PETSc, one would not need to call that. > > 2) In the subroutine FormFunction, should the vector objects created be destroyed before ending the subroutine? What vectors? If you are creating any work vectors you need within the the FormFunction, yes those should be destroyed. But not the input and output functions. Here is any example from src/snes/tutorials/ex1f.F90 Note you call VecGetArrayF90() to access the arrays for the vectors, put the values into the arrays subroutine FormFunction(snes,X,F,user,ierr) implicit none ! Input/output variables: SNES snes Vec X,F PetscErrorCode ierr type (userctx) user DM da ! Declarations for use with local arrays: PetscScalar,pointer :: lx_v(:),lf_v(:) Vec localX ! Scatter ghost points to local vector, using the 2-step process ! DMGlobalToLocalBegin(), DMGlobalToLocalEnd(). ! By placing code between these two statements, computations can ! be done while messages are in transition. call SNESGetDM(snes,da,ierr);CHKERRQ(ierr) call DMGetLocalVector(da,localX,ierr);CHKERRQ(ierr) call DMGlobalToLocalBegin(da,X,INSERT_VALUES,localX,ierr);CHKERRQ(ierr) call DMGlobalToLocalEnd(da,X,INSERT_VALUES,localX,ierr);CHKERRQ(ierr) ! Get a pointer to vector data. ! - For default PETSc vectors, VecGetArray90() returns a pointer to ! the data array. Otherwise, the routine is implementation dependent. ! - You MUST call VecRestoreArrayF90() when you no longer need access to ! the array. ! - Note that the interface to VecGetArrayF90() differs from VecGetArray(), ! 
and is useable from Fortran-90 Only. call VecGetArrayF90(localX,lx_v,ierr);CHKERRQ(ierr) call VecGetArrayF90(F,lf_v,ierr);CHKERRQ(ierr) ! Compute function over the locally owned part of the grid call FormFunctionLocal(lx_v,lf_v,user,ierr);CHKERRQ(ierr) ! Restore vectors call VecRestoreArrayF90(localX,lx_v,ierr);CHKERRQ(ierr) call VecRestoreArrayF90(F,lf_v,ierr);CHKERRQ(ierr) ! Insert values into global vector call DMRestoreLocalVector(da,localX,ierr);CHKERRQ(ierr) call PetscLogFlops(11.0d0*user%ym*user%xm,ierr) ! call VecView(X,PETSC_VIEWER_STDOUT_WORLD,ierr) ! call VecView(F,PETSC_VIEWER_STDOUT_WORLD,ierr) return end subroutine formfunction end module f90module > > Please let me know if you need any further information. Thank you in advance. > Best regards, > Pranay. > > ? -------------- next part -------------- An HTML attachment was scrubbed... URL:
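A note on the f_non question at the start of this thread: SNESSetFunction() only registers the callback and the vector that SNES will use to hold the residual; that vector is filled each time SNES evaluates the function inside SNESSolve(), not at the moment SNESSetFunction() is called. The following self-contained sketch is in C rather than Fortran and uses a toy one-unknown problem, so every name and number in it is illustrative, but the calling pattern is the same: the X and F vectors handed to FormFunction belong to SNES and the caller and are not destroyed inside the callback.

  #include <petscsnes.h>

  /* Residual callback: fill F from X.  F is the vector given to
     SNESSetFunction(); do not destroy X or F here. */
  static PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *ctx)
  {
    const PetscScalar *x;
    PetscScalar       *f;
    PetscErrorCode     ierr;

    PetscFunctionBeginUser;
    ierr = VecGetArrayRead(X, &x);CHKERRQ(ierr);
    ierr = VecGetArray(F, &f);CHKERRQ(ierr);
    f[0] = x[0]*x[0] - 2.0;               /* toy residual: solve x*x = 2 */
    ierr = VecRestoreArrayRead(X, &x);CHKERRQ(ierr);
    ierr = VecRestoreArray(F, &f);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  /* Jacobian callback for the same toy problem: J = 2x */
  static PetscErrorCode FormJacobian(SNES snes, Vec X, Mat J, Mat P, void *ctx)
  {
    const PetscScalar *x;
    PetscErrorCode     ierr;

    PetscFunctionBeginUser;
    ierr = VecGetArrayRead(X, &x);CHKERRQ(ierr);
    ierr = MatSetValue(P, 0, 0, 2.0*x[0], INSERT_VALUES);CHKERRQ(ierr);
    ierr = VecRestoreArrayRead(X, &x);CHKERRQ(ierr);
    ierr = MatAssemblyBegin(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  int main(int argc, char **argv)
  {
    SNES           snes;
    Vec            x, r;                  /* r plays the role of f_non */
    Mat            J;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
    ierr = VecCreateSeq(PETSC_COMM_SELF, 1, &x);CHKERRQ(ierr);
    ierr = VecDuplicate(x, &r);CHKERRQ(ierr);
    ierr = VecSet(x, 1.0);CHKERRQ(ierr);
    ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 1, 1, NULL, &J);CHKERRQ(ierr);

    ierr = SNESCreate(PETSC_COMM_SELF, &snes);CHKERRQ(ierr);
    ierr = SNESSetFunction(snes, r, FormFunction, NULL);CHKERRQ(ierr);  /* registers the callback; r is not filled yet */
    ierr = SNESSetJacobian(snes, J, J, FormJacobian, NULL);CHKERRQ(ierr);
    ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);
    ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);   /* r is filled during the solve */

    ierr = SNESDestroy(&snes);CHKERRQ(ierr);
    ierr = MatDestroy(&J);CHKERRQ(ierr);
    ierr = VecDestroy(&r);CHKERRQ(ierr);
    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

In the Fortran case the pattern is identical: the vector passed as the second argument of SNESSetFunction() is the one that SNES fills with the residual, via the supplied FormFunction, during SNESSolve().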