From Pierre.Seize at onera.fr Thu Oct 1 03:18:22 2020 From: Pierre.Seize at onera.fr (Pierre Seize) Date: Thu, 1 Oct 2020 10:18:22 +0200 Subject: [petsc-users] Memory violation in PetscFVLeastSquaresPseudoInverseSVD_Static In-Reply-To: <875z7vp0th.fsf@jedbrown.org> References: <875z7vp0th.fsf@jedbrown.org> Message-ID: <98d007d2-4721-c19f-9a20-3da62e6d602d@onera.fr>
Sure I'll try, I was thinking of using ls->B as the A matrix for dgelss, ls->work as work and ls->Binv as B. The result is then stored in ls->Binv, but in column-major format. Right now, the column-major result is transposed in PetscFVLeastSquaresPseudoInverseSVD_Static, and the row-major result is copied in the output in PetscFVComputeGradient_LeastSquares. I think it's because PetscFVLeastSquaresPseudoInverse_Static gives the result in row-major format. Would it be alright if I changed PetscFVLeastSquaresPseudoInverseSVD_Static so that the result would still be in column-major format? I could include the result recopy in the if statement for example. Moreover, this would be to keep the compatibility with PetscFVLeastSquaresPseudoInverse_Static, but right now it is manually disabled (with useSVD = PETSC_TRUE), so I am worrying for nothing? Pierre
On 30/09/20 20:38, Jed Brown wrote: > Pierre Seize writes: > >> Hi, >> >> In PetscFVLeastSquaresPseudoInverseSVD_Static, there is >> Brhs = work; >> maxmn = PetscMax(m,n); >> for (j=0; j<maxmn; j++) { >> for (i=0; i<maxmn; i++) Brhs[i + j*maxmn] = 1.0*(i == j); >> } >> where in the calling function, PetscFVComputeGradient_LeastSquares, we >> set the arguments m <= numFaces, n <= dim and work <= ls->work. The size >> of the work array is computed in PetscFVLeastSquaresSetMaxFaces_LS as: >> ls->maxFaces = maxFaces; >> m = ls->maxFaces; >> n = dim; >> nrhs = ls->maxFaces; >> minwork = 3*PetscMin(m,n) + PetscMax(2*PetscMin(m,n), >> PetscMax(PetscMax(m,n), nrhs)); /* required by LAPACK */ > It's totally buggy because this formula is for the argument to dgelss, but the array is being used for a different purpose (to place Brhs). > > WORK > > WORK is DOUBLE PRECISION array, dimension (MAX(1,LWORK)) > On exit, if INFO = 0, WORK(1) returns the optimal LWORK. > > LWORK > > LWORK is INTEGER > The dimension of the array WORK. LWORK >= 1, and also: > LWORK >= 3*min(M,N) + max( 2*min(M,N), max(M,N), NRHS ) > For good performance, LWORK should generally be larger. > > If LWORK = -1, then a workspace query is assumed; the routine > only calculates the optimal size of the WORK array, returns > this value as the first entry of the WORK array, and no error > message related to LWORK is issued by XERBLA. > > There should be a separate allocation for Brhs and the work argument should be passed through to dgelss. > > The current code passes > > tmpwork = Ainv; > > along to dgelss, but we don't know that it's the right size either. > > > Would you be willing to submit a merge request with your best attempt at fixing this? I can help review and we'll get it into the 3.14.1 release. > >> ls->workSize = 5*minwork; /* We can afford to be extra generous */ >> >> In my example, the used size (maxmn * maxmn) is 81, and the actual size >> (ls->workSize) is 75, and therefore valgrind complains. >> Is it because I am missing something, or is it a bug? >> >> Thanks >> >> Pierre Seize
From t.appel17 at imperial.ac.uk Thu Oct 1 03:30:38 2020 From: t.appel17 at imperial.ac.uk (Appel, Thibaut) Date: Thu, 1 Oct 2020 08:30:38 +0000 Subject: [petsc-users] reset release branch In-Reply-To: References: Message-ID: Is 'master'
still considered stable? Thibaut > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 30 Sep 2020 09:14:41 -0500 (CDT) > From: Satish Balay > To: petsc-dev at mcs.anl.gov > Cc: petsc-users at mcs.anl.gov > Subject: [petsc-users] reset release branch > Message-ID: > Content-Type: text/plain; charset=US-ASCII > > All, > > I had to force fix the release branch due to a bad merge. > > If you've pulled on the release branch after the bad merge (before this fix) - and now have the commit 25cac2be9df307cc6f0df502d8399122c3a2b6a3 in it - i.e check with: > > git branch --contains 25cac2be9df307cc6f0df502d8399122c3a2b6a3 > > please do: > > git checkout master > git branch -D release > git fetch -p > git checkout release > > [Note: As the petsc-3.14 release announcement e-mail indicated - we switched from using 'maint' branch to 'release' branch or release fixes] > > Satish From knepley at gmail.com Thu Oct 1 06:37:16 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 1 Oct 2020 07:37:16 -0400 Subject: [petsc-users] reset release branch In-Reply-To: References: Message-ID: On Thu, Oct 1, 2020 at 4:30 AM Appel, Thibaut wrote: > Is ?master? still considered stable? > Yes. Note however that we are going to migrate that branch to the name 'main' after this release. Thanks, Matt > Thibaut > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Wed, 30 Sep 2020 09:14:41 -0500 (CDT) > > From: Satish Balay > > To: petsc-dev at mcs.anl.gov > > Cc: petsc-users at mcs.anl.gov > > Subject: [petsc-users] reset release branch > > Message-ID: > > Content-Type: text/plain; charset=US-ASCII > > > > All, > > > > I had to force fix the release branch due to a bad merge. > > > > If you've pulled on the release branch after the bad merge (before this > fix) - and now have the commit 25cac2be9df307cc6f0df502d8399122c3a2b6a3 in > it - i.e check with: > > > > git branch --contains 25cac2be9df307cc6f0df502d8399122c3a2b6a3 > > > > please do: > > > > git checkout master > > git branch -D release > > git fetch -p > > git checkout release > > > > [Note: As the petsc-3.14 release announcement e-mail indicated - we > switched from using 'maint' branch to 'release' branch or release fixes] > > > > Satish > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Oct 1 06:43:34 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 1 Oct 2020 07:43:34 -0400 Subject: [petsc-users] Memory violation in PetscFVLeastSquaresPseudoInverseSVD_Static In-Reply-To: <98d007d2-4721-c19f-9a20-3da62e6d602d@onera.fr> References: <875z7vp0th.fsf@jedbrown.org> <98d007d2-4721-c19f-9a20-3da62e6d602d@onera.fr> Message-ID: On Thu, Oct 1, 2020 at 4:18 AM Pierre Seize wrote: > Sure I'll try, > > I was thinking of using ls->B as the A matrix for dgelss, ls->work as > work and ls->Binv as B. The result is then stored in ls->Binv, but in > column-major format. > > Right now, the column-major result is transposed in > PetscFVLeastSquaresPseudoInverseSVD_Static, and the row-major result is > copied in the output in PetscFVComputeGradient_LeastSquares. I think > it's because PetscFVLeastSquaresPseudoInverse_Static gives the result in > row-major format. 
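For reference, a minimal sketch of the allocation pattern Jed suggests for the workspace bug discussed in this thread could look like the following. This is not the actual patch: A, m and n are assumed to come from the surrounding routine, real scalars are assumed (the complex gelss variant takes an extra rwork argument), and all other names are only illustrative.

  PetscErrorCode ierr;
  PetscBLASInt   M, N, NRHS, lda, ldb, lwork, irank, info;
  PetscScalar    *Brhs, *work;
  PetscReal      *sing, rcond = -1.0;
  PetscInt       maxmn = PetscMax(m, n), minmn = PetscMin(m, n);

  ierr = PetscBLASIntCast(m, &M);CHKERRQ(ierr);
  ierr = PetscBLASIntCast(n, &N);CHKERRQ(ierr);
  ierr = PetscBLASIntCast(maxmn, &NRHS);CHKERRQ(ierr);   /* identity right-hand side, as in the current code */
  ierr = PetscBLASIntCast(maxmn, &ldb);CHKERRQ(ierr);
  lda  = M;
  ierr = PetscBLASIntCast(3*minmn + PetscMax(2*minmn, maxmn), &lwork);CHKERRQ(ierr); /* LAPACK minimum for gelss with NRHS = maxmn */

  /* separate storage for the right-hand sides, the LAPACK workspace and the singular values */
  ierr = PetscMalloc3(maxmn*maxmn, &Brhs, lwork, &work, minmn, &sing);CHKERRQ(ierr);
  /* ... fill A (column-major, m x n) and set Brhs to the maxmn x maxmn identity ... */
  LAPACKgelss_(&M, &N, &NRHS, A, &lda, Brhs, &ldb, sing, &rcond, &irank, work, &lwork, &info);
  if (info) SETERRQ1(PETSC_COMM_SELF, PETSC_ERR_LIB, "xGELSS error %d", (int)info);
  /* Brhs now holds the pseudo-inverse in column-major layout; copy it where it is needed */
  ierr = PetscFree3(Brhs, work, sing);CHKERRQ(ierr);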
> > Would it be alright if I changed > PetscFVLeastSquaresPseudoInverseSVD_Static so that the result would > still be in column-major format ? I could include the result recopy in > the if statement for example. > > Moreover, this would be to keep the compatibility with > PetscFVLeastSquaresPseudoInverse_Static, but right now it is manually > disabled (with useSVD = PETSC_TRUE), so I am worrying for nothing ? > As long as the layout is documented in the function manpage, I think that change is fine. Nothing else uses the code except the reconstruction right now. Thanks, Matt > Pierre > > > On 30/09/20 20:38, Jed Brown wrote: > > Pierre Seize writes: > > > >> Hi, > >> > >> In PetscFVLeastSquaresPseudoInverseSVD_Static, there is > >> Brhs = work; > >> maxmn = PetscMax(m,n); > >> for (j=0; j >> for (i=0; i >> } > >> where on the calling function, PetscFVComputeGradient_LeastSquares, we > >> set the arguments m <= numFaces, n <= dim and work <= ls->work. The size > >> of the work array is computed in PetscFVLeastSquaresSetMaxFaces_LS as: > >> ls->maxFaces = maxFaces; > >> m = ls->maxFaces; > >> n = dim; > >> nrhs = ls->maxFaces; > >> minwork = 3*PetscMin(m,n) + PetscMax(2*PetscMin(m,n), > >> PetscMax(PetscMax(m,n), nrhs)); /* required by LAPACK */ > > It's totally buggy because this formula is for the argument to dgelss, > but the array is being used for a different purpose (to place Brhs). > > > > WORK > > > > WORK is DOUBLE PRECISION array, dimension > (MAX(1,LWORK)) > > On exit, if INFO = 0, WORK(1) returns the optimal > LWORK. > > > > LWORK > > > > LWORK is INTEGER > > The dimension of the array WORK. LWORK >= 1, and > also: > > LWORK >= 3*min(M,N) + max( 2*min(M,N), max(M,N), > NRHS ) > > For good performance, LWORK should generally be > larger. > > > > If LWORK = -1, then a workspace query is assumed; > the routine > > only calculates the optimal size of the WORK > array, returns > > this value as the first entry of the WORK array, > and no error > > message related to LWORK is issued by XERBLA. > > > > There should be a separate allocation for Brhs and the work argument > should be passed through to dgelss. > > > > The current code passes > > > > tmpwork = Ainv; > > > > along to dgelss, but we don't know that it's the right size either. > > > > > > Would you be willing to submit a merge request with your best attempt at > fixing this. I can help review and we'll get it into the 3.14.1 release? > > > >> ls->workSize = 5*minwork; /* We can afford to be extra generous */ > >> > >> In my example, the used size (maxmn * maxmn) is 81, and the actual size > >> (ls->workSize) is 75, and therefore valgrind complains. > >> Is it because I am missing something, or is it a bug ? > >> > >> Thanks > >> > >> Pierre Seize > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Oct 1 10:02:34 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 1 Oct 2020 10:02:34 -0500 (CDT) Subject: [petsc-users] reset release branch In-Reply-To: References: Message-ID: On Thu, 1 Oct 2020, Matthew Knepley wrote: > On Thu, Oct 1, 2020 at 4:30 AM Appel, Thibaut > wrote: > > > Is ?master? still considered stable? > > > > Yes. Note however that we are going to migrate that branch to the name > 'main' after this release. 
I'm not sure 'stable' is the appropriate description here. Ok - checking 'man gitworkflows' >>>>>>> ? maint tracks the commits that should go into the next "maintenance release", i.e., update of the last released stable version; ? master tracks the commits that should go into the next release; ? next is intended as a testing branch for topics being tested for stability for master. <<<<<< Ok - stable-release and stable-development? BTW: the name 'main' is still under discussion. One issue is - its too close to the old 'maint' name. [Currently inclined to preserve the old maint* branches for some time - in case its part of workflow of prior releases of applications that can't be changed. This is one reason why bitbucket repo is still active]. Maybe once we switch over - we will not have sticky fingers with maint.. Alternative is 'develop' [but this I believe might suggest a different workflow than what we use: its the above - without next, with maint renamed as release, master renamed as ????] Satish From bsmith at petsc.dev Thu Oct 1 10:06:45 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 1 Oct 2020 10:06:45 -0500 Subject: [petsc-users] reset release branch In-Reply-To: References: Message-ID: <4FE7AD86-0D5D-4CB5-8743-CAEDF351B9D9@petsc.dev> Thibaut, master has not changed in any way, it's usage is the same as before the release. Only the maint branch has been renamed to release. Barry > On Oct 1, 2020, at 3:30 AM, Appel, Thibaut wrote: > > Is ?master? still considered stable? > > Thibaut > >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Wed, 30 Sep 2020 09:14:41 -0500 (CDT) >> From: Satish Balay >> To: petsc-dev at mcs.anl.gov >> Cc: petsc-users at mcs.anl.gov >> Subject: [petsc-users] reset release branch >> Message-ID: >> Content-Type: text/plain; charset=US-ASCII >> >> All, >> >> I had to force fix the release branch due to a bad merge. >> >> If you've pulled on the release branch after the bad merge (before this fix) - and now have the commit 25cac2be9df307cc6f0df502d8399122c3a2b6a3 in it - i.e check with: >> >> git branch --contains 25cac2be9df307cc6f0df502d8399122c3a2b6a3 >> >> please do: >> >> git checkout master >> git branch -D release >> git fetch -p >> git checkout release >> >> [Note: As the petsc-3.14 release announcement e-mail indicated - we switched from using 'maint' branch to 'release' branch or release fixes] >> >> Satish > From olivier.jamond at cea.fr Thu Oct 1 12:31:23 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Thu, 1 Oct 2020 19:31:23 +0200 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices Message-ID: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> Dear all, I am working on a finite-elements/finite-volumes code, whose distributed solver is based on petsc. For FE, it relies on Lagrange multipliers for the imposition of various boundary conditions or interactions (simple dirichlet, contact, ...). This results in saddle point problems: [S Ct][x]=[f] [C 0 ][y] [g] As discussed in this mailing list ("Saddle point problem with nested matrix and a relatively small number of Lagrange multipliers"), the fieldsplit/PC_COMPOSITE_SCHUR approach involves (2 + 'number of iterations of the KSP for the Schur complement') KSPSolve(S, Sp). 
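For context, the fieldsplit/PC_COMPOSITE_SCHUR approach referred to above is typically wired up along these lines. This is only an illustrative sketch, not code from the original message: K stands for the assembled saddle-point matrix, isu/isl for index sets of the primal dofs and the Lagrange multipliers, and b/x for the assembled right-hand side and solution vectors.

  PetscErrorCode ierr;
  KSP            ksp;
  PC             pc;

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, K, K);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "u", isu);CHKERRQ(ierr);   /* primal block S */
  ierr = PCFieldSplitSetIS(pc, "l", isl);CHKERRQ(ierr);   /* Lagrange multipliers */
  ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);CHKERRQ(ierr);
  ierr = PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_FULL);CHKERRQ(ierr);
  /* each inner Schur-complement iteration applies S^{-1} once; with the full
     factorization this gives the (2 + Schur iterations) solves with S mentioned above */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);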
I would like to try the formula given by Ainsworth in [1] to solve this problem: x = (Sp)^(-1) * fp y = Rt * (f - S*x) where: Sp= Ct*C?+ Qt*S*Q Q = I - P P = R * C R =?Ct * (C*Ct)^(-1) My input matrices (S and C) are MPIAIJ matrices. I create a shell matrix for Sp (because it involves?(C*Ct)^(-1) so I think it may be a bad idea to compute it explicitly...) with the MatMult?operator to use it in a KSPSolve. The C matrix and g vector are scaled so that the condition number of Sp is similar to the one of S. It works, but my main problem is that because Sp is a shell matrix, as far as I understand, I deprive myself of all the petsc preconditioners... I tried to use S as a preconditioning?matrix, but it's not good: With a GAMG?preconditioner, my iteration number is about 4 times higher than?in a "debug" version where I compute Sp explicitly as a MPIAIJ matrix and use it as preconditioning?matrix. Is there a way to use the petsc preconditioners for?shell matrices or at least to define a shell preconditioner that internally calls the petsc preconditioners? In the end I would like to have something like GAMG(Ct*C?+ Qt*S*Q) as a preconditioner?(here Q is a shell matrix), or something like?Qt*GAMG(S)*Q (which from matlab experimentation could be a good?preconditioner). Many thanks, Olivier [1]: Ainsworth, M. (2001). Essential boundary conditions and multi-point constraints in finite element analysis.?Computer Methods in Applied Mechanics and Engineering,?190(48), 6323-6339. From jed at jedbrown.org Thu Oct 1 13:47:10 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 01 Oct 2020 12:47:10 -0600 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> Message-ID: <87mu15u6kx.fsf@jedbrown.org> Olivier Jamond writes: > Dear all, > > I am working on a finite-elements/finite-volumes code, whose distributed > solver is based on petsc. For FE, it relies on Lagrange multipliers for > the imposition of various boundary conditions or interactions (simple > dirichlet, contact, ...). This results in saddle point problems: > > [S Ct][x]=[f] > [C 0 ][y] [g] > > As discussed in this mailing list ("Saddle point problem with nested > matrix and a relatively small number of Lagrange multipliers"), the > fieldsplit/PC_COMPOSITE_SCHUR approach involves (2 + 'number of > iterations of the KSP for the Schur complement') KSPSolve(S, Sp). I > would like to try the formula given by Ainsworth in [1] to solve this > problem: > > x = (Sp)^(-1) * fp > y = Rt * (f - S*x) > > where: > Sp= Ct*C?+ Qt*S*Q I just want to observe here that Ct*C lives in the big space and is low rank. It's kinda like what you would get from an augmented Lagrangian approach. The second term involves these commutators that destroy sparsity in general, but the context of the paper (as I interpreted it in a quick skim) is such that C*Ct consists of small decoupled blocks associated with each MPC. The suggestion is that these can either be computed explicitly (possibly at the element level) or cleaned up in a small number of Krylov iterations. > Q = I - P > P = R * C > R =?Ct * (C*Ct)^(-1) > > My input matrices (S and C) are MPIAIJ matrices. I create a shell matrix > for Sp (because it involves?(C*Ct)^(-1) so I think it may be a bad idea > to compute it explicitly...) with the MatMult?operator to use it in a > KSPSolve. 
The C matrix and g vector are scaled so that the condition > number of Sp is similar to the one of S. > > It works, but my main problem is that because Sp is a shell matrix, as > far as I understand, I deprive myself of all the petsc > preconditioners... I tried to use S as a preconditioning?matrix, but > it's not good: With a GAMG?preconditioner, my iteration number is about > 4 times higher than?in a "debug" version where I compute Sp explicitly > as a MPIAIJ matrix and use it as preconditioning?matrix. Are your coupling constraints nonlocal, such that C*Ct is not block diagonal? > Is there a way to use the petsc preconditioners for?shell matrices or at > least to define a shell preconditioner that internally calls the petsc > preconditioners? > > In the end I would like to have something like GAMG(Ct*C?+ Qt*S*Q) as a > preconditioner?(here Q is a shell matrix), or something > like?Qt*GAMG(S)*Q (which from matlab experimentation could be a > good?preconditioner). > > Many thanks, > Olivier > > [1]: Ainsworth, M. (2001). Essential boundary conditions and multi-point > constraints in finite element analysis.?Computer Methods in Applied > Mechanics and Engineering,?190(48), 6323-6339. From olivier.jamond at cea.fr Fri Oct 2 06:50:46 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Fri, 2 Oct 2020 13:50:46 +0200 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <87mu15u6kx.fsf@jedbrown.org> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> Message-ID: <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> Dear Jed, The code I am working on is quite generic and at the solve step, the matrix C can be 'whatever' (but is supposed to be full rank). But in practice, in 99% of the cases, C contain MPCs that refers to boundary conditions applied to subsets of the mesh boundary. These MPCs can couple several dofs, and a given dofs can be involved in several MPCs. For example, one could impose that the average of the solution in the x-direction is null on a part of the boundary, and that this part of the boundary is in contact with another part of the boundary. So yes, CCt is block diagonal, where each block is a set of MPCs that share dofs, and CtC is also block diagonal, where each block is a set of dofs that share MPCs. For the vast majority of cases, these blocks involve dofs/MPCs attached to a subset of the boundary, so they are small with respect to the total number of dofs (and their size grows slower than the total number of dofs when the mesh is refined). I am not sure to understand what you mean by compute the MPCs explicitly: do you mean eliminating them? For very simple dirichlet conditions I see how to do that, but in a more generic case I don't see (but there may be some techniques I don't know about!). I don't understand also what you mean by cleaning them in a small number of krylov iterations? Many thanks, Olivier On 01/10/2020 20:47, Jed Brown wrote: > Olivier Jamond writes: > >> Dear all, >> >> I am working on a finite-elements/finite-volumes code, whose distributed >> solver is based on petsc. For FE, it relies on Lagrange multipliers for >> the imposition of various boundary conditions or interactions (simple >> dirichlet, contact, ...). 
This results in saddle point problems: >> >> [S Ct][x]=[f] >> [C 0 ][y] [g] >> >> As discussed in this mailing list ("Saddle point problem with nested >> matrix and a relatively small number of Lagrange multipliers"), the >> fieldsplit/PC_COMPOSITE_SCHUR approach involves (2 + 'number of >> iterations of the KSP for the Schur complement') KSPSolve(S, Sp). I >> would like to try the formula given by Ainsworth in [1] to solve this >> problem: >> >> x = (Sp)^(-1) * fp >> y = Rt * (f - S*x) >> >> where: >> Sp= Ct*C?+ Qt*S*Q > I just want to observe here that Ct*C lives in the big space and is low rank. It's kinda like what you would get from an augmented Lagrangian approach. > > The second term involves these commutators that destroy sparsity in general, but the context of the paper (as I interpreted it in a quick skim) is such that C*Ct consists of small decoupled blocks associated with each MPC. The suggestion is that these can either be computed explicitly (possibly at the element level) or cleaned up in a small number of Krylov iterations. > >> Q = I - P >> P = R * C >> R =?Ct * (C*Ct)^(-1) >> >> My input matrices (S and C) are MPIAIJ matrices. I create a shell matrix >> for Sp (because it involves?(C*Ct)^(-1) so I think it may be a bad idea >> to compute it explicitly...) with the MatMult?operator to use it in a >> KSPSolve. The C matrix and g vector are scaled so that the condition >> number of Sp is similar to the one of S. >> >> It works, but my main problem is that because Sp is a shell matrix, as >> far as I understand, I deprive myself of all the petsc >> preconditioners... I tried to use S as a preconditioning?matrix, but >> it's not good: With a GAMG?preconditioner, my iteration number is about >> 4 times higher than?in a "debug" version where I compute Sp explicitly >> as a MPIAIJ matrix and use it as preconditioning?matrix. > Are your coupling constraints nonlocal, such that C*Ct is not block diagonal? > >> Is there a way to use the petsc preconditioners for?shell matrices or at >> least to define a shell preconditioner that internally calls the petsc >> preconditioners? >> >> In the end I would like to have something like GAMG(Ct*C?+ Qt*S*Q) as a >> preconditioner?(here Q is a shell matrix), or something >> like?Qt*GAMG(S)*Q (which from matlab experimentation could be a >> good?preconditioner). >> >> Many thanks, >> Olivier >> >> [1]: Ainsworth, M. (2001). Essential boundary conditions and multi-point >> constraints in finite element analysis.?Computer Methods in Applied >> Mechanics and Engineering,?190(48), 6323-6339. From bsmith at petsc.dev Fri Oct 2 17:23:57 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 2 Oct 2020 17:23:57 -0500 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> Message-ID: <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> > On Oct 2, 2020, at 6:50 AM, Olivier Jamond wrote: > > Dear Jed, > > The code I am working on is quite generic and at the solve step, the matrix C can be 'whatever' (but is supposed to be full rank). But in practice, in 99% of the cases, C contain MPCs that refers to boundary conditions applied to subsets of the mesh boundary. These MPCs can couple several dofs, and a given dofs can be involved in several MPCs. 
For example, one could impose that the average of the solution in the x-direction is null on a part of the boundary, and that this part of the boundary is in contact with another part of the boundary. > > So yes, CCt is block diagonal, where each block is a set of MPCs that share dofs, and CtC is also block diagonal, where each block is a set of dofs that share MPCs. For the vast majority of cases, these blocks involve dofs/MPCs attached to a subset of the boundary, so they are small with respect to the total number of dofs (and their size grows slower than the total number of dofs when the mesh is refined). > > I am not sure to understand what you mean by compute the MPCs explicitly: do you mean eliminating them? For very simple dirichlet conditions I see how to do that, but in a more generic case I don't see (but there may be some techniques I don't know about!). > > I don't understand also what you mean by cleaning them in a small number of krylov iterations? I think what Jed is saying is that you should just actually build your preconditioner for your Ct*C + Qt*S*Q operator with S. Because Ct is tall and skinny the eigenstructure of Ct*C + Qt*S*Q is just the eigenstructure of S with a low rank "modification" and Krylov methods (GMRES) are good at solving problems where the eigenstructure of the preconditioner is only a small rank modification of the eigenstructure of the operator you are supply to GMRES. In the best situation each new iteration of GMRES corrects one more of the "rogue" eigen directions. I would first use a direct solver with S just to test how well it works as a preconditioner and then switch to GAMG or whatever should work efficiently for solving your particular S matrix. I'd be interested in hearing how well the Ainsworth Formula works, it is something that might be worth adding to PCFIELDSPLIT. Barry > > Many thanks, > Olivier > > On 01/10/2020 20:47, Jed Brown wrote: >> Olivier Jamond writes: >> >>> Dear all, >>> >>> I am working on a finite-elements/finite-volumes code, whose distributed >>> solver is based on petsc. For FE, it relies on Lagrange multipliers for >>> the imposition of various boundary conditions or interactions (simple >>> dirichlet, contact, ...). This results in saddle point problems: >>> >>> [S Ct][x]=[f] >>> [C 0 ][y] [g] >>> >>> As discussed in this mailing list ("Saddle point problem with nested >>> matrix and a relatively small number of Lagrange multipliers"), the >>> fieldsplit/PC_COMPOSITE_SCHUR approach involves (2 + 'number of >>> iterations of the KSP for the Schur complement') KSPSolve(S, Sp). I >>> would like to try the formula given by Ainsworth in [1] to solve this >>> problem: >>> >>> x = (Sp)^(-1) * fp >>> y = Rt * (f - S*x) >>> >>> where: >>> Sp= Ct*C + Qt*S*Q >> I just want to observe here that Ct*C lives in the big space and is low rank. It's kinda like what you would get from an augmented Lagrangian approach. >> >> The second term involves these commutators that destroy sparsity in general, but the context of the paper (as I interpreted it in a quick skim) is such that C*Ct consists of small decoupled blocks associated with each MPC. The suggestion is that these can either be computed explicitly (possibly at the element level) or cleaned up in a small number of Krylov iterations. >> >>> Q = I - P >>> P = R * C >>> R = Ct * (C*Ct)^(-1) >>> >>> My input matrices (S and C) are MPIAIJ matrices. I create a shell matrix >>> for Sp (because it involves (C*Ct)^(-1) so I think it may be a bad idea >>> to compute it explicitly...) 
with the MatMult operator to use it in a >>> KSPSolve. The C matrix and g vector are scaled so that the condition >>> number of Sp is similar to the one of S. >>> >>> It works, but my main problem is that because Sp is a shell matrix, as >>> far as I understand, I deprive myself of all the petsc >>> preconditioners... I tried to use S as a preconditioning matrix, but >>> it's not good: With a GAMG preconditioner, my iteration number is about >>> 4 times higher than in a "debug" version where I compute Sp explicitly >>> as a MPIAIJ matrix and use it as preconditioning matrix. >> Are your coupling constraints nonlocal, such that C*Ct is not block diagonal? >> >>> Is there a way to use the petsc preconditioners for shell matrices or at >>> least to define a shell preconditioner that internally calls the petsc >>> preconditioners? >>> >>> In the end I would like to have something like GAMG(Ct*C + Qt*S*Q) as a >>> preconditioner (here Q is a shell matrix), or something >>> like Qt*GAMG(S)*Q (which from matlab experimentation could be a >>> good preconditioner). >>> >>> Many thanks, >>> Olivier >>> >>> [1]: Ainsworth, M. (2001). Essential boundary conditions and multi-point >>> constraints in finite element analysis. Computer Methods in Applied >>> Mechanics and Engineering, 190(48), 6323-6339. From ashish.patel at onscale.com Fri Oct 2 18:15:57 2020 From: ashish.patel at onscale.com (Ashish Patel) Date: Fri, 2 Oct 2020 16:15:57 -0700 Subject: [petsc-users] DMPlexMatSetClosure for non connected points in DM Message-ID: Dear PETSc users, I am trying to assemble a matrix for a finite element problem where the degree of freedom (dof) on a surface is constrained via a reference node which also exists in the DM but is not connected with any other point in the mesh. To apply the constraint I want to be able to set matrix values in the rows belonging to dofs of reference nodes and columns belonging to dofs of surface nodes and vice versa. But since the two points are not connected topologically I cannot just use DMPlexMatSetClosure to do that. I am currently trying to use DMPlexAddConeSize on all the constrained surface points followed by a call to DMPlexInsertCone wherein I add the reference node to the cone of surface point before setting up the PetscSection. Is this the right approach? I am currently getting following message New nonzero at (8952,28311) caused a malloc Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check I could set the suggested option to get rid of the error but was wondering if I am missing something. Thanks Ashish -------------- next part -------------- An HTML attachment was scrubbed... URL: From Zane.Jakobs at colorado.edu Sat Oct 3 10:11:21 2020 From: Zane.Jakobs at colorado.edu (Zane Charles Jakobs) Date: Sat, 3 Oct 2020 08:11:21 -0700 Subject: [petsc-users] Debug build fails Message-ID: Hi PETSc devs, I just pulled the latest version of PETSc on master, and while my optimized build works fine, my debug build fails with the message make[1]: *** No rule to make target 'src/sys/logging/examples/makefile'. Stop. make: *** [GNUmakefile:17: src/sys/logging/examples/makefile] Error 2 Doing ls src/sys/logging/examples shows a file named `index.html` and a directory named `tutorials`, but no makefile. 
My configure line is ./configure PETSC_ARCH=arch-linux-c-debug --with-cc=clang --with-cxx=clang++ COPTFLAGS="-O3 -march=native -mtune=native -fPIC" CXXOPTFLAGS="-O3 -march=native -mtune=native -fPIC" FOPTFLAGS="-O3 -march=native -mtune=native -fPIC" --with-avx2=1 --download-mpich --download-hypre --download-scalapack --download-mumps --with-debugging=yes --with-blaslapack-dir=/opt/intel/mkl --download-zlib --download-libpng --download-giflib --download-libjpeg --download-slepc --download-eigen To reiterate, doing the exact same configure, but changing '-with-debugging=yes' to '-with-debugging=no' (and changing the PETSC_ARCH name to 'arch-linux-c-debug') and then building the non-debugging version of PETSc works as normal. Any ideas what could be going on? Thanks! -Zane Jakobs -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Oct 3 10:27:51 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 3 Oct 2020 11:27:51 -0400 Subject: [petsc-users] Debug build fails In-Reply-To: References: Message-ID: On Sat, Oct 3, 2020 at 11:11 AM Zane Charles Jakobs < Zane.Jakobs at colorado.edu> wrote: > Hi PETSc devs, > > I just pulled the latest version of PETSc on master, and while my > optimized build works fine, my debug build fails with the message > > make[1]: *** No rule to make target 'src/sys/logging/examples/makefile'. > Stop. > make: *** [GNUmakefile:17: src/sys/logging/examples/makefile] Error 2 > > Doing > ls src/sys/logging/examples > Remove this directory. It is not in the repository, and its presence is confusing the automatic detection for the build. Thanks, Matt > shows a file named `index.html` and a directory named `tutorials`, but no > makefile. My configure line is > > ./configure PETSC_ARCH=arch-linux-c-debug --with-cc=clang > --with-cxx=clang++ COPTFLAGS="-O3 -march=native -mtune=native -fPIC" > CXXOPTFLAGS="-O3 -march=native -mtune=native -fPIC" FOPTFLAGS="-O3 > -march=native -mtune=native -fPIC" --with-avx2=1 --download-mpich > --download-hypre --download-scalapack --download-mumps --with-debugging=yes > --with-blaslapack-dir=/opt/intel/mkl --download-zlib --download-libpng > --download-giflib --download-libjpeg --download-slepc --download-eigen > > To reiterate, doing the exact same configure, but changing > '-with-debugging=yes' to '-with-debugging=no' (and changing the PETSC_ARCH > name to 'arch-linux-c-debug') and then building the non-debugging version > of PETSc works as normal. Any ideas what could be going on? > > Thanks! > > -Zane Jakobs > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sat Oct 3 10:34:53 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 3 Oct 2020 10:34:53 -0500 (CDT) Subject: [petsc-users] Debug build fails In-Reply-To: References: Message-ID: Can use 'git status' to figure out what files are not in the repo. And a 'git clean -f -d -x' will delete everything except for the files in the repo [so use with care]. 
Satish On Sat, 3 Oct 2020, Matthew Knepley wrote: > On Sat, Oct 3, 2020 at 11:11 AM Zane Charles Jakobs < > Zane.Jakobs at colorado.edu> wrote: > > > Hi PETSc devs, > > > > I just pulled the latest version of PETSc on master, and while my > > optimized build works fine, my debug build fails with the message > > > > make[1]: *** No rule to make target 'src/sys/logging/examples/makefile'. > > Stop. > > make: *** [GNUmakefile:17: src/sys/logging/examples/makefile] Error 2 > > > > Doing > > ls src/sys/logging/examples > > > > Remove this directory. It is not in the repository, and its presence is > confusing the automatic detection for the build. > > Thanks, > > Matt > > > > shows a file named `index.html` and a directory named `tutorials`, but no > > makefile. My configure line is > > > > ./configure PETSC_ARCH=arch-linux-c-debug --with-cc=clang > > --with-cxx=clang++ COPTFLAGS="-O3 -march=native -mtune=native -fPIC" > > CXXOPTFLAGS="-O3 -march=native -mtune=native -fPIC" FOPTFLAGS="-O3 > > -march=native -mtune=native -fPIC" --with-avx2=1 --download-mpich > > --download-hypre --download-scalapack --download-mumps --with-debugging=yes > > --with-blaslapack-dir=/opt/intel/mkl --download-zlib --download-libpng > > --download-giflib --download-libjpeg --download-slepc --download-eigen > > > > To reiterate, doing the exact same configure, but changing > > '-with-debugging=yes' to '-with-debugging=no' (and changing the PETSC_ARCH > > name to 'arch-linux-c-debug') and then building the non-debugging version > > of PETSc works as normal. Any ideas what could be going on? > > > > Thanks! > > > > -Zane Jakobs > > > > > From Zane.Jakobs at colorado.edu Sat Oct 3 10:42:26 2020 From: Zane.Jakobs at colorado.edu (Zane Charles Jakobs) Date: Sat, 3 Oct 2020 08:42:26 -0700 Subject: [petsc-users] Debug build fails In-Reply-To: References: Message-ID: Thanks, Matt and Satish! Everything seems like it's working now. On Sat, Oct 3, 2020 at 8:34 AM Satish Balay wrote: > Can use 'git status' to figure out what files are not in the repo. > > And a 'git clean -f -d -x' will delete everything except for the files in > the repo [so use with care]. > > Satish > > On Sat, 3 Oct 2020, Matthew Knepley wrote: > > > On Sat, Oct 3, 2020 at 11:11 AM Zane Charles Jakobs < > > Zane.Jakobs at colorado.edu> wrote: > > > > > Hi PETSc devs, > > > > > > I just pulled the latest version of PETSc on master, and while my > > > optimized build works fine, my debug build fails with the message > > > > > > make[1]: *** No rule to make target > 'src/sys/logging/examples/makefile'. > > > Stop. > > > make: *** [GNUmakefile:17: src/sys/logging/examples/makefile] Error 2 > > > > > > Doing > > > ls src/sys/logging/examples > > > > > > > Remove this directory. It is not in the repository, and its presence is > > confusing the automatic detection for the build. > > > > Thanks, > > > > Matt > > > > > > > shows a file named `index.html` and a directory named `tutorials`, but > no > > > makefile. 
My configure line is > > > > > > ./configure PETSC_ARCH=arch-linux-c-debug --with-cc=clang > > > --with-cxx=clang++ COPTFLAGS="-O3 -march=native -mtune=native -fPIC" > > > CXXOPTFLAGS="-O3 -march=native -mtune=native -fPIC" FOPTFLAGS="-O3 > > > -march=native -mtune=native -fPIC" --with-avx2=1 --download-mpich > > > --download-hypre --download-scalapack --download-mumps > --with-debugging=yes > > > --with-blaslapack-dir=/opt/intel/mkl --download-zlib --download-libpng > > > --download-giflib --download-libjpeg --download-slepc --download-eigen > > > > > > To reiterate, doing the exact same configure, but changing > > > '-with-debugging=yes' to '-with-debugging=no' (and changing the > PETSC_ARCH > > > name to 'arch-linux-c-debug') and then building the non-debugging > version > > > of PETSc works as normal. Any ideas what could be going on? > > > > > > Thanks! > > > > > > -Zane Jakobs > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swarnava89 at gmail.com Sat Oct 3 19:36:08 2020 From: swarnava89 at gmail.com (Swarnava Ghosh) Date: Sat, 3 Oct 2020 20:36:08 -0400 Subject: [petsc-users] Visualizing a 3D parallel DMPLEX mesh Message-ID: Hi Petsc users, I have a 3D distributed DMPLEX mesh. I would like to visualize the mesh. Specifically, I want to see domain ownership of every MPI rank, i.e. each rank with a different color. . Would you please suggest the best way to do this? Sincerely, SG -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Oct 3 21:19:09 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 3 Oct 2020 22:19:09 -0400 Subject: [petsc-users] Visualizing a 3D parallel DMPLEX mesh In-Reply-To: References: Message-ID: On Sat, Oct 3, 2020 at 8:36 PM Swarnava Ghosh wrote: > Hi Petsc users, > > I have a 3D distributed DMPLEX mesh. I would like to visualize the mesh. > Specifically, I want to see domain ownership of every MPI rank, i.e. each > rank with a different color. . Would you please suggest the best way to do > this? > I do it this way. I use DMViewFromOptions(dm, NULL, "-dm_view") in my code. Then I run it ./my_prog -dm_view hdf5:mesh.h5 -dm_partition_view ${PETSC_DIR}/lib/petsc/bin/petsc_gen_xdmf.py mesh.h5 which creates mesh.h5 and mesh.xmf which can be loaded in Paraview. There is a "rank" field there that you can visualize over the mesh. Thanks, Matt > Sincerely, > SG > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bastian.loehrer at tu-dresden.de Mon Oct 5 12:26:42 2020 From: bastian.loehrer at tu-dresden.de (=?UTF-8?Q?Bastian_L=c3=b6hrer?=) Date: Mon, 5 Oct 2020 19:26:42 +0200 Subject: [petsc-users] DMView / print out ownership ranges In-Reply-To: References: <31f4d18a-a6d3-95ff-26d5-fa2421d89d60@tu-dresden.de> Message-ID: <335dfeb8-d36b-b909-3074-642856593e5d@tu-dresden.de> Hello, This was in fact our own bug and not an error in PETSc. A misplaced call to DMSetUp was the culprit. I'm sorry to have bothered you with this. We are going to update to the latest release, too. Thanks, Bastian On 13.08.20 14:53, Matthew Knepley wrote: > On Thu, Aug 13, 2020 at 8:49 AM Bastian L?hrer > > > wrote: > > Dear PETSc people, > > in PETSc 3.3 > > ??? 
call DMView( dm, PETSC_VIEWER_STDOUT_WORLD, ierr) > > printed out the ownership ranges like so: > > Processor [0] M 32 N 34 P 32 m 1 n 2 p 2 w 1 s 1 > X range of indices: 0 32, Y range of indices: 0 17, Z range of > indices: 0 16 > Processor [1] M 32 N 34 P 32 m 1 n 2 p 2 w 1 s 1 > X range of indices: 0 32, Y range of indices: 17 34, Z range > of indices: 0 16 > Processor [2] M 32 N 34 P 32 m 1 n 2 p 2 w 1 s 1 > X range of indices: 0 32, Y range of indices: 0 17, Z range of > indices: 16 32 > Processor [3] M 32 N 34 P 32 m 1 n 2 p 2 w 1 s 1 > X range of indices: 0 32, Y range of indices: 17 34, Z range > of indices: 16 32 > > In PETSc 3.8.4 (and later?) the same function call only prints out: > > DM Object: 4 MPI processes > ? type: da > > Does the feature to print out the ownership ranges still exist? > I am unable to find it. > > Certainly the latest release prints what you expect: > > knepley/feature-plex-stokes-tutorial > $:/PETSc3/petsc/petsc-dev/src/snes/tutorials$ make ex5 > > /PETSc3/petsc/apple/bin/mpicc -Wl,-multiply_defined,suppress > -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs > -Wl,-search_paths_first -Wl,-no_compact_unwind-Wall -Wwrite-strings > -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector > -fno-stack-check -Qunused-arguments -fvisibility=hidden -g3 -Wall > -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas > -fstack-protector -fno-stack-check -Qunused-arguments > -fvisibility=hidden -g3 -I/PETSc3/petsc/petsc-dev/include > -I/PETSc3/petsc/petsc-dev/arch-master-debug/include -I/opt/X11/include > -I/PETSc3/petsc/apple/include > -I/PETSc3/petsc/petsc-dev/arch-master-debug/include/eigen3ex5.c-Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-debug/lib > -L/PETSc3/petsc/petsc-dev/arch-master-debug/lib > -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-debug/lib > -L/PETSc3/petsc/petsc-dev/arch-master-debug/lib > -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib > -Wl,-rpath,/PETSc3/petsc/apple/lib -L/PETSc3/petsc/apple/lib > -Wl,-rpath,/usr/local/lib/gcc/x86_64-apple-darwin19/9.2.0 > -L/usr/local/lib/gcc/x86_64-apple-darwin19/9.2.0 > -Wl,-rpath,/usr/local/lib -L/usr/local/lib -lpetsc -lcmumps -ldmumps > -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu > -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig > -lsuperlu_dist -lml -lfftw3_mpi -lfftw3 -lp4est -lsc -llapack -lblas > -legadslite -ltriangle -lX11 -lexodus -lnetcdf -lpnetcdf > -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lchaco -lparmetis > -lmetis -lz -lctetgen -lc++ -ldl -lmpifort -lmpi -lpmpi -lgfortran > -lquadmath -lm -lc++ -ldl -o ex5 > > knepley/feature-plex-stokes-tutorial > $:/PETSc3/petsc/petsc-dev/src/snes/tutorials$ ./ex5 -dm_view > > DM Object: 1 MPI processes > > type: da > > Processor [0] M 4 N 4 m 1 n 1 w 1 s 1 > > X range of indices: 0 4, Y range of indices: 0 4 > > DM Object: 1 MPI processes > > type: da > > Processor [0] M 4 N 4 m 1 n 1 w 2 s 1 > > X range of indices: 0 4, Y range of indices: 0 4 > > knepley/feature-plex-stokes-tutorial > $:/PETSc3/petsc/petsc-dev/src/snes/tutorials$ $MPIEXEC -np 4 ./ex5 > -dm_view > > DM Object: 4 MPI processes > > type: da > > Processor [0] M 4 N 4 m 2 n 2 w 1 s 1 > > X range of indices: 0 2, Y range of indices: 0 2 > > Processor [1] M 4 N 4 m 2 n 2 w 1 s 1 > > X range of indices: 2 4, Y range of indices: 0 2 > > Processor [2] M 4 N 4 m 2 n 2 w 1 s 1 > > X range of indices: 0 2, Y range of indices: 2 4 > > Processor [3] M 4 N 4 m 2 n 2 w 1 s 1 > > X range of indices: 2 4, Y range of indices: 2 4 > > DM Object: 
4 MPI processes > > type: da > > Processor [0] M 4 N 4 m 2 n 2 w 2 s 1 > > X range of indices: 0 2, Y range of indices: 0 2 > > Processor [1] M 4 N 4 m 2 n 2 w 2 s 1 > > X range of indices: 2 4, Y range of indices: 0 2 > > Processor [2] M 4 N 4 m 2 n 2 w 2 s 1 > > X range of indices: 0 2, Y range of indices: 2 4 > > Processor [3] M 4 N 4 m 2 n 2 w 2 s 1 > > X range of indices: 2 4, Y range of indices: 2 4 > > We can try and go back to debug 3.8.4, but that is a long time ago. > Can you use the latest release? > > ? Thanks, > > ? ? Matt > > Best, > Bastian > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.jamond at cea.fr Tue Oct 6 06:57:22 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Tue, 6 Oct 2020 13:57:22 +0200 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> Message-ID: On 03/10/2020 00:23, Barry Smith wrote: > I think what Jed is saying is that you should just actually build your preconditioner for your Ct*C + Qt*S*Q operator with S. Because Ct is tall and skinny the eigenstructure of Ct*C + Qt*S*Q is just the eigenstructure of S with a low rank "modification" and Krylov methods (GMRES) are good at solving problems where the eigenstructure of the preconditioner is only a small rank modification of the eigenstructure of the operator you are supply to GMRES. In the best situation each new iteration of GMRES corrects one more of the "rogue" eigen directions. I would first use a direct solver with S just to test how well it works as a preconditioner and then switch to GAMG or whatever should work efficiently for solving your particular S matrix. > > I'd be interested in hearing how well the Ainsworth Formula works, it is something that might be worth adding to PCFIELDSPLIT. > > > Barry Hi Barry, Thanks for these clarifications. To give some context, the test I am working on is a traction on an elastoplastic cube in large strain on which I apply 2% of strain at the first loading increment. The cube has 14739 dofs, and the number of rows of the C matrix is 867. In this simple case, the C matrix just refers to simple dirichlet conditions. Then Q is diagonal with 1. on dofs without dirichlet on 0. for dofs with dirichlets. Q'*S*Q is like S with zeros on lines/columns referring to dofs with dirichlet, and?then C'*C just re-add non null value on the diagonal for the dofs with dirichlet. In the end, I feel that in this case the ainsworth method just do exactly the same as row/column elimination that can be done with MatZeroRowsColumns and the x and b optional vectors provided. On this test, with '-ksp_rtol 1.e-9' and '-ksp_type gmres', using S as a preconditionning matrix and a direct solver gives 65 iterations of the gmres for my first newton iteration (where S is SPD) and between 170 and 290 for the next ones (S is still symmetric but has negative eigenvalues). 
If I use '-pc_type gamg', the number of iterations of the gmres for the first (SPD) newton iteration is (14 with Sp / 23 with S), and for the next ones (not SPD) it is (~45 with Sp / ~180 with S). In this case with only simple dirichlets, I think I would like that the PCApply does something like: (I-Q)*jacobi(Ct*C)*(I-Q) + Q*precond(S)*Q. BUt I am not sure how to do that (I am quite newbie with petsc)... With a PCShell and PCShellSetApply? In the end, if we found something that works well with the ainsworth formula, it would be nice to have it natively with PCFIELDSPLIT! Many thanks, Olivier From bsmith at petsc.dev Tue Oct 6 10:51:37 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 6 Oct 2020 10:51:37 -0500 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> Message-ID: <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> > On Oct 6, 2020, at 6:57 AM, Olivier Jamond wrote: > > > On 03/10/2020 00:23, Barry Smith wrote: >> I think what Jed is saying is that you should just actually build your preconditioner for your Ct*C + Qt*S*Q operator with S. Because Ct is tall and skinny the eigenstructure of Ct*C + Qt*S*Q is just the eigenstructure of S with a low rank "modification" and Krylov methods (GMRES) are good at solving problems where the eigenstructure of the preconditioner is only a small rank modification of the eigenstructure of the operator you are supply to GMRES. In the best situation each new iteration of GMRES corrects one more of the "rogue" eigen directions. I would first use a direct solver with S just to test how well it works as a preconditioner and then switch to GAMG or whatever should work efficiently for solving your particular S matrix. >> >> I'd be interested in hearing how well the Ainsworth Formula works, it is something that might be worth adding to PCFIELDSPLIT. >> >> >> Barry > > Hi Barry, > > Thanks for these clarifications. > > To give some context, the test I am working on is a traction on an elastoplastic cube in large strain on which I apply 2% of strain at the first loading increment. The cube has 14739 dofs, and the number of rows of the C matrix is 867. > > In this simple case, the C matrix just refers to simple dirichlet conditions. Then Q is diagonal with 1. on dofs without dirichlet on 0. for dofs with dirichlets. Q'*S*Q is like S with zeros on lines/columns referring to dofs with dirichlet, and then C'*C just re-add non null value on the diagonal for the dofs with dirichlet. In the end, I feel that in this case the ainsworth method just do exactly the same as row/column elimination that can be done with MatZeroRowsColumns and the x and b optional vectors provided. > > On this test, with '-ksp_rtol 1.e-9' and '-ksp_type gmres', using S as a preconditionning matrix and a direct solver gives 65 iterations of the gmres for my first newton iteration (where S is SPD) and between 170 and 290 for the next ones (S is still symmetric but has negative eigenvalues). If I use '-pc_type gamg', the number of iterations of the gmres for the first (SPD) newton iteration is (14 with Sp / 23 with S), and for the next ones (not SPD) it is (~45 with Sp / ~180 with S). Given the structure of C it seems you should just explicitly construct Sp and use GAMG (or other preconditioners, even a direct solver) directly on Sp. 
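One possible shape of that explicit assembly with PETSc matrix products is sketched below; the step-by-step recipe is spelled out in the next paragraph. S and C are the existing MPIAIJ matrices, and InvertBlockDiagonal is a stand-in for the user code, mentioned below, that inverts the small diagonal blocks of C*Ct.

  PetscErrorCode ierr;
  Mat            Ct, D, iD, Q, QtSQ, Sp;

  ierr = MatTranspose(C, MAT_INITIAL_MATRIX, &Ct);CHKERRQ(ierr);
  ierr = MatMatMult(C, Ct, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &D);CHKERRQ(ierr);          /* D = C*Ct (block diagonal) */
  ierr = InvertBlockDiagonal(D, &iD);CHKERRQ(ierr);                                       /* user code: iD = (C*Ct)^(-1) */
  ierr = MatPtAP(iD, C, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &Q);CHKERRQ(ierr);             /* Q = Ct*(C*Ct)^(-1)*C */
  ierr = MatScale(Q, -1.0);CHKERRQ(ierr);
  ierr = MatShift(Q, 1.0);CHKERRQ(ierr);                                                  /* Q = I - Ct*(C*Ct)^(-1)*C */
  ierr = MatPtAP(S, Q, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &QtSQ);CHKERRQ(ierr);           /* Qt*S*Q (Q is symmetric) */
  ierr = MatTransposeMatMult(C, C, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &Sp);CHKERRQ(ierr); /* Sp = Ct*C */
  ierr = MatAXPY(Sp, 1.0, QtSQ, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);                 /* Sp = Ct*C + Qt*S*Q */
  /* Sp is now an ordinary MPIAIJ matrix that can be handed to KSPSetOperators and GAMG */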
Trying to avoid explicitly forming Sp will give you a much slower performing solving for what benefit? If C was just some generic monster than forming Sp might be unrealistic but in your case CCt is is block diagonal with tiny blocks which means (C*Ct)^(-1) is block diagonal with tiny blocks (the blocks are the inverses of the blocks of (C*Ct)). Sp = Ct*C + Qt * S * Q = Ct*C + [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] [Ct * (C*Ct)^(-1)*C] will again be block diagonal with slightly larger blocks. You can do D = (C*Ct) with MatMatMult() then write custom code that zips through the diagonal blocks of D inverting all of them to get iD then use MatPtAP applied to C and iD to get Ct * (C*Ct)^(-1)*C then MatShift() to include the I then MatPtAP or MatRAR to get [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] then finally MatAXPY() to get Sp. The complexity of each of the Mat operations is very low because of the absurdly simple structure of C and its descendants. You might even be able to just use MUMPS to give you the explicit inv(C*Ct) without writing custom code to get iD. Perhaps I am missing something? Note you can also prototype this process in Matlab very quickly to find any glitches. Hopefully you will find [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] has only a slightly "bulked out" sparsity of S. > > In this case with only simple dirichlets, I think I would like that the PCApply does something like: (I-Q)*jacobi(Ct*C)*(I-Q) + Q*precond(S)*Q. BUt I am not sure how to do that (I am quite newbie with petsc)... With a PCShell and PCShellSetApply? I don't think there is any reason to think that using I-Q)*jacobi(Ct*C)*(I-Q) + Q*precond(S)*Q. would be a particularly good preconditioner for Sp. That is much better than using a preconditioner built from S. But you can use PCCOMPOSITE and PCSHELL with KSP inside for the two jacobi(Ct*C) and precond(S). Barry > > In the end, if we found something that works well with the ainsworth formula, it would be nice to have it natively with PCFIELDSPLIT! > > Many thanks, > Olivier > From ashish.patel at onscale.com Tue Oct 6 14:31:08 2020 From: ashish.patel at onscale.com (Ashish Patel) Date: Tue, 6 Oct 2020 12:31:08 -0700 Subject: [petsc-users] DMPlexMatSetClosure for non connected points in DM In-Reply-To: References: Message-ID: Upon some more testing, the idea of adding the disconnected vertex point to the cone of the surface point didn't pan out. It was effecting the closure of other points in the mesh and also for mpi simulations the disconnected vertex was thrown away from the distributed mesh. I instead switched to using MatSetValues to achieve what I wanted along with supplementing the dof of the reference node to a point in the continuous mesh. There is a performance price to pay since I am adding new non zeros after the DM matrix creation. Based on this post which had a similar problem statement https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2017-January/031318.html It seems that preallocating the matrix ourselves instead of using the DM is a possible solution for avoiding this performance issue. Thanks Ashish On Fri, Oct 2, 2020 at 4:15 PM Ashish Patel wrote: > Dear PETSc users, > > I am trying to assemble a matrix for a finite element problem where the > degree of freedom (dof) on a surface is constrained via a reference node > which also exists in the DM but is not connected with any other point in > the mesh. 
To apply the constraint I want to be able to set matrix values in > the rows belonging to dofs of reference nodes and columns belonging to dofs > of surface nodes and vice versa. But since the two points are not connected > topologically I cannot just use DMPlexMatSetClosure to do that. I am > currently trying to use DMPlexAddConeSize on all the constrained surface > points followed by a call to DMPlexInsertCone wherein I add the reference > node to the cone of surface point before setting up the PetscSection. Is > this the right approach? I am currently getting following message > > New nonzero at (8952,28311) caused a malloc > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn > off this check > > I could set the suggested option to get rid of the error but was wondering > if I am missing something. > > Thanks > Ashish > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Oct 6 14:46:57 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 6 Oct 2020 15:46:57 -0400 Subject: [petsc-users] DMPlexMatSetClosure for non connected points in DM In-Reply-To: References: Message-ID: On Fri, Oct 2, 2020 at 7:16 PM Ashish Patel wrote: > Dear PETSc users, > > I am trying to assemble a matrix for a finite element problem where the > degree of freedom (dof) on a surface is constrained via a reference node > which also exists in the DM but is not connected with any other point in > the mesh. To apply the constraint I want to be able to set matrix values in > the rows belonging to dofs of reference nodes and columns belonging to dofs > of surface nodes and vice versa. But since the two points are not connected > topologically I cannot just use DMPlexMatSetClosure to do that. I am > currently trying to use DMPlexAddConeSize on all the constrained surface > points followed by a call to DMPlexInsertCone wherein I add the reference > node to the cone of surface point before setting up the PetscSection. Is > this the right approach? > No. You are changing the topology, which is not what you want I think. I think you just want to associate extra dof with the face. You can do this by just altering the PetscSection you use. Do you create the Section yourself now? Thanks, Matt > I am currently getting following message > > New nonzero at (8952,28311) caused a malloc > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn > off this check > > I could set the suggested option to get rid of the error but was wondering > if I am missing something. > > Thanks > Ashish > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashish.patel at onscale.com Tue Oct 6 15:08:44 2020 From: ashish.patel at onscale.com (Ashish Patel) Date: Tue, 6 Oct 2020 13:08:44 -0700 Subject: [petsc-users] DMPlexMatSetClosure for non connected points in DM In-Reply-To: References: Message-ID: Hi Matt, Yes I do create the section myself. There are many faces which are constrained by a single reference point. So even if I create an extra dof at one of the faces I would have to get access to rows/columns of other distant faces which does not exist in the adjacency relationship. 
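A rough sketch of the hand-preallocation route mentioned earlier in the thread is given below. All names are placeholders for application data (not from the original messages): nlocal is the local number of dofs, dnnz/onnz are per-row counts combining the usual FEM stencil with the extra reference-node couplings, and refRows/surfCols/coupling describe the constraint entries.

  PetscErrorCode ierr;
  Mat            A;

  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
  ierr = MatSetType(A, MATMPIAIJ);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A, 0, dnnz, 0, onnz);CHKERRQ(ierr);
  /* element blocks assembled through the DM as usual */
  ierr = DMPlexMatSetClosure(dm, section, globalSection, A, cell, elemMat, ADD_VALUES);CHKERRQ(ierr);
  /* reference-node / surface-dof couplings added directly, in both row and column blocks */
  ierr = MatSetValues(A, nRefDof, refRows, nSurfDof, surfCols, coupling,  ADD_VALUES);CHKERRQ(ierr);
  ierr = MatSetValues(A, nSurfDof, surfCols, nRefDof, refRows, couplingT, ADD_VALUES);CHKERRQ(ierr);
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);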
Thanks Ashish On Tue, Oct 6, 2020 at 12:47 PM Matthew Knepley wrote: > On Fri, Oct 2, 2020 at 7:16 PM Ashish Patel > wrote: > >> Dear PETSc users, >> >> I am trying to assemble a matrix for a finite element problem where the >> degree of freedom (dof) on a surface is constrained via a reference node >> which also exists in the DM but is not connected with any other point in >> the mesh. To apply the constraint I want to be able to set matrix values in >> the rows belonging to dofs of reference nodes and columns belonging to dofs >> of surface nodes and vice versa. But since the two points are not connected >> topologically I cannot just use DMPlexMatSetClosure to do that. I am >> currently trying to use DMPlexAddConeSize on all the constrained surface >> points followed by a call to DMPlexInsertCone wherein I add the reference >> node to the cone of surface point before setting up the PetscSection. Is >> this the right approach? >> > > No. You are changing the topology, which is not what you want I think. I > think you just want to associate extra dof with the face. You can > do this by just altering the PetscSection you use. Do you create the > Section yourself now? > > Thanks, > > Matt > > >> I am currently getting following message >> >> New nonzero at (8952,28311) caused a malloc >> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn >> off this check >> >> I could set the suggested option to get rid of the error but was >> wondering if I am missing something. >> >> Thanks >> Ashish >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Oct 6 15:59:11 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 6 Oct 2020 16:59:11 -0400 Subject: [petsc-users] DMPlexMatSetClosure for non connected points in DM In-Reply-To: References: Message-ID: On Tue, Oct 6, 2020 at 4:08 PM Ashish Patel wrote: > Hi Matt, > > Yes I do create the section myself. There are many faces which are > constrained by a single reference point. So even if I create an extra dof > at one of the faces I would have to get access to rows/columns of other > distant faces which does not exist in the adjacency relationship. > Okay, so these dogs do not obey the FEM sparsity pattern. Is it inherent to the method, or do you just do this for convenience? It might be easier to constrain one face at a time, but I would need to understand more. If you cannot do that, you would also have to override the sparsity pattern. I can easily give you a hook to do this if you want. Thanks, Matt > Thanks > Ashish > > On Tue, Oct 6, 2020 at 12:47 PM Matthew Knepley wrote: > >> On Fri, Oct 2, 2020 at 7:16 PM Ashish Patel >> wrote: >> >>> Dear PETSc users, >>> >>> I am trying to assemble a matrix for a finite element problem where the >>> degree of freedom (dof) on a surface is constrained via a reference node >>> which also exists in the DM but is not connected with any other point in >>> the mesh. To apply the constraint I want to be able to set matrix values in >>> the rows belonging to dofs of reference nodes and columns belonging to dofs >>> of surface nodes and vice versa. But since the two points are not connected >>> topologically I cannot just use DMPlexMatSetClosure to do that. 
I am >>> currently trying to use DMPlexAddConeSize on all the constrained surface >>> points followed by a call to DMPlexInsertCone wherein I add the reference >>> node to the cone of surface point before setting up the PetscSection. Is >>> this the right approach? >>> >> >> No. You are changing the topology, which is not what you want I think. I >> think you just want to associate extra dof with the face. You can >> do this by just altering the PetscSection you use. Do you create the >> Section yourself now? >> >> Thanks, >> >> Matt >> >> >>> I am currently getting following message >>> >>> New nonzero at (8952,28311) caused a malloc >>> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn >>> off this check >>> >>> I could set the suggested option to get rid of the error but was >>> wondering if I am missing something. >>> >>> Thanks >>> Ashish >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ashish.patel at onscale.com Tue Oct 6 16:28:27 2020 From: ashish.patel at onscale.com (Ashish Patel) Date: Tue, 6 Oct 2020 14:28:27 -0700 Subject: [petsc-users] DMPlexMatSetClosure for non connected points in DM In-Reply-To: References: Message-ID: Its inherent, physical problem for this particular case is a rigid body (whose dof is represented using a reference point) attached to a deformable body along an interface (hence many faces). The sparsity pattern for a deformable body follows the traditional FEM sparsity pattern. So I need to supplement that sparsity pattern to account for the constraint off diagonal terms. I am doing it currently after DMCreateMatrix via call to MatSetValues but that's of course not ideal. I also tried doing it by first calling DMSetMatrixPreallocateOnly but that caused problems later on as probably the number of new values exceeded the preallocated limit for that row. If you have some additional pointers that would be really helpful. Thanks Ashish On Tue, Oct 6, 2020 at 1:59 PM Matthew Knepley wrote: > On Tue, Oct 6, 2020 at 4:08 PM Ashish Patel > wrote: > >> Hi Matt, >> >> Yes I do create the section myself. There are many faces which are >> constrained by a single reference point. So even if I create an extra dof >> at one of the faces I would have to get access to rows/columns of other >> distant faces which does not exist in the adjacency relationship. >> > > Okay, so these dogs do not obey the FEM sparsity pattern. Is it inherent > to the method, or do you just do this for convenience? It might > be easier to constrain one face at a time, but I would need to understand > more. If you cannot do that, you would also have to override > the sparsity pattern. I can easily give you a hook to do this if you want. 
> > Thanks, > > Matt > > >> Thanks >> Ashish >> >> On Tue, Oct 6, 2020 at 12:47 PM Matthew Knepley >> wrote: >> >>> On Fri, Oct 2, 2020 at 7:16 PM Ashish Patel >>> wrote: >>> >>>> Dear PETSc users, >>>> >>>> I am trying to assemble a matrix for a finite element problem where the >>>> degree of freedom (dof) on a surface is constrained via a reference node >>>> which also exists in the DM but is not connected with any other point in >>>> the mesh. To apply the constraint I want to be able to set matrix values in >>>> the rows belonging to dofs of reference nodes and columns belonging to dofs >>>> of surface nodes and vice versa. But since the two points are not connected >>>> topologically I cannot just use DMPlexMatSetClosure to do that. I am >>>> currently trying to use DMPlexAddConeSize on all the constrained surface >>>> points followed by a call to DMPlexInsertCone wherein I add the reference >>>> node to the cone of surface point before setting up the PetscSection. Is >>>> this the right approach? >>>> >>> >>> No. You are changing the topology, which is not what you want I think. I >>> think you just want to associate extra dof with the face. You can >>> do this by just altering the PetscSection you use. Do you create the >>> Section yourself now? >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> I am currently getting following message >>>> >>>> New nonzero at (8952,28311) caused a malloc >>>> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to >>>> turn off this check >>>> >>>> I could set the suggested option to get rid of the error but was >>>> wondering if I am missing something. >>>> >>>> Thanks >>>> Ashish >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Tue Oct 6 16:59:39 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 6 Oct 2020 14:59:39 -0700 Subject: [petsc-users] compiling PETSc using c++ compiler Message-ID: Dear PETSc dev team, When I compile PETSc using --with-cc=gcc --with-cxx=g++ --with-clanguage=cxx, I got following error: ../../../petsc/src/sys/objects/pinit.c: In function ?PetscInitialize?: ../../../petsc/src/sys/objects/pinit.c:913:21: error: expected declaration specifiers or ?...? before numeric constant 913 | PetscComplex ic(0.0,1.0); | ^~~ ../../../petsc/src/sys/objects/pinit.c:913:25: error: expected declaration specifiers or ?...? before numeric constant 913 | PetscComplex ic(0.0,1.0); | ^~~ ../../../petsc/src/sys/objects/pinit.c:914:15: error: ?ic? undeclared (first use in this function) 914 | PETSC_i = ic; | ^~ ../../../petsc/src/sys/objects/pinit.c:914:15: note: each undeclared identifier is reported only once for each function it appears in Thanks, Sam -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay at mcs.anl.gov Tue Oct 6 17:23:12 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 6 Oct 2020 17:23:12 -0500 (CDT) Subject: [petsc-users] compiling PETSc using c++ compiler In-Reply-To: References: Message-ID: Can you send the complete logs for this build [configure.log, make.log]? Satish On Tue, 6 Oct 2020, Sam Guo wrote: > Dear PETSc dev team, > When I compile PETSc using > --with-cc=gcc --with-cxx=g++ --with-clanguage=cxx, > I got following error: > ../../../petsc/src/sys/objects/pinit.c: In function ?PetscInitialize?: > ../../../petsc/src/sys/objects/pinit.c:913:21: error: expected declaration > specifiers or ?...? before numeric constant > 913 | PetscComplex ic(0.0,1.0); > | ^~~ > ../../../petsc/src/sys/objects/pinit.c:913:25: error: expected declaration > specifiers or ?...? before numeric constant > 913 | PetscComplex ic(0.0,1.0); > | ^~~ > ../../../petsc/src/sys/objects/pinit.c:914:15: error: ?ic? undeclared > (first use in this function) > 914 | PETSC_i = ic; > | ^~ > ../../../petsc/src/sys/objects/pinit.c:914:15: note: each undeclared > identifier is reported only once for each function it appears in > > Thanks, > Sam > From balay at mcs.anl.gov Tue Oct 6 17:35:21 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 6 Oct 2020 17:35:21 -0500 (CDT) Subject: [petsc-users] compiling PETSc using c++ compiler In-Reply-To: References: Message-ID: And BTW: --with-clanguage=cxx is not needed for using PETSc from c++. It primarily exists for debugging purposes [or some corner cases where C build of PETSc does not work from c++ code] Satish On Tue, 6 Oct 2020, Satish Balay via petsc-users wrote: > Can you send the complete logs for this build [configure.log, make.log]? > > Satish > > On Tue, 6 Oct 2020, Sam Guo wrote: > > > Dear PETSc dev team, > > When I compile PETSc using > > --with-cc=gcc --with-cxx=g++ --with-clanguage=cxx, > > I got following error: > > ../../../petsc/src/sys/objects/pinit.c: In function ?PetscInitialize?: > > ../../../petsc/src/sys/objects/pinit.c:913:21: error: expected declaration > > specifiers or ?...? before numeric constant > > 913 | PetscComplex ic(0.0,1.0); > > | ^~~ > > ../../../petsc/src/sys/objects/pinit.c:913:25: error: expected declaration > > specifiers or ?...? before numeric constant > > 913 | PetscComplex ic(0.0,1.0); > > | ^~~ > > ../../../petsc/src/sys/objects/pinit.c:914:15: error: ?ic? undeclared > > (first use in this function) > > 914 | PETSC_i = ic; > > | ^~ > > ../../../petsc/src/sys/objects/pinit.c:914:15: note: each undeclared > > identifier is reported only once for each function it appears in > > > > Thanks, > > Sam > > > From sam.guo at cd-adapco.com Tue Oct 6 18:44:51 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 6 Oct 2020 16:44:51 -0700 Subject: [petsc-users] compiling PETSc using c++ compiler In-Reply-To: References: Message-ID: Hi Satish, I am using our internal makefile to make PETSc. When I use PETSc makefile, it works. Hence it must be our compiling flags responsible for the error. I'll look into it. I want to experiment c++ compiler to see if I can compile real and complex versions to different symbols (I've created another thread for this topic.) but it seems c++ compiler does not help and I still get same symbols. Thanks, Sam On Tue, Oct 6, 2020 at 3:35 PM Satish Balay wrote: > And BTW: --with-clanguage=cxx is not needed for using PETSc from > c++. 
It primarily exists for debugging purposes [or some corner cases > where C build of PETSc does not work from c++ code] > > Satish > > On Tue, 6 Oct 2020, Satish Balay via petsc-users wrote: > > > Can you send the complete logs for this build [configure.log, make.log]? > > > > Satish > > > > On Tue, 6 Oct 2020, Sam Guo wrote: > > > > > Dear PETSc dev team, > > > When I compile PETSc using > > > --with-cc=gcc --with-cxx=g++ --with-clanguage=cxx, > > > I got following error: > > > ../../../petsc/src/sys/objects/pinit.c: In function ?PetscInitialize?: > > > ../../../petsc/src/sys/objects/pinit.c:913:21: error: expected > declaration > > > specifiers or ?...? before numeric constant > > > 913 | PetscComplex ic(0.0,1.0); > > > | ^~~ > > > ../../../petsc/src/sys/objects/pinit.c:913:25: error: expected > declaration > > > specifiers or ?...? before numeric constant > > > 913 | PetscComplex ic(0.0,1.0); > > > | ^~~ > > > ../../../petsc/src/sys/objects/pinit.c:914:15: error: ?ic? undeclared > > > (first use in this function) > > > 914 | PETSC_i = ic; > > > | ^~ > > > ../../../petsc/src/sys/objects/pinit.c:914:15: note: each undeclared > > > identifier is reported only once for each function it appears in > > > > > > Thanks, > > > Sam > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Oct 6 23:49:10 2020 From: jed at jedbrown.org (Jed Brown) Date: Tue, 06 Oct 2020 22:49:10 -0600 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> Message-ID: <87ft6qmyih.fsf@jedbrown.org> Barry Smith writes: >> On Oct 6, 2020, at 6:57 AM, Olivier Jamond wrote: >> >> >> On 03/10/2020 00:23, Barry Smith wrote: >>> I think what Jed is saying is that you should just actually build your preconditioner for your Ct*C + Qt*S*Q operator with S. Because Ct is tall and skinny the eigenstructure of Ct*C + Qt*S*Q is just the eigenstructure of S with a low rank "modification" and Krylov methods (GMRES) are good at solving problems where the eigenstructure of the preconditioner is only a small rank modification of the eigenstructure of the operator you are supply to GMRES. In the best situation each new iteration of GMRES corrects one more of the "rogue" eigen directions. I would first use a direct solver with S just to test how well it works as a preconditioner and then switch to GAMG or whatever should work efficiently for solving your particular S matrix. >>> >>> I'd be interested in hearing how well the Ainsworth Formula works, it is something that might be worth adding to PCFIELDSPLIT. >>> >>> >>> Barry >> >> Hi Barry, >> >> Thanks for these clarifications. >> >> To give some context, the test I am working on is a traction on an elastoplastic cube in large strain on which I apply 2% of strain at the first loading increment. The cube has 14739 dofs, and the number of rows of the C matrix is 867. >> >> In this simple case, the C matrix just refers to simple dirichlet conditions. Then Q is diagonal with 1. on dofs without dirichlet on 0. for dofs with dirichlets. 
Q'*S*Q is like S with zeros on lines/columns referring to dofs with dirichlet, and then C'*C just re-add non null value on the diagonal for the dofs with dirichlet. In the end, I feel that in this case the ainsworth method just do exactly the same as row/column elimination that can be done with MatZeroRowsColumns and the x and b optional vectors provided. >> >> On this test, with '-ksp_rtol 1.e-9' and '-ksp_type gmres', using S as a preconditionning matrix and a direct solver gives 65 iterations of the gmres for my first newton iteration (where S is SPD) and between 170 and 290 for the next ones (S is still symmetric but has negative eigenvalues). If I use '-pc_type gamg', the number of iterations of the gmres for the first (SPD) newton iteration is (14 with Sp / 23 with S), and for the next ones (not SPD) it is (~45 with Sp / ~180 with S). > > Given the structure of C it seems you should just explicitly > construct Sp and use GAMG (or other preconditioners, even a direct > solver) directly on Sp. Trying to avoid explicitly forming Sp will > give you a much slower performing solving for what benefit? Note that S is singular if it's a pure Neumann stiffness matrix (rigid body modes are in the null space). I'm with Barry that you should just form Sp, which is much more solver friendly. If your constraint matrix C has dense rows (e.g., integral conditions on the boundary), then use FieldSplit for the saddle point problem. From dalcinl at gmail.com Wed Oct 7 04:26:25 2020 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 7 Oct 2020 12:26:25 +0300 Subject: [petsc-users] compiling PETSc using c++ compiler In-Reply-To: References: Message-ID: On Wed, 7 Oct 2020 at 02:45, Sam Guo wrote: > I want to experiment c++ compiler to see if I can compile real and > complex versions to different symbols (I've created another thread for this > topic.) but it seems c++ compiler does not help and I still get same > symbols. > > Indeed, that is not going to make it. Or you have to change the definition of PETSC_EXTERN, such that it does not use `extern "C"`. Even with that, you may have trouble with EXTERN_C_BEGIN/END macros. -- Lisandro Dalcin ============ Senior Research Scientist Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Oct 7 08:11:52 2020 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 7 Oct 2020 09:11:52 -0400 Subject: [petsc-users] [petsc-maint] Poisson with Neumann BC on a stretched grid In-Reply-To: References: <982e1f8e-8b74-8b69-c7ab-fa7eeec650a6@uclouvain.be> Message-ID: On Wed, Oct 7, 2020 at 8:27 AM Victoria Hamtiaux < victoria.hamtiaux at uclouvain.be> wrote: > Thanks for all the answers, > > > How can I do the "semi-coarsening"? I don't really know how those > preconditionners work so I don't how how to change them or so.. > > > You have to write custom code to do semi-coarsening. PETSc does not provide that and you would not want to do it yourself, most likely. > I have a question because you both seem to say that my matrix is supposed > to be symmetric which is not the case. \ > You said "my matrix is symmetric." Then you said " I suspect that by stretching the grid, my matrix is not symmetric anymore and that it might cause a problem." We are saying that by stretchin the grid the matrix is still symmetric even if the grid has lost a symmetry. 
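A quick way to check this numerically is to compare the operator with its transpose; a minimal sketch (A stands for the assembled Poisson matrix, and the comparison norm is arbitrary):

  Mat       At;
  PetscReal nrm, anrm;
  MatTranspose(A, MAT_INITIAL_MATRIX, &At);
  MatAXPY(At, -1.0, A, DIFFERENT_NONZERO_PATTERN);  /* At <- A^T - A */
  MatNorm(At, NORM_FROBENIUS, &nrm);
  MatNorm(A, NORM_FROBENIUS, &anrm);
  PetscPrintf(PETSC_COMM_WORLD, "||A^T - A||_F / ||A||_F = %g\n", (double)(nrm/anrm));
  MatDestroy(&At);

In serial, MatIsSymmetric(A, tol, &flg) gives the same information directly.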
I don't know of a mechanism for stretching the grid to make the matrix asymmetric. So we are suggesting that you verify your suspicion that the matrix is symmetric. And in fact, I don't get how it can be symmetric. Because you will have > something close to symmetric. For example when you are at the center of > your domain it will be symmetric, but when your at a point at the > boundaries I don't get how you can be symmetric, you won't have something > at the left and the right of your main diagonal... (I don't know if my > explanations are understandable) > You can make a discretization that is not symmetric because of boundary conditions but I assume that is not the case because you said your matrix is symmetric. > Best regards, > > > Victoria > > > > On 7/10/20 14:20, Mark Adams wrote: > > GMG (geometric MG) is stronger as Matt said, but it is affected by > stretched grids in a similar way. A way to fix this in GMG is > semi-coarsening, which AMG _can_ do automatically. > > On Wed, Oct 7, 2020 at 8:02 AM Matthew Knepley wrote: > >> On Wed, Oct 7, 2020 at 7:07 AM Victoria Hamtiaux < >> victoria.hamtiaux at uclouvain.be> wrote: >> >>> Hello Matt, >>> >>> >>> I just checked the symmetry of my matrix and it is not symmetric. But it >>> is not symmetric either when I use a uniform grid. >>> >>> The domain is 3D and I'm using finite differences, so I guess it is >>> normal that at multiple places (when I deal with points near the >>> boundaries), the matrix is not symmetric. >>> >>> So I was wrong, the problem doesn't come from the fact that the matrix >>> is not symmetric. I don't know where it comes from, but when I switch from >>> uniform to stretched grid, the solver stops working properly. Could it be >>> from the preconditionner of the solver that I use? >>> >>> Do you have any other idea ? >>> >> I would consider using GMG. As Mark says, AMG is very fiddly with >> stretched grids. For Poisson, GMG works great and you seem to have regular >> grids. >> >> Thanks, >> >> Matt >> >>> Thanks for your help, >>> >>> >>> Victoria >>> >>> >>> On 7/10/20 12:48, Matthew Knepley wrote: >>> >>> On Wed, Oct 7, 2020 at 6:40 AM Victoria Hamtiaux < >>> victoria.hamtiaux at uclouvain.be> wrote: >>> >>>> Dear all, >>>> >>>> >>>> After the discretization of a poisson equation with purely Neumann (or >>>> periodic) boundary conditions, I get a matrix which is singular. >>>> >>>> >>>> The way I am handling this is by using a NullSpace with the following >>>> code : >>>> >>>> MatNullSpace nullspace; >>>> MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, 0, &nullspace); >>>> MatSetNullSpace(p_solverp->A, nullspace); >>>> MatSetTransposeNullSpace(p_solverp->A, nullspace); >>>> MatNullSpaceDestroy(&nullspace); >>>> >>>> >>>> Note that I am using the hypre preconditionner BOOMERANG and the >>>> default >>>> solver GMRES. 
>>>> >>>> >>>> KSPCreate(PETSC_COMM_WORLD,&p_solverp->ksp); >>>> KSPSetOperators(p_solverp->ksp, p_solverp->A, p_solverp->A); >>>> PC prec; >>>> KSPGetPC(p_solverp->ksp, &prec); >>>> PCSetType(prec,PCHYPRE);//PCHYPRE seems the best >>>> PCHYPRESetType(prec,"boomeramg"); //boomeramg is the best >>>> KSPSetInitialGuessNonzero(p_solverp->ksp,PETSC_TRUE); >>>> KSPSetFromOptions(p_solverp->ksp); >>>> KSPSetTolerances(p_solverp->ksp, 1.e-10, 1e-10, PETSC_DEFAULT, >>>> PETSC_DEFAULT); >>>> KSPSetReusePreconditioner(p_solverp->ksp,PETSC_TRUE); >>>> KSPSetUseFischerGuess(p_solverp->ksp,1,5); >>>> KSPGMRESSetPreAllocateVectors(p_solverp->ksp); >>>> KSPSetUp(p_solverp->ksp); >>>> >>>> >>>> >>>> And this works fine when my grid is uniform, so that my matrix is >>>> symmetric. >>>> >>>> >>>> But when I stretch the grid near the boundary (my grid is then >>>> non-uniform), it doesn't work properly anymore. I suspect that by >>>> stretching the grid, my matrix is not symmetric anymore and that it >>>> might cause a problem. >>>> >>> >>> Symmetry is a property of the operator, so you should be symmetric on >>> your >>> stretched grid. If not, I think you have the discretization wrong. You >>> can check >>> symmetry using >>> >>> >>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatIsSymmetric.html >>> >>> >>> Also, if you suspect your discretization, you should probably do an MMS >>> test to >>> verify that you discretization converges at the correct rate. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> I tried fixing the solution at an arbitrary point, but sometimes doing >>>> this, I get errors near that fixed point. I 've seen on the petsc-users >>>> forum that you usually don't recommend to fix a point, but I don't >>>> really know how to proceed differently. >>>> >>>> >>>> What would you recommend to solve this problem? >>>> >>>> >>>> Thanks for your help, >>>> >>>> >>>> Best regards, >>>> >>>> >>>> Victoria >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Oct 7 10:59:44 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 7 Oct 2020 10:59:44 -0500 Subject: [petsc-users] [petsc-maint] Poisson with Neumann BC on a stretched grid In-Reply-To: References: <982e1f8e-8b74-8b69-c7ab-fa7eeec650a6@uclouvain.be> Message-ID: > On Oct 7, 2020, at 8:11 AM, Mark Adams wrote: > > > On Wed, Oct 7, 2020 at 8:27 AM Victoria Hamtiaux > wrote: > Thanks for all the answers, > > > > How can I do the "semi-coarsening"? I don't really know how those preconditionners work so I don't how how to change them or so.. > > > > You have to write custom code to do semi-coarsening. PETSc does not provide that and you would not want to do it yourself, most likely. We do not provide it directly but if you are using PCMG and DMDA it is relatively straight-forward. You create a coarse DM and then refine it but each refinement you only do in the directions you want set each time with DMDASetRefinementFactor(). 
Once you have the collections of refined DM's you provide them to PCMG. Barry > > > I have a question because you both seem to say that my matrix is supposed to be symmetric which is not the case. \ > > You said "my matrix is symmetric." > > Then you said " I suspect that by stretching the grid, my matrix is not symmetric anymore and that it might cause a problem." > > We are saying that by stretchin the grid the matrix is still symmetric even if the grid has lost a symmetry. I don't know of a mechanism for stretching the grid to make the matrix asymmetric. So we are suggesting that you verify your suspicion that the matrix is symmetric. > > And in fact, I don't get how it can be symmetric. Because you will have something close to symmetric. For example when you are at the center of your domain it will be symmetric, but when your at a point at the boundaries I don't get how you can be symmetric, you won't have something at the left and the right of your main diagonal... (I don't know if my explanations are understandable) > > You can make a discretization that is not symmetric because of boundary conditions but I assume that is not the case because you said your matrix is symmetric. > > Best regards, > > > > Victoria > > > > > > On 7/10/20 14:20, Mark Adams wrote: >> GMG (geometric MG) is stronger as Matt said, but it is affected by stretched grids in a similar way. A way to fix this in GMG is semi-coarsening, which AMG _can_ do automatically. >> >> On Wed, Oct 7, 2020 at 8:02 AM Matthew Knepley > wrote: >> On Wed, Oct 7, 2020 at 7:07 AM Victoria Hamtiaux > wrote: >> Hello Matt, >> >> >> >> I just checked the symmetry of my matrix and it is not symmetric. But it is not symmetric either when I use a uniform grid. >> >> The domain is 3D and I'm using finite differences, so I guess it is normal that at multiple places (when I deal with points near the boundaries), the matrix is not symmetric. >> >> So I was wrong, the problem doesn't come from the fact that the matrix is not symmetric. I don't know where it comes from, but when I switch from uniform to stretched grid, the solver stops working properly. Could it be from the preconditionner of the solver that I use? >> >> Do you have any other idea ? >> >> I would consider using GMG. As Mark says, AMG is very fiddly with stretched grids. For Poisson, GMG works great and you seem to have regular grids. >> >> Thanks, >> >> Matt >> Thanks for your help, >> >> >> >> Victoria >> >> >> >> On 7/10/20 12:48, Matthew Knepley wrote: >>> On Wed, Oct 7, 2020 at 6:40 AM Victoria Hamtiaux > wrote: >>> Dear all, >>> >>> >>> After the discretization of a poisson equation with purely Neumann (or >>> periodic) boundary conditions, I get a matrix which is singular. >>> >>> >>> The way I am handling this is by using a NullSpace with the following >>> code : >>> >>> MatNullSpace nullspace; >>> MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, 0, &nullspace); >>> MatSetNullSpace(p_solverp->A, nullspace); >>> MatSetTransposeNullSpace(p_solverp->A, nullspace); >>> MatNullSpaceDestroy(&nullspace); >>> >>> >>> Note that I am using the hypre preconditionner BOOMERANG and the default >>> solver GMRES. 
>>> >>> >>> KSPCreate(PETSC_COMM_WORLD,&p_solverp->ksp); >>> KSPSetOperators(p_solverp->ksp, p_solverp->A, p_solverp->A); >>> PC prec; >>> KSPGetPC(p_solverp->ksp, &prec); >>> PCSetType(prec,PCHYPRE);//PCHYPRE seems the best >>> PCHYPRESetType(prec,"boomeramg"); //boomeramg is the best >>> KSPSetInitialGuessNonzero(p_solverp->ksp,PETSC_TRUE); >>> KSPSetFromOptions(p_solverp->ksp); >>> KSPSetTolerances(p_solverp->ksp, 1.e-10, 1e-10, PETSC_DEFAULT, >>> PETSC_DEFAULT); >>> KSPSetReusePreconditioner(p_solverp->ksp,PETSC_TRUE); >>> KSPSetUseFischerGuess(p_solverp->ksp,1,5); >>> KSPGMRESSetPreAllocateVectors(p_solverp->ksp); >>> KSPSetUp(p_solverp->ksp); >>> >>> >>> >>> And this works fine when my grid is uniform, so that my matrix is >>> symmetric. >>> >>> >>> But when I stretch the grid near the boundary (my grid is then >>> non-uniform), it doesn't work properly anymore. I suspect that by >>> stretching the grid, my matrix is not symmetric anymore and that it >>> might cause a problem. >>> >>> Symmetry is a property of the operator, so you should be symmetric on your >>> stretched grid. If not, I think you have the discretization wrong. You can check >>> symmetry using >>> >>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatIsSymmetric.html >>> >>> Also, if you suspect your discretization, you should probably do an MMS test to >>> verify that you discretization converges at the correct rate. >>> >>> Thanks, >>> >>> Matt >>> >>> I tried fixing the solution at an arbitrary point, but sometimes doing >>> this, I get errors near that fixed point. I 've seen on the petsc-users >>> forum that you usually don't recommend to fix a point, but I don't >>> really know how to proceed differently. >>> >>> >>> What would you recommend to solve this problem? >>> >>> >>> Thanks for your help, >>> >>> >>> Best regards, >>> >>> >>> Victoria >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Wed Oct 7 13:41:18 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Wed, 7 Oct 2020 11:41:18 -0700 Subject: [petsc-users] Regarding FormFunction in the SNES class Message-ID: Hello, I have a few questions regarding FormFunction when using the SNES solvers. I am using Fortran90. 1) I went through the example (ex1f.F90) provided in the documentation that uses Newton method to solve a two-variable system. In the subroutine FormFunction, the first argument is an input vector (x). However in the code, no attributes are specified saying that it is an input argument for the subroutine (i.e. intent attribute is not specified). Is this automatically taken care of or should I be defining the intent attribute in my code ? Also, should I use the "allocatable" attribute when defining the vector x? Please comment similarly on the output vector f as well. 2) Should the ctx argument of the subroutine FormFunction be defined as "PETSC_NULL_INTEGER"? Please let me know if you need any further information. Thank you. Best Regards, Pranay. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Oct 7 14:19:05 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 7 Oct 2020 14:19:05 -0500 Subject: [petsc-users] Regarding FormFunction in the SNES class In-Reply-To: References: Message-ID: <9CBDA8FE-3399-43C9-88F4-23D66981E736@petsc.dev> > On Oct 7, 2020, at 1:41 PM, baikadi pranay wrote: > > Hello, > I have a few questions regarding FormFunction when using the SNES solvers. I am using Fortran90. > > 1) I went through the example (ex1f.F90) provided in the documentation that uses Newton method to solve a two-variable system. In the subroutine FormFunction, the first argument is an input vector (x). However in the code, no attributes are specified saying that it is an input argument for the subroutine (i.e. intent attribute is not specified). Is this automatically taken care of or should I be defining the intent attribute in my code ? We don't currently provide attributes for our Fortran stubs, so it is best if you do not mark them in your subroutines. Yes the x is input only and the f is output only. > Also, should I use the "allocatable" attribute when defining the vector x? I am pretty sure no. > Please comment similarly on the output vector f as well. > 2) Should the ctx argument of the subroutine FormFunction be defined as "PETSC_NULL_INTEGER"? The context is how you convey additional information into FormFunction(). Should you choose to not use it then in your function you can declare it as a integer and simply not use it. If you are calling your FormFunction() from Fortran then just pass a meaningless integer as that argument. PETSC_NULL_INTEGER is for call PETSc functions that take integer array arguments that you are not supplying. Barry > > Please let me know if you need any further information. > > Thank you. > Best Regards, > Pranay. From knepley at gmail.com Wed Oct 7 17:32:02 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 7 Oct 2020 18:32:02 -0400 Subject: [petsc-users] Regarding FormFunction in the SNES class In-Reply-To: <9CBDA8FE-3399-43C9-88F4-23D66981E736@petsc.dev> References: <9CBDA8FE-3399-43C9-88F4-23D66981E736@petsc.dev> Message-ID: On Wed, Oct 7, 2020 at 4:26 PM Barry Smith wrote: > > > > On Oct 7, 2020, at 1:41 PM, baikadi pranay > wrote: > > > > Hello, > > I have a few questions regarding FormFunction when using the SNES > solvers. I am using Fortran90. > > > > 1) I went through the example (ex1f.F90) provided in the documentation > that uses Newton method to solve a two-variable system. In the subroutine > FormFunction, the first argument is an input vector (x). However in the > code, no attributes are specified saying that it is an input argument for > the subroutine (i.e. intent attribute is not specified). Is this > automatically taken care of or should I be defining the intent attribute in > my code ? > > We don't currently provide attributes for our Fortran stubs, so it is > best if you do not mark them in your subroutines. > > Yes the x is input only and the f is output only. > Are you certain? We do not change the f pointer, you change the data hiding inside. Matt > > Also, should I use the "allocatable" attribute when defining the vector > x? > > I am pretty sure no. > > > Please comment similarly on the output vector f as well. > > 2) Should the ctx argument of the subroutine FormFunction be defined as > "PETSC_NULL_INTEGER"? > > The context is how you convey additional information into > FormFunction(). 
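In C the same idea looks like the sketch below (the AppCtx contents and names are made up for illustration); the Fortran interface carries the same ctx slot, which you can simply ignore when you have nothing to pass:

  typedef struct { PetscReal alpha; } AppCtx;      /* whatever data FormFunction needs */

  PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ptr)
  {
    AppCtx *user = (AppCtx*)ptr;
    /* ... evaluate f(x) using user->alpha ... */
    return 0;
  }

  /* in the calling code */
  AppCtx user;
  user.alpha = 2.0;
  SNESSetFunction(snes, r, FormFunction, &user);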
Should you choose to not use it then in your function you > can declare it as a integer and simply not use it. If you are calling your > FormFunction() from Fortran then just pass a meaningless integer as that > argument. PETSC_NULL_INTEGER is for call PETSc functions that take integer > array arguments that you are not supplying. > > Barry > > > > > > > Please let me know if you need any further information. > > > > Thank you. > > Best Regards, > > Pranay. > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Oct 7 20:33:47 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 7 Oct 2020 20:33:47 -0500 Subject: [petsc-users] Regarding FormFunction in the SNES class In-Reply-To: References: <9CBDA8FE-3399-43C9-88F4-23D66981E736@petsc.dev> Message-ID: <7936C907-C274-4DE9-8A1B-2EF1F57C124F@petsc.dev> > On Oct 7, 2020, at 5:02 PM, baikadi pranay wrote: > > Thank you Barry for the response. Just to make sure I understood correctly, I do not need to define any attributes to the vectors and it is automatically taken care of (both the intent as well as the allocatable attributes). Am I correct? > For the second question, my subroutine should look like this: > FormFunction(snes,x,f,3,INTEGER_VARIABLE,ierr) > Is this correct? > For example src/snes/tutorials/ex1f.F90 subroutine FormFunction(snes,x,f,dummy,ierr) use petscsnes implicit none SNES snes Vec x,f PetscErrorCode ierr integer dummy(*) > Thank you in advance. > Best Regards, > Pranay. > > ? > > On Wed, Oct 7, 2020 at 1:26 PM Barry Smith > wrote: > > > > On Oct 7, 2020, at 1:41 PM, baikadi pranay > wrote: > > > > Hello, > > I have a few questions regarding FormFunction when using the SNES solvers. I am using Fortran90. > > > > 1) I went through the example (ex1f.F90) provided in the documentation that uses Newton method to solve a two-variable system. In the subroutine FormFunction, the first argument is an input vector (x). However in the code, no attributes are specified saying that it is an input argument for the subroutine (i.e. intent attribute is not specified). Is this automatically taken care of or should I be defining the intent attribute in my code ? > > We don't currently provide attributes for our Fortran stubs, so it is best if you do not mark them in your subroutines. > > Yes the x is input only and the f is output only. > > > Also, should I use the "allocatable" attribute when defining the vector x? > > I am pretty sure no. > > > Please comment similarly on the output vector f as well. > > 2) Should the ctx argument of the subroutine FormFunction be defined as "PETSC_NULL_INTEGER"? > > The context is how you convey additional information into FormFunction(). Should you choose to not use it then in your function you can declare it as a integer and simply not use it. If you are calling your FormFunction() from Fortran then just pass a meaningless integer as that argument. PETSC_NULL_INTEGER is for call PETSc functions that take integer array arguments that you are not supplying. > > Barry > > > > > > > Please let me know if you need any further information. > > > > Thank you. > > Best Regards, > > Pranay. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Y.Juntao at hotmail.com Thu Oct 8 02:43:07 2020 From: Y.Juntao at hotmail.com (Yang Juntao) Date: Thu, 8 Oct 2020 07:43:07 +0000 Subject: [petsc-users] Convergence Error Debugging with KSP solvers in SNES Message-ID: Hello, I?m working on a nonlinear solver with SNES with handcoded jacobian and function. Each linear solver is solved with KSP solver. But sometimes I got issues with ksp solver convergence. I tried with finite difference approximated jacobian, but get the same error. From the iterations, the convergence seems ok at the beginning but suddenly diverged in the last iteration. Hope anyone with experience on ksp solvers could direct me to a direction I can debug the problem. iter = 0, SNES Function norm 2.94934e-06 iteration 0 KSP Residual norm 1.094600281831e-06 iteration 1 KSP Residual norm 1.264284474186e-08 iteration 2 KSP Residual norm 6.593269221816e-09 iteration 3 KSP Residual norm 1.689570779457e-09 iteration 4 KSP Residual norm 1.040661505932e-09 iteration 5 KSP Residual norm 5.422761817348e-10 iteration 6 KSP Residual norm 2.492867371369e-10 iteration 7 KSP Residual norm 8.261522376775e-11 iteration 8 KSP Residual norm 4.246401544245e-11 iteration 9 KSP Residual norm 2.514366787388e-11 iteration 10 KSP Residual norm 1.982940267051e-11 iteration 11 KSP Residual norm 1.586470414676e-11 iteration 12 KSP Residual norm 9.866392216207e-12 iteration 13 KSP Residual norm 4.951342176999e-12 iteration 14 KSP Residual norm 2.418292660318e-12 iteration 15 KSP Residual norm 1.747418526086e-12 iteration 16 KSP Residual norm 1.094150535809e-12 iteration 17 KSP Residual norm 4.464287492066e-13 iteration 18 KSP Residual norm 3.530090494462e-13 iteration 19 KSP Residual norm 2.825698091454e-13 iteration 20 KSP Residual norm 1.950568425807e-13 iteration 21 KSP Residual norm 1.227898091813e-13 iteration 22 KSP Residual norm 5.411106347374e-14 iteration 23 KSP Residual norm 4.511115848564e-14 iteration 24 KSP Residual norm 4.063546606691e-14 iteration 25 KSP Residual norm 3.677694771949e-14 iteration 26 KSP Residual norm 3.459244943466e-14 iteration 27 KSP Residual norm 3.263954971093e-14 iteration 28 KSP Residual norm 3.087344619079e-14 iteration 29 KSP Residual norm 2.809426925625e-14 iteration 30 KSP Residual norm 4.366149884754e-01 Linear solve did not converge due to DIVERGED_DTOL iterations 30 SNES Object: 1 MPI processes type: newtonls SNES has not been set up so information may be incomplete maximum iterations=50, maximum function evaluations=10000 tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 total number of linear solver iterations=0 total number of function evaluations=0 norm schedule ALWAYS SNESLineSearch Object: 1 MPI processes type: bt interpolation: cubic alpha=1.000000e-04 maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=40 KSP Object: 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-08, absolute=1e-50, divergence=10000. 
left preconditioning using DEFAULT norm type for convergence test PC Object: 1 MPI processes type: fieldsplit PC has not been set up so information may be incomplete FieldSplit with Schur preconditioner, factorization FULL Preconditioner for the Schur complement formed from S itself Split info: KSP solver for A00 block not yet available KSP solver for S = A11 - A10 inv(A00) A01 not yet available linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=659, cols=659 total: nonzeros=659, allocated nonzeros=7908 total number of mallocs used during MatSetValues calls=0 not using I-node routines Regards Juntao -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Oct 8 05:25:19 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 8 Oct 2020 06:25:19 -0400 Subject: [petsc-users] Convergence Error Debugging with KSP solvers in SNES In-Reply-To: References: Message-ID: On Thu, Oct 8, 2020 at 3:43 AM Yang Juntao wrote: > Hello, > > > > I?m working on a nonlinear solver with SNES with handcoded jacobian and > function. Each linear solver is solved with KSP solver. > > But sometimes I got issues with ksp solver convergence. I tried with > finite difference approximated jacobian, but get the same error. > > > > From the iterations, the convergence seems ok at the beginning but > suddenly diverged in the last iteration. > > Hope anyone with experience on ksp solvers could direct me to a direction > I can debug the problem. > KSP Object: 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement The GMRES restarted at iteration 30. You can increase the subspace size using -ksp_gmres_restart 100 Thanks, Matt > iter = 0, SNES Function norm 2.94934e-06 > > iteration 0 KSP Residual norm 1.094600281831e-06 > > iteration 1 KSP Residual norm 1.264284474186e-08 > > iteration 2 KSP Residual norm 6.593269221816e-09 > > iteration 3 KSP Residual norm 1.689570779457e-09 > > iteration 4 KSP Residual norm 1.040661505932e-09 > > iteration 5 KSP Residual norm 5.422761817348e-10 > > iteration 6 KSP Residual norm 2.492867371369e-10 > > iteration 7 KSP Residual norm 8.261522376775e-11 > > iteration 8 KSP Residual norm 4.246401544245e-11 > > iteration 9 KSP Residual norm 2.514366787388e-11 > > iteration 10 KSP Residual norm 1.982940267051e-11 > > iteration 11 KSP Residual norm 1.586470414676e-11 > > iteration 12 KSP Residual norm 9.866392216207e-12 > > iteration 13 KSP Residual norm 4.951342176999e-12 > > iteration 14 KSP Residual norm 2.418292660318e-12 > > iteration 15 KSP Residual norm 1.747418526086e-12 > > iteration 16 KSP Residual norm 1.094150535809e-12 > > iteration 17 KSP Residual norm 4.464287492066e-13 > > iteration 18 KSP Residual norm 3.530090494462e-13 > > iteration 19 KSP Residual norm 2.825698091454e-13 > > iteration 20 KSP Residual norm 1.950568425807e-13 > > iteration 21 KSP Residual norm 1.227898091813e-13 > > iteration 22 KSP Residual norm 5.411106347374e-14 > > iteration 23 KSP Residual norm 4.511115848564e-14 > > iteration 24 KSP Residual norm 4.063546606691e-14 > > iteration 25 KSP Residual norm 3.677694771949e-14 > > iteration 26 KSP Residual norm 3.459244943466e-14 > > iteration 27 KSP Residual norm 3.263954971093e-14 > > iteration 28 KSP Residual norm 3.087344619079e-14 > > iteration 29 KSP Residual norm 2.809426925625e-14 > > iteration 30 KSP Residual norm 4.366149884754e-01 > > Linear solve did not converge due to DIVERGED_DTOL 
iterations 30 > > > > > > SNES Object: 1 MPI processes > > type: newtonls > > SNES has not been set up so information may be incomplete > > maximum iterations=50, maximum function evaluations=10000 > > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > > total number of linear solver iterations=0 > > total number of function evaluations=0 > > norm schedule ALWAYS > > SNESLineSearch Object: 1 MPI processes > > type: bt > > interpolation: cubic > > alpha=1.000000e-04 > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, > lambda=1.000000e-08 > > maximum iterations=40 > > KSP Object: 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > > left preconditioning > > using DEFAULT norm type for convergence test > > PC Object: 1 MPI processes > > type: fieldsplit > > PC has not been set up so information may be incomplete > > FieldSplit with Schur preconditioner, factorization FULL > > Preconditioner for the Schur complement formed from S itself > > Split info: > > KSP solver for A00 block > > not yet available > > KSP solver for S = A11 - A10 inv(A00) A01 > > not yet available > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=659, cols=659 > > total: nonzeros=659, allocated nonzeros=7908 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > > > Regards > > Juntao > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.jamond at cea.fr Thu Oct 8 08:27:54 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Thu, 8 Oct 2020 15:27:54 +0200 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> Message-ID: <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> > Given the structure of C it seems you should just explicitly construct Sp and use GAMG (or other preconditioners, even a direct solver) directly on Sp. Trying to avoid explicitly forming Sp will give you a much slower performing solving for what benefit? If C was just some generic monster than forming Sp might be unrealistic but in your case CCt is is block diagonal with tiny blocks which means (C*Ct)^(-1) is block diagonal with tiny blocks (the blocks are the inverses of the blocks of (C*Ct)). > > Sp = Ct*C + Qt * S * Q = Ct*C + [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] > > [Ct * (C*Ct)^(-1)*C] will again be block diagonal with slightly larger blocks. 
> > You can do D = (C*Ct) with MatMatMult() then write custom code that zips through the diagonal blocks of D inverting all of them to get iD then use MatPtAP applied to C and iD to get Ct * (C*Ct)^(-1)*C then MatShift() to include the I then MatPtAP or MatRAR to get [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] then finally MatAXPY() to get Sp. The complexity of each of the Mat operations is very low because of the absurdly simple structure of C and its descendants. You might even be able to just use MUMPS to give you the explicit inv(C*Ct) without writing custom code to get iD. At this time, I didn't manage to compute iD=inv(C*Ct) without using dense matrices, what may be a shame because all matrices are sparse . Is it possible? And I get no idea of how to write code to manually zip through the diagonal blocks of D to invert them... Thanks for helping! From bsmith at petsc.dev Thu Oct 8 13:17:46 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 8 Oct 2020 13:17:46 -0500 Subject: [petsc-users] Convergence Error Debugging with KSP solvers in SNES In-Reply-To: References: Message-ID: <17D7396D-BA2D-4E7A-859A-82F605E6D12D@petsc.dev> When you get a huge change at restart this means something is seriously wrong with either the linear operator or the linear preconditioner. How are you doing the matrix vector product? Note both the operator and preconditioner must be linear operators for GMRES. FGMRES allows the preconditioner to be nonlinear. You can try -ksp_type fgmres -ksp_monitor_true_residual Barry > On Oct 8, 2020, at 2:43 AM, Yang Juntao wrote: > > Hello, > > I?m working on a nonlinear solver with SNES with handcoded jacobian and function. Each linear solver is solved with KSP solver. > But sometimes I got issues with ksp solver convergence. I tried with finite difference approximated jacobian, but get the same error. > > From the iterations, the convergence seems ok at the beginning but suddenly diverged in the last iteration. > Hope anyone with experience on ksp solvers could direct me to a direction I can debug the problem. 
> > iter = 0, SNES Function norm 2.94934e-06 > iteration 0 KSP Residual norm 1.094600281831e-06 > iteration 1 KSP Residual norm 1.264284474186e-08 > iteration 2 KSP Residual norm 6.593269221816e-09 > iteration 3 KSP Residual norm 1.689570779457e-09 > iteration 4 KSP Residual norm 1.040661505932e-09 > iteration 5 KSP Residual norm 5.422761817348e-10 > iteration 6 KSP Residual norm 2.492867371369e-10 > iteration 7 KSP Residual norm 8.261522376775e-11 > iteration 8 KSP Residual norm 4.246401544245e-11 > iteration 9 KSP Residual norm 2.514366787388e-11 > iteration 10 KSP Residual norm 1.982940267051e-11 > iteration 11 KSP Residual norm 1.586470414676e-11 > iteration 12 KSP Residual norm 9.866392216207e-12 > iteration 13 KSP Residual norm 4.951342176999e-12 > iteration 14 KSP Residual norm 2.418292660318e-12 > iteration 15 KSP Residual norm 1.747418526086e-12 > iteration 16 KSP Residual norm 1.094150535809e-12 > iteration 17 KSP Residual norm 4.464287492066e-13 > iteration 18 KSP Residual norm 3.530090494462e-13 > iteration 19 KSP Residual norm 2.825698091454e-13 > iteration 20 KSP Residual norm 1.950568425807e-13 > iteration 21 KSP Residual norm 1.227898091813e-13 > iteration 22 KSP Residual norm 5.411106347374e-14 > iteration 23 KSP Residual norm 4.511115848564e-14 > iteration 24 KSP Residual norm 4.063546606691e-14 > iteration 25 KSP Residual norm 3.677694771949e-14 > iteration 26 KSP Residual norm 3.459244943466e-14 > iteration 27 KSP Residual norm 3.263954971093e-14 > iteration 28 KSP Residual norm 3.087344619079e-14 > iteration 29 KSP Residual norm 2.809426925625e-14 > iteration 30 KSP Residual norm 4.366149884754e-01 > Linear solve did not converge due to DIVERGED_DTOL iterations 30 > > > SNES Object: 1 MPI processes > type: newtonls > SNES has not been set up so information may be incomplete > maximum iterations=50, maximum function evaluations=10000 > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > total number of linear solver iterations=0 > total number of function evaluations=0 > norm schedule ALWAYS > SNESLineSearch Object: 1 MPI processes > type: bt > interpolation: cubic > alpha=1.000000e-04 > maxstep=1.000000e+08, minlambda=1.000000e-12 > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > maximum iterations=40 > KSP Object: 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > left preconditioning > using DEFAULT norm type for convergence test > PC Object: 1 MPI processes > type: fieldsplit > PC has not been set up so information may be incomplete > FieldSplit with Schur preconditioner, factorization FULL > Preconditioner for the Schur complement formed from S itself > Split info: > KSP solver for A00 block > not yet available > KSP solver for S = A11 - A10 inv(A00) A01 > not yet available > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=659, cols=659 > total: nonzeros=659, allocated nonzeros=7908 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > > Regards > Juntao -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Thu Oct 8 15:07:12 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 08 Oct 2020 14:07:12 -0600 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> Message-ID: <87tuv48osv.fsf@jedbrown.org> Olivier Jamond writes: >> Given the structure of C it seems you should just explicitly construct Sp and use GAMG (or other preconditioners, even a direct solver) directly on Sp. Trying to avoid explicitly forming Sp will give you a much slower performing solving for what benefit? If C was just some generic monster than forming Sp might be unrealistic but in your case CCt is is block diagonal with tiny blocks which means (C*Ct)^(-1) is block diagonal with tiny blocks (the blocks are the inverses of the blocks of (C*Ct)). >> >> Sp = Ct*C + Qt * S * Q = Ct*C + [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] >> >> [Ct * (C*Ct)^(-1)*C] will again be block diagonal with slightly larger blocks. >> >> You can do D = (C*Ct) with MatMatMult() then write custom code that zips through the diagonal blocks of D inverting all of them to get iD then use MatPtAP applied to C and iD to get Ct * (C*Ct)^(-1)*C then MatShift() to include the I then MatPtAP or MatRAR to get [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] then finally MatAXPY() to get Sp. The complexity of each of the Mat operations is very low because of the absurdly simple structure of C and its descendants. You might even be able to just use MUMPS to give you the explicit inv(C*Ct) without writing custom code to get iD. > > At this time, I didn't manage to compute iD=inv(C*Ct) without using > dense matrices, what may be a shame because all matrices are sparse . Is > it possible? > > And I get no idea of how to write code to manually zip through the > diagonal blocks of D to invert them... You could use MatInvertVariableBlockDiagonal(), which should perhaps return a Mat instead of a raw array. If you have constant block sizes, MatInvertBlockDiagonalMat will return a Mat. From bsmith at petsc.dev Thu Oct 8 15:59:51 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 8 Oct 2020 15:59:51 -0500 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <87tuv48osv.fsf@jedbrown.org> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> <87tuv48osv.fsf@jedbrown.org> Message-ID: <2B8B302F-D823-4160-B674-B3DAE78E6363@petsc.dev> Olivier I am working on extending the routines now and hopefully push a branch you can try fairly soon. Barry > On Oct 8, 2020, at 3:07 PM, Jed Brown wrote: > > Olivier Jamond writes: > >>> Given the structure of C it seems you should just explicitly construct Sp and use GAMG (or other preconditioners, even a direct solver) directly on Sp. Trying to avoid explicitly forming Sp will give you a much slower performing solving for what benefit? 
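For what it is worth, the explicit construction being discussed can be written with standard Mat operations; a minimal sketch (variable names are placeholders; C and S are the assembled constraint and stiffness matrices, and iD is assumed to already hold inv(C*Ct), obtained e.g. with MatInvertBlockDiagonalMat() for constant block sizes as suggested above, or with custom code over the diagonal blocks of C*Ct):

  Mat CtC, Q, QtSQ, Sp;
  MatTransposeMatMult(C, C, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &CtC); /* Ct*C        */
  MatPtAP(iD, C, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &Q);              /* Ct*iD*C     */
  MatScale(Q, -1.0);
  MatShift(Q, 1.0);                                                   /* Q = I - Ct*iD*C */
  MatPtAP(S, Q, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &QtSQ);            /* Qt*S*Q (Q symmetric) */
  MatDuplicate(QtSQ, MAT_COPY_VALUES, &Sp);
  MatAXPY(Sp, 1.0, CtC, DIFFERENT_NONZERO_PATTERN);                   /* Sp = Ct*C + Qt*S*Q */

The resulting Sp is an ordinary assembled matrix that can be handed to KSPSetOperators() and preconditioned with GAMG or a direct solver.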
If C was just some generic monster than forming Sp might be unrealistic but in your case CCt is is block diagonal with tiny blocks which means (C*Ct)^(-1) is block diagonal with tiny blocks (the blocks are the inverses of the blocks of (C*Ct)). >>> >>> Sp = Ct*C + Qt * S * Q = Ct*C + [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] >>> >>> [Ct * (C*Ct)^(-1)*C] will again be block diagonal with slightly larger blocks. >>> >>> You can do D = (C*Ct) with MatMatMult() then write custom code that zips through the diagonal blocks of D inverting all of them to get iD then use MatPtAP applied to C and iD to get Ct * (C*Ct)^(-1)*C then MatShift() to include the I then MatPtAP or MatRAR to get [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] then finally MatAXPY() to get Sp. The complexity of each of the Mat operations is very low because of the absurdly simple structure of C and its descendants. You might even be able to just use MUMPS to give you the explicit inv(C*Ct) without writing custom code to get iD. >> >> At this time, I didn't manage to compute iD=inv(C*Ct) without using >> dense matrices, what may be a shame because all matrices are sparse . Is >> it possible? >> >> And I get no idea of how to write code to manually zip through the >> diagonal blocks of D to invert them... > > You could use MatInvertVariableBlockDiagonal(), which should perhaps return a Mat instead of a raw array. > > If you have constant block sizes, MatInvertBlockDiagonalMat will return a Mat. From bsmith at petsc.dev Thu Oct 8 20:50:00 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 8 Oct 2020 20:50:00 -0500 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <2B8B302F-D823-4160-B674-B3DAE78E6363@petsc.dev> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> <87tuv48osv.fsf@jedbrown.org> <2B8B302F-D823-4160-B674-B3DAE78E6363@petsc.dev> Message-ID: <218E7696-2A50-42A3-8CF2-D58FCC17B855@petsc.dev> Olivier, The branch barry/2020-10-08/invert-block-diagonal-aij contains an example src/mat/tests/ex178.c that shows how to compute inv(CC'). It works for SeqAIJ matrices. Please let us know if it works for you and then I will implement the parallel version. Barry > On Oct 8, 2020, at 3:59 PM, Barry Smith wrote: > > > Olivier > > I am working on extending the routines now and hopefully push a branch you can try fairly soon. > > Barry > > >> On Oct 8, 2020, at 3:07 PM, Jed Brown wrote: >> >> Olivier Jamond writes: >> >>>> Given the structure of C it seems you should just explicitly construct Sp and use GAMG (or other preconditioners, even a direct solver) directly on Sp. Trying to avoid explicitly forming Sp will give you a much slower performing solving for what benefit? If C was just some generic monster than forming Sp might be unrealistic but in your case CCt is is block diagonal with tiny blocks which means (C*Ct)^(-1) is block diagonal with tiny blocks (the blocks are the inverses of the blocks of (C*Ct)). >>>> >>>> Sp = Ct*C + Qt * S * Q = Ct*C + [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] >>>> >>>> [Ct * (C*Ct)^(-1)*C] will again be block diagonal with slightly larger blocks. 
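(For reference, a rough sketch of how the construction above could be strung together from existing Mat routines; it assumes the diagonal blocks of C*Ct all share a constant size bs so that MatInvertBlockDiagonalMat applies, the helper name is made up, and error handling is abbreviated, so treat it as a sketch rather than a tested implementation.)

#include <petscmat.h>

/* Sketch: assemble Sp = Ct*C + Qt*S*Q explicitly, with Q = I - Ct*inv(C*Ct)*C.
   C is m x n (constraints by dofs), S is n x n, and C*Ct is assumed block
   diagonal with constant block size bs (an assumption of this sketch). */
static PetscErrorCode BuildSpExplicit(Mat S, Mat C, PetscInt bs, Mat *Sp)
{
  Mat            D, iD, Q, CtC;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatMatTransposeMult(C, C, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &D);CHKERRQ(ierr);   /* D = C*Ct */
  ierr = MatSetBlockSize(D, bs);CHKERRQ(ierr);                                             /* declare the diagonal block size */
  ierr = MatDuplicate(D, MAT_DO_NOT_COPY_VALUES, &iD);CHKERRQ(ierr);
  ierr = MatInvertBlockDiagonalMat(D, iD);CHKERRQ(ierr);                                   /* iD = inv(C*Ct), block by block */
  ierr = MatPtAP(iD, C, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &Q);CHKERRQ(ierr);              /* Q = Ct*inv(C*Ct)*C */
  ierr = MatScale(Q, -1.0);CHKERRQ(ierr);
  ierr = MatShift(Q, 1.0);CHKERRQ(ierr);                                                   /* Q = I - Ct*inv(C*Ct)*C */
  ierr = MatPtAP(S, Q, MAT_INITIAL_MATRIX, PETSC_DEFAULT, Sp);CHKERRQ(ierr);               /* Sp = Qt*S*Q */
  ierr = MatTransposeMatMult(C, C, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &CtC);CHKERRQ(ierr); /* CtC = Ct*C */
  ierr = MatAXPY(*Sp, 1.0, CtC, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);                  /* Sp += Ct*C */
  ierr = MatDestroy(&D);CHKERRQ(ierr);
  ierr = MatDestroy(&iD);CHKERRQ(ierr);
  ierr = MatDestroy(&Q);CHKERRQ(ierr);
  ierr = MatDestroy(&CtC);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The assembled Sp can then be handed to KSPSetOperators() and preconditioned with GAMG or even factored directly, as suggested earlier in this thread.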
>>>> >>>> You can do D = (C*Ct) with MatMatMult() then write custom code that zips through the diagonal blocks of D inverting all of them to get iD then use MatPtAP applied to C and iD to get Ct * (C*Ct)^(-1)*C then MatShift() to include the I then MatPtAP or MatRAR to get [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] then finally MatAXPY() to get Sp. The complexity of each of the Mat operations is very low because of the absurdly simple structure of C and its descendants. You might even be able to just use MUMPS to give you the explicit inv(C*Ct) without writing custom code to get iD. >>> >>> At this time, I didn't manage to compute iD=inv(C*Ct) without using >>> dense matrices, what may be a shame because all matrices are sparse . Is >>> it possible? >>> >>> And I get no idea of how to write code to manually zip through the >>> diagonal blocks of D to invert them... >> >> You could use MatInvertVariableBlockDiagonal(), which should perhaps return a Mat instead of a raw array. >> >> If you have constant block sizes, MatInvertBlockDiagonalMat will return a Mat. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Oct 8 21:31:12 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 8 Oct 2020 21:31:12 -0500 Subject: [petsc-users] [petsc-maint] Poisson with Neumann BC on a stretched grid In-Reply-To: <260f02c6-64e1-e12e-82e7-f4ea7a155ca1@uclouvain.be> References: <982e1f8e-8b74-8b69-c7ab-fa7eeec650a6@uclouvain.be> <260f02c6-64e1-e12e-82e7-f4ea7a155ca1@uclouvain.be> Message-ID: <35425AFC-F019-4704-939C-7AD05B358DE4@petsc.dev> > On Oct 8, 2020, at 2:50 AM, Victoria Hamtiaux wrote: > > I'm sorry but I'm a bit confused. > > > > First, the fact that the matrix is not symmetric is ok? > > > > Secondly, I guess that coding the semi-coarsening would be the "best" solution, but isn't there any "easier" solution to solve that linear system (Poisson equation with pure Neumann BC on a stretched grid (and in parallel))? > > > > Also, is it normal that using the PCLU preconditionner (and solving on 1 processor) with a stretched grid, the solution of the linear solver is bad. Is it possible that the PCLU preconditionner also has problems with stretched grids? (Because again, with a uniform grid, it works fine). > No this is not normal. I would expect PCLU to be very robust for stretched grids. With finite differences I think a stretched grid results in a non symmetric matrix because the differencing (to the right and left for example) uses a different h. I think this is normal, with traditional finite elements I think it will remain symmetric. Also with finite differencing the order of the accuracy of the discretization falls because the terms in the Taylor series that cancel with a non-stretched grid do not cancel with a stretched grid. Is it possible that something is wrong with generation of the matrix with the stretched grid? You could try a convergence study with a slightly stretched grid using PCLU to see if it seems to be working correctly. Only when the numerics are right and you want to run a large problem where PCLU is too slow you would switch to multgrid with semi-coarsening. I think it is pretty easy for a structured grid and we can show you how and maybe get a nice example for PETSc out of it. 
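(A minimal sketch of what such a setup could look like for a DMDA-based Poisson solve, assuming the grid is stretched only in z so that refinement, and hence the coarsening seen by multigrid, is restricted to x and y; the function name and the factor of 2 are illustrative, and the level operators still have to be assembled by the caller.)

#include <petscksp.h>
#include <petscdmda.h>

/* Sketch: geometric multigrid with semi-coarsening on a DMDA hierarchy.
   dms[] is a caller-provided array of length nlevels; dms[0] must already hold
   the coarse DMDA.  Each refinement acts only in x and y, leaving the
   stretched z direction untouched. */
static PetscErrorCode SetupSemiCoarsenedMG(KSP ksp, PetscInt nlevels, DM dms[])
{
  PC             pc;
  Mat            interp;
  PetscInt       l;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  for (l = 1; l < nlevels; l++) {
    ierr = DMDASetRefinementFactor(dms[l-1], 2, 2, 1);CHKERRQ(ierr);  /* refine in x and y only */
    ierr = DMRefine(dms[l-1], PetscObjectComm((PetscObject)dms[l-1]), &dms[l]);CHKERRQ(ierr);
  }
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCMG);CHKERRQ(ierr);
  ierr = PCMGSetLevels(pc, nlevels, NULL);CHKERRQ(ierr);
  for (l = 1; l < nlevels; l++) {
    /* interpolation between consecutive levels comes straight from the DMDA pair */
    ierr = DMCreateInterpolation(dms[l-1], dms[l], &interp, NULL);CHKERRQ(ierr);
    ierr = PCMGSetInterpolation(pc, l, interp);CHKERRQ(ierr);
    ierr = MatDestroy(&interp);CHKERRQ(ierr);
  }
  /* The Poisson operator on each dms[l] still has to be assembled by the caller and
     attached with PCMGGetSmoother(pc, l, &lksp); KSPSetOperators(lksp, A, A);
     the finest-level operator also goes to KSPSetOperators(ksp, ...). */
  PetscFunctionReturn(0);
}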
Barry > > Sorry for asking so much questions, > > > > Thanks again for your help, > > > > Best regards, > > > > Victoria > > > > > > On 7/10/20 17:59, Barry Smith wrote: >> >> >>> On Oct 7, 2020, at 8:11 AM, Mark Adams > wrote: >>> >>> >>> On Wed, Oct 7, 2020 at 8:27 AM Victoria Hamtiaux > wrote: >>> Thanks for all the answers, >>> >>> >>> >>> How can I do the "semi-coarsening"? I don't really know how those preconditionners work so I don't how how to change them or so.. >>> >>> >>> >>> You have to write custom code to do semi-coarsening. PETSc does not provide that and you would not want to do it yourself, most likely. >> >> We do not provide it directly but if you are using PCMG and DMDA it is relatively straight-forward. You create a coarse DM and then refine it but each refinement you only do in the directions you want set each time with DMDASetRefinementFactor(). Once you have the collections of refined DM's you provide them to PCMG. >> >> Barry >> >>> >>> >>> I have a question because you both seem to say that my matrix is supposed to be symmetric which is not the case. \ >>> >>> You said "my matrix is symmetric." >>> >>> Then you said " I suspect that by stretching the grid, my matrix is not symmetric anymore and that it might cause a problem." >>> >>> We are saying that by stretchin the grid the matrix is still symmetric even if the grid has lost a symmetry. I don't know of a mechanism for stretching the grid to make the matrix asymmetric. So we are suggesting that you verify your suspicion that the matrix is symmetric. >>> >>> And in fact, I don't get how it can be symmetric. Because you will have something close to symmetric. For example when you are at the center of your domain it will be symmetric, but when your at a point at the boundaries I don't get how you can be symmetric, you won't have something at the left and the right of your main diagonal... (I don't know if my explanations are understandable) >>> >>> You can make a discretization that is not symmetric because of boundary conditions but I assume that is not the case because you said your matrix is symmetric. >>> >>> Best regards, >>> >>> >>> >>> Victoria >>> >>> >>> >>> >>> >>> On 7/10/20 14:20, Mark Adams wrote: >>>> GMG (geometric MG) is stronger as Matt said, but it is affected by stretched grids in a similar way. A way to fix this in GMG is semi-coarsening, which AMG _can_ do automatically. >>>> >>>> On Wed, Oct 7, 2020 at 8:02 AM Matthew Knepley > wrote: >>>> On Wed, Oct 7, 2020 at 7:07 AM Victoria Hamtiaux > wrote: >>>> Hello Matt, >>>> >>>> >>>> >>>> I just checked the symmetry of my matrix and it is not symmetric. But it is not symmetric either when I use a uniform grid. >>>> >>>> The domain is 3D and I'm using finite differences, so I guess it is normal that at multiple places (when I deal with points near the boundaries), the matrix is not symmetric. >>>> >>>> So I was wrong, the problem doesn't come from the fact that the matrix is not symmetric. I don't know where it comes from, but when I switch from uniform to stretched grid, the solver stops working properly. Could it be from the preconditionner of the solver that I use? >>>> >>>> Do you have any other idea ? >>>> >>>> I would consider using GMG. As Mark says, AMG is very fiddly with stretched grids. For Poisson, GMG works great and you seem to have regular grids. 
>>>> >>>> Thanks, >>>> >>>> Matt >>>> Thanks for your help, >>>> >>>> >>>> >>>> Victoria >>>> >>>> >>>> >>>> On 7/10/20 12:48, Matthew Knepley wrote: >>>>> On Wed, Oct 7, 2020 at 6:40 AM Victoria Hamtiaux > wrote: >>>>> Dear all, >>>>> >>>>> >>>>> After the discretization of a poisson equation with purely Neumann (or >>>>> periodic) boundary conditions, I get a matrix which is singular. >>>>> >>>>> >>>>> The way I am handling this is by using a NullSpace with the following >>>>> code : >>>>> >>>>> MatNullSpace nullspace; >>>>> MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, 0, &nullspace); >>>>> MatSetNullSpace(p_solverp->A, nullspace); >>>>> MatSetTransposeNullSpace(p_solverp->A, nullspace); >>>>> MatNullSpaceDestroy(&nullspace); >>>>> >>>>> >>>>> Note that I am using the hypre preconditionner BOOMERANG and the default >>>>> solver GMRES. >>>>> >>>>> >>>>> KSPCreate(PETSC_COMM_WORLD,&p_solverp->ksp); >>>>> KSPSetOperators(p_solverp->ksp, p_solverp->A, p_solverp->A); >>>>> PC prec; >>>>> KSPGetPC(p_solverp->ksp, &prec); >>>>> PCSetType(prec,PCHYPRE);//PCHYPRE seems the best >>>>> PCHYPRESetType(prec,"boomeramg"); //boomeramg is the best >>>>> KSPSetInitialGuessNonzero(p_solverp->ksp,PETSC_TRUE); >>>>> KSPSetFromOptions(p_solverp->ksp); >>>>> KSPSetTolerances(p_solverp->ksp, 1.e-10, 1e-10, PETSC_DEFAULT, >>>>> PETSC_DEFAULT); >>>>> KSPSetReusePreconditioner(p_solverp->ksp,PETSC_TRUE); >>>>> KSPSetUseFischerGuess(p_solverp->ksp,1,5); >>>>> KSPGMRESSetPreAllocateVectors(p_solverp->ksp); >>>>> KSPSetUp(p_solverp->ksp); >>>>> >>>>> >>>>> >>>>> And this works fine when my grid is uniform, so that my matrix is >>>>> symmetric. >>>>> >>>>> >>>>> But when I stretch the grid near the boundary (my grid is then >>>>> non-uniform), it doesn't work properly anymore. I suspect that by >>>>> stretching the grid, my matrix is not symmetric anymore and that it >>>>> might cause a problem. >>>>> >>>>> Symmetry is a property of the operator, so you should be symmetric on your >>>>> stretched grid. If not, I think you have the discretization wrong. You can check >>>>> symmetry using >>>>> >>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatIsSymmetric.html >>>>> >>>>> Also, if you suspect your discretization, you should probably do an MMS test to >>>>> verify that you discretization converges at the correct rate. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> I tried fixing the solution at an arbitrary point, but sometimes doing >>>>> this, I get errors near that fixed point. I 've seen on the petsc-users >>>>> forum that you usually don't recommend to fix a point, but I don't >>>>> really know how to proceed differently. >>>>> >>>>> >>>>> What would you recommend to solve this problem? >>>>> >>>>> >>>>> Thanks for your help, >>>>> >>>>> >>>>> Best regards, >>>>> >>>>> >>>>> Victoria >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From y.juntao at hotmail.com Fri Oct 9 00:53:28 2020 From: y.juntao at hotmail.com (Karl Yang) Date: Fri, 9 Oct 2020 13:53:28 +0800 Subject: [petsc-users] Convergence Error Debugging with KSP solvers in SNES In-Reply-To: <17D7396D-BA2D-4E7A-859A-82F605E6D12D@petsc.dev> References: <17D7396D-BA2D-4E7A-859A-82F605E6D12D@petsc.dev> Message-ID: Hi, Barry, Thanks for your reply. Yes, I should have used fgmres. But after switching to fgmres I'm still facing the same convergence issue. Seems like the reason is due to DIVERGED_PC_FAILED. But I simply used FD jacobian, and fieldsplitPC. I am a bit lost on whether I made some mistakes somewhere in the FormFunction or I did not setup the solver correctly. ///////code/////// SNESSetFunction(snes, r, FormFunctionStatic, this); // SNESSetJacobian(snes, J, J, FormJacobianStatic, this); SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault, this); SNESMonitorSet(snes, MySNESMonitor, NULL, NULL); SNESGetKSP(snes, &ksp); KSPGetPC(ksp, &pc); PCSetType(pc, PCFIELDSPLIT); PCFieldSplitSetDetectSaddlePoint(pc, PETSC_TRUE); PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELF, NULL); KSPMonitorSet(ksp, MyKSPMonitor, NULL, 0); KSPSetTolerances(ksp, 1e-8, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); SNESSetFromOptions(snes); //////end///////// Output from SNES/KSP solver ################# step 1 ################# iter = 0, SNES Function norm 0.0430713 iteration 0 KSP Residual norm 4.307133784528e-02 0 KSP unpreconditioned resid norm 4.307133784528e-02 true resid norm 4.307133784528e-02 ||r(i)||/||b|| 1.000000000000e+00 iteration 1 KSP Residual norm 4.451434065870e-07 1 KSP unpreconditioned resid norm 4.451434065870e-07 true resid norm 4.451434065902e-07 ||r(i)||/||b|| 1.033502623460e-05 iteration 2 KSP Residual norm 1.079756105012e-12 2 KSP unpreconditioned resid norm 1.079756105012e-12 true resid norm 1.079754870815e-12 ||r(i)||/||b|| 2.506898844643e-11 Linear solve converged due to CONVERGED_RTOL iterations 2 iter = 1, SNES Function norm 2.40846e-05 iteration 0 KSP Residual norm 2.408462930023e-05 0 KSP unpreconditioned resid norm 2.408462930023e-05 true resid norm 2.408462930023e-05 ||r(i)||/||b|| 1.000000000000e+00 iteration 1 KSP Residual norm 1.096958085415e-11 1 KSP unpreconditioned resid norm 1.096958085415e-11 true resid norm 1.096958085425e-11 ||r(i)||/||b|| 4.554598170270e-07 iteration 2 KSP Residual norm 5.909523288165e-16 2 KSP unpreconditioned resid norm 5.909523288165e-16 true resid norm 5.909519599233e-16 ||r(i)||/||b|| 2.453647729249e-11 Linear solve converged due to CONVERGED_RTOL iterations 2 iter = 2, SNES Function norm 1.19684e-14 ################# step 2 ################# iter = 0, SNES Function norm 0.00391662 iteration 0 KSP Residual norm 3.916615614134e-03 0 KSP unpreconditioned resid norm 3.916615614134e-03 true resid norm 3.916615614134e-03 ||r(i)||/||b|| 1.000000000000e+00 iteration 1 KSP Residual norm 4.068800385009e-08 1 KSP unpreconditioned resid norm 4.068800385009e-08 true resid norm 4.068800384986e-08 ||r(i)||/||b|| 1.038856192653e-05 iteration 2 KSP Residual norm 8.427513055511e-14 2 KSP unpreconditioned resid norm 8.427513055511e-14 true resid norm 8.427497502034e-14 ||r(i)||/||b|| 2.151729537007e-11 Linear solve converged due to CONVERGED_RTOL iterations 2 iter = 1, SNES Function norm 1.99152e-07 iteration 0 KSP Residual norm 1.991523558528e-07 0 KSP unpreconditioned resid norm 1.991523558528e-07 true resid norm 1.991523558528e-07 ||r(i)||/||b|| 1.000000000000e+00 iteration 1 KSP Residual norm 1.413505562549e-13 1 KSP 
unpreconditioned resid norm 1.413505562549e-13 true resid norm 1.413505562550e-13 ||r(i)||/||b|| 7.097609046588e-07 iteration 2 KSP Residual norm 5.165934822520e-18 2 KSP unpreconditioned resid norm 5.165934822520e-18 true resid norm 5.165932973227e-18 ||r(i)||/||b|| 2.593960262787e-11 Linear solve converged due to CONVERGED_RTOL iterations 2 iter = 2, SNES Function norm 1.69561e-16 ################# step 3 ################# iter = 0, SNES Function norm 0.00035615 iteration 0 KSP Residual norm 3.561504844171e-04 0 KSP unpreconditioned resid norm 3.561504844171e-04 true resid norm 3.561504844171e-04 ||r(i)||/||b|| 1.000000000000e+00 iteration 1 KSP Residual norm 3.701591890269e-09 1 KSP unpreconditioned resid norm 3.701591890269e-09 true resid norm 3.701591890274e-09 ||r(i)||/||b|| 1.039333667153e-05 iteration 2 KSP Residual norm 7.832821034843e-15 2 KSP unpreconditioned resid norm 7.832821034843e-15 true resid norm 7.832856926692e-15 ||r(i)||/||b|| 2.199311041093e-11 Linear solve converged due to CONVERGED_RTOL iterations 2 iter = 1, SNES Function norm 1.64671e-09 iteration 0 KSP Residual norm 1.646709543241e-09 0 KSP unpreconditioned resid norm 1.646709543241e-09 true resid norm 1.646709543241e-09 ||r(i)||/||b|| 1.000000000000e+00 iteration 1 KSP Residual norm 1.043230469512e-15 1 KSP unpreconditioned resid norm 1.043230469512e-15 true resid norm 1.043230469512e-15 ||r(i)||/||b|| 6.335242749968e-07 iteration 1 KSP Residual norm 0.000000000000e+00 1 KSP unpreconditioned resid norm 0.000000000000e+00 true resid norm -nan ||r(i)||/||b|| -nan Linear solve did not converge due to DIVERGED_PC_FAILED iterations 1 PC_FAILED due to SUBPC_ERROR More information from -ksp_error_if_not_converged -info [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 3.303168180659e-07 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 7.795816360977e-02 at iteration 12 [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC 
with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.227610512466e+00 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 5.453050347652e+05 at iteration 12 [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged [0] KSPConvergedDefault(): Linear solver is diverging. Initial right hand size norm 9.501675075823e-01, current residual norm 4.894880836662e+04 at iteration 210 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: KSPSolve has not converged, reason DIVERGED_DTOL [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.12.5, Mar, 29, 2020 [0]PETSC ERROR: ./stokeTutorial on a arch-linux2-c-debug named a2aa8f1c96aa by Unknown Fri Oct 9 13:43:28 2020 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-cuda [0]PETSC ERROR: #1 KSPSolve() line 832 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c [0]PETSC ERROR: #3 PCApply() line 444 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #4 KSP_PCApply() line 281 in /usr/local/petsc/petsc-3.12.5/include/petsc/private/kspimpl.h [0]PETSC ERROR: #5 KSPFGMRESCycle() line 166 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/impls/gmres/fgmres/fgmres.c [0]PETSC ERROR: #6 KSPSolve_FGMRES() line 291 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/impls/gmres/fgmres/fgmres.c [0]PETSC ERROR: #7 KSPSolve() line 760 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /usr/local/petsc/petsc-3.12.5/src/snes/impls/ls/ls.c [0]PETSC ERROR: #9 SNESSolve() line 4482 in /usr/local/petsc/petsc-3.12.5/src/snes/interface/snes.c [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374783 SNES Object: 1 MPI processes type: newtonls maximum iterations=50, maximum function evaluations=10000 tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 total number of linear solver iterations=2 total number of function evaluations=1322 norm schedule ALWAYS Jacobian is built using finite differences one column at a time SNESLineSearch Object: 1 MPI processes type: bt interpolation: cubic alpha=1.000000e-04 maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=40 KSP Object: 1 MPI processes type: fgmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-08, absolute=1e-50, divergence=10000. 
right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: fieldsplit FieldSplit with Schur preconditioner, blocksize = 1, factorization FULL Preconditioner for the Schur complement formed from S itself Split info: Split number 0 Defined by IS Split number 1 Defined by IS KSP solver for A00 block KSP Object: (fieldsplit_0_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI processes type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 package used to perform factorization: petsc total: nonzeros=9213, allocated nonzeros=9213 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 total: nonzeros=9213, allocated nonzeros=9213 total number of mallocs used during MatSetValues calls=0 not using I-node routines KSP solver for S = A11 - A10 inv(A00) A01 KSP Object: (fieldsplit_1_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_1_) 1 MPI processes type: none linear system matrix = precond matrix: Mat Object: (fieldsplit_1_) 1 MPI processes type: schurcomplement rows=147, cols=147 Schur complement A11 - A10 inv(A00) A01 A11 Mat Object: 1 MPI processes type: seqaij rows=147, cols=147 total: nonzeros=147, allocated nonzeros=147 total number of mallocs used during MatSetValues calls=0 not using I-node routines A10 Mat Object: 1 MPI processes type: seqaij rows=147, cols=512 total: nonzeros=2560, allocated nonzeros=2560 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 87 nodes, limit used is 5 KSP of A00 KSP Object: (fieldsplit_0_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI processes type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1., needed 1. 
Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 package used to perform factorization: petsc total: nonzeros=9213, allocated nonzeros=9213 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=512, cols=512 total: nonzeros=9213, allocated nonzeros=9213 total number of mallocs used during MatSetValues calls=0 not using I-node routines A01 Mat Object: 1 MPI processes type: seqaij rows=512, cols=147 total: nonzeros=2562, allocated nonzeros=2562 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=659, cols=659 total: nonzeros=14482, allocated nonzeros=27543 total number of mallocs used during MatSetValues calls=1309 not using I-node routines On Oct 9 2020, at 2:17 am, Barry Smith wrote: > > When you get a huge change at restart this means something is seriously wrong with either the linear operator or the linear preconditioner. > > How are you doing the matrix vector product? Note both the operator and preconditioner must be linear operators for GMRES. > > FGMRES allows the preconditioner to be nonlinear. You can try > > -ksp_type fgmres -ksp_monitor_true_residual > > Barry > > > > On Oct 8, 2020, at 2:43 AM, Yang Juntao wrote: > > Hello, > > > > I?m working on a nonlinear solver with SNES with handcoded jacobian and function. Each linear solver is solved with KSP solver. > > But sometimes I got issues with ksp solver convergence. I tried with finite difference approximated jacobian, but get the same error. > > > > From the iterations, the convergence seems ok at the beginning but suddenly diverged in the last iteration. > > Hope anyone with experience on ksp solvers could direct me to a direction I can debug the problem. 
> > > > iter = 0, SNES Function norm 2.94934e-06 > > iteration 0 KSP Residual norm 1.094600281831e-06 > > iteration 1 KSP Residual norm 1.264284474186e-08 > > iteration 2 KSP Residual norm 6.593269221816e-09 > > iteration 3 KSP Residual norm 1.689570779457e-09 > > iteration 4 KSP Residual norm 1.040661505932e-09 > > iteration 5 KSP Residual norm 5.422761817348e-10 > > iteration 6 KSP Residual norm 2.492867371369e-10 > > iteration 7 KSP Residual norm 8.261522376775e-11 > > iteration 8 KSP Residual norm 4.246401544245e-11 > > iteration 9 KSP Residual norm 2.514366787388e-11 > > iteration 10 KSP Residual norm 1.982940267051e-11 > > iteration 11 KSP Residual norm 1.586470414676e-11 > > iteration 12 KSP Residual norm 9.866392216207e-12 > > iteration 13 KSP Residual norm 4.951342176999e-12 > > iteration 14 KSP Residual norm 2.418292660318e-12 > > iteration 15 KSP Residual norm 1.747418526086e-12 > > iteration 16 KSP Residual norm 1.094150535809e-12 > > iteration 17 KSP Residual norm 4.464287492066e-13 > > iteration 18 KSP Residual norm 3.530090494462e-13 > > iteration 19 KSP Residual norm 2.825698091454e-13 > > iteration 20 KSP Residual norm 1.950568425807e-13 > > iteration 21 KSP Residual norm 1.227898091813e-13 > > iteration 22 KSP Residual norm 5.411106347374e-14 > > iteration 23 KSP Residual norm 4.511115848564e-14 > > iteration 24 KSP Residual norm 4.063546606691e-14 > > iteration 25 KSP Residual norm 3.677694771949e-14 > > iteration 26 KSP Residual norm 3.459244943466e-14 > > iteration 27 KSP Residual norm 3.263954971093e-14 > > iteration 28 KSP Residual norm 3.087344619079e-14 > > iteration 29 KSP Residual norm 2.809426925625e-14 > > iteration 30 KSP Residual norm 4.366149884754e-01 > > Linear solve did not converge due to DIVERGED_DTOL iterations 30 > > > > > > > > SNES Object: 1 MPI processes > > type: newtonls > > SNES has not been set up so information may be incomplete > > maximum iterations=50, maximum function evaluations=10000 > > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > > total number of linear solver iterations=0 > > total number of function evaluations=0 > > norm schedule ALWAYS > > SNESLineSearch Object: 1 MPI processes > > type: bt > > interpolation: cubic > > alpha=1.000000e-04 > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > > maximum iterations=40 > > KSP Object: 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > > left preconditioning > > using DEFAULT norm type for convergence test > > PC Object: 1 MPI processes > > type: fieldsplit > > PC has not been set up so information may be incomplete > > FieldSplit with Schur preconditioner, factorization FULL > > Preconditioner for the Schur complement formed from S itself > > Split info: > > KSP solver for A00 block > > not yet available > > KSP solver for S = A11 - A10 inv(A00) A01 > > not yet available > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=659, cols=659 > > total: nonzeros=659, allocated nonzeros=7908 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > > > Regards > > Juntao > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Oct 9 01:25:07 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 9 Oct 2020 01:25:07 -0500 Subject: [petsc-users] Convergence Error Debugging with KSP solvers in SNES In-Reply-To: References: <17D7396D-BA2D-4E7A-859A-82F605E6D12D@petsc.dev> Message-ID: <9AA79324-0653-4088-8A1E-72FEE8CD1631@petsc.dev> Before you do any investigation I would run with one SNES solve -snes_max_it 1 -snes_view. It will print out exactly what solver configuration you are using. ------ [0]PETSC ERROR: KSPSolve has not converged, reason DIVERGED_DTOL [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.12.5, Mar, 29, 2020 [0]PETSC ERROR: ./stokeTutorial on a arch-linux2-c-debug named a2aa8f1c96aa by Unknown Fri Oct 9 13:43:28 2020 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-cuda [0]PETSC ERROR: #1 KSPSolve() line 832 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c OK, one of the inner solvers went crazy and diverged, that is, the norm of the residual for that inner solver exploded. Based on the line number [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c you can look at that file and see which of the inner solvers failed. From the -snes_view output you will know what the KSP and PC are for that inner solve and the KSP options prefix. With that you can run the failing case with the additional option -xxx_ksp_monitor_true_residual and watch the inner solve explode. This inner solver behaving badly can also explain the need for -ksp_type fgmres. Normally PCFIELDSPLIT is a linear operator and so you can use -ksp_type gmres, but there is some issue with the inner solver. Could it possibly have a null space, that is, be singular? Do you provide your own custom inner solver or just select from the options database? Does -pc_type lu make everything work fine? Barry > On Oct 9, 2020, at 12:53 AM, Karl Yang wrote: > Hi, Barry, > > Thanks for your reply. Yes, I should have used fgmres. But after switching to fgmres I'm still facing the same convergence issue. > > Seems like the reason is due to DIVERGED_PC_FAILED. But I simply used FD jacobian, and fieldsplitPC. I am a bit lost on whether I made some mistakes somewhere in the FormFunction or I did not setup the solver correctly.
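(Collected on one command line, the diagnostic run suggested above might look like the following; the executable name is taken from the error trace and the fieldsplit_1_ prefix from the -snes_view output, so adjust both as needed.)

./stokeTutorial -snes_max_it 1 -snes_view -ksp_type fgmres -ksp_monitor_true_residual -fieldsplit_1_ksp_monitor_true_residual -fieldsplit_1_ksp_converged_reason

and, to check whether a direct factorization of the whole system makes the failure disappear:

./stokeTutorial -snes_max_it 1 -snes_view -pc_type lu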
> > ///////code/////// > SNESSetFunction(snes, r, FormFunctionStatic, this); > // SNESSetJacobian(snes, J, J, FormJacobianStatic, this); > SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault, this); > SNESMonitorSet(snes, MySNESMonitor, NULL, NULL); > > SNESGetKSP(snes, &ksp); > KSPGetPC(ksp, &pc); > PCSetType(pc, PCFIELDSPLIT); > PCFieldSplitSetDetectSaddlePoint(pc, PETSC_TRUE); > PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELF, NULL); > KSPMonitorSet(ksp, MyKSPMonitor, NULL, 0); > KSPSetTolerances(ksp, 1e-8, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); > SNESSetFromOptions(snes); > //////end///////// > > Output from SNES/KSP solver > ################# step 1 ################# > iter = 0, SNES Function norm 0.0430713 > iteration 0 KSP Residual norm 4.307133784528e-02 > 0 KSP unpreconditioned resid norm 4.307133784528e-02 true resid norm 4.307133784528e-02 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 4.451434065870e-07 > 1 KSP unpreconditioned resid norm 4.451434065870e-07 true resid norm 4.451434065902e-07 ||r(i)||/||b|| 1.033502623460e-05 > iteration 2 KSP Residual norm 1.079756105012e-12 > 2 KSP unpreconditioned resid norm 1.079756105012e-12 true resid norm 1.079754870815e-12 ||r(i)||/||b|| 2.506898844643e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 1, SNES Function norm 2.40846e-05 > iteration 0 KSP Residual norm 2.408462930023e-05 > 0 KSP unpreconditioned resid norm 2.408462930023e-05 true resid norm 2.408462930023e-05 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 1.096958085415e-11 > 1 KSP unpreconditioned resid norm 1.096958085415e-11 true resid norm 1.096958085425e-11 ||r(i)||/||b|| 4.554598170270e-07 > iteration 2 KSP Residual norm 5.909523288165e-16 > 2 KSP unpreconditioned resid norm 5.909523288165e-16 true resid norm 5.909519599233e-16 ||r(i)||/||b|| 2.453647729249e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 2, SNES Function norm 1.19684e-14 > ################# step 2 ################# > iter = 0, SNES Function norm 0.00391662 > iteration 0 KSP Residual norm 3.916615614134e-03 > 0 KSP unpreconditioned resid norm 3.916615614134e-03 true resid norm 3.916615614134e-03 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 4.068800385009e-08 > 1 KSP unpreconditioned resid norm 4.068800385009e-08 true resid norm 4.068800384986e-08 ||r(i)||/||b|| 1.038856192653e-05 > iteration 2 KSP Residual norm 8.427513055511e-14 > 2 KSP unpreconditioned resid norm 8.427513055511e-14 true resid norm 8.427497502034e-14 ||r(i)||/||b|| 2.151729537007e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 1, SNES Function norm 1.99152e-07 > iteration 0 KSP Residual norm 1.991523558528e-07 > 0 KSP unpreconditioned resid norm 1.991523558528e-07 true resid norm 1.991523558528e-07 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 1.413505562549e-13 > 1 KSP unpreconditioned resid norm 1.413505562549e-13 true resid norm 1.413505562550e-13 ||r(i)||/||b|| 7.097609046588e-07 > iteration 2 KSP Residual norm 5.165934822520e-18 > 2 KSP unpreconditioned resid norm 5.165934822520e-18 true resid norm 5.165932973227e-18 ||r(i)||/||b|| 2.593960262787e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 2, SNES Function norm 1.69561e-16 > ################# step 3 ################# > iter = 0, SNES Function norm 0.00035615 > iteration 0 KSP Residual norm 3.561504844171e-04 > 0 KSP unpreconditioned resid norm 3.561504844171e-04 true resid norm 
3.561504844171e-04 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 3.701591890269e-09 > 1 KSP unpreconditioned resid norm 3.701591890269e-09 true resid norm 3.701591890274e-09 ||r(i)||/||b|| 1.039333667153e-05 > iteration 2 KSP Residual norm 7.832821034843e-15 > 2 KSP unpreconditioned resid norm 7.832821034843e-15 true resid norm 7.832856926692e-15 ||r(i)||/||b|| 2.199311041093e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 1, SNES Function norm 1.64671e-09 > iteration 0 KSP Residual norm 1.646709543241e-09 > 0 KSP unpreconditioned resid norm 1.646709543241e-09 true resid norm 1.646709543241e-09 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 1.043230469512e-15 > 1 KSP unpreconditioned resid norm 1.043230469512e-15 true resid norm 1.043230469512e-15 ||r(i)||/||b|| 6.335242749968e-07 > iteration 1 KSP Residual norm 0.000000000000e+00 > 1 KSP unpreconditioned resid norm 0.000000000000e+00 true resid norm -nan ||r(i)||/||b|| -nan > Linear solve did not converge due to DIVERGED_PC_FAILED iterations 1 > PC_FAILED due to SUBPC_ERROR > > > More information from -ksp_error_if_not_converged -info > > [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 3.303168180659e-07 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 7.795816360977e-02 at iteration 12 > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): 
Leaving PC with identical preconditioner since operator is unchanged > [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.227610512466e+00 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 5.453050347652e+05 at iteration 12 > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] KSPConvergedDefault(): Linear solver is diverging. Initial right hand size norm 9.501675075823e-01, current residual norm 4.894880836662e+04 at iteration 210 > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: > [0]PETSC ERROR: KSPSolve has not converged, reason DIVERGED_DTOL > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.12.5, Mar, 29, 2020 > [0]PETSC ERROR: ./stokeTutorial on a arch-linux2-c-debug named a2aa8f1c96aa by Unknown Fri Oct 9 13:43:28 2020 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-cuda > [0]PETSC ERROR: #1 KSPSolve() line 832 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c > [0]PETSC ERROR: #3 PCApply() line 444 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #4 KSP_PCApply() line 281 in /usr/local/petsc/petsc-3.12.5/include/petsc/private/kspimpl.h > [0]PETSC ERROR: #5 KSPFGMRESCycle() line 166 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [0]PETSC ERROR: #6 KSPSolve_FGMRES() line 291 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [0]PETSC ERROR: #7 KSPSolve() line 760 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /usr/local/petsc/petsc-3.12.5/src/snes/impls/ls/ls.c > [0]PETSC ERROR: #9 SNESSolve() line 4482 in /usr/local/petsc/petsc-3.12.5/src/snes/interface/snes.c > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374783 > SNES Object: 1 MPI processes > type: newtonls > maximum iterations=50, maximum function evaluations=10000 > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > total number of linear solver iterations=2 > total number of function evaluations=1322 > norm schedule ALWAYS > Jacobian is built using finite differences one column at a time > SNESLineSearch Object: 1 MPI processes > type: bt > interpolation: cubic > alpha=1.000000e-04 > maxstep=1.000000e+08, minlambda=1.000000e-12 > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > maximum iterations=40 > KSP Object: 1 MPI processes > type: fgmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. 
> right preconditioning > using UNPRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: fieldsplit > FieldSplit with Schur preconditioner, blocksize = 1, factorization FULL > Preconditioner for the Schur complement formed from S itself > Split info: > Split number 0 Defined by IS > Split number 1 Defined by IS > KSP solver for A00 block > KSP Object: (fieldsplit_0_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > package used to perform factorization: petsc > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > KSP solver for S = A11 - A10 inv(A00) A01 > KSP Object: (fieldsplit_1_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_1_) 1 MPI processes > type: none > linear system matrix = precond matrix: > Mat Object: (fieldsplit_1_) 1 MPI processes > type: schurcomplement > rows=147, cols=147 > Schur complement A11 - A10 inv(A00) A01 > A11 > Mat Object: 1 MPI processes > type: seqaij > rows=147, cols=147 > total: nonzeros=147, allocated nonzeros=147 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > A10 > Mat Object: 1 MPI processes > type: seqaij > rows=147, cols=512 > total: nonzeros=2560, allocated nonzeros=2560 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 87 nodes, limit used is 5 > KSP of A00 > KSP Object: (fieldsplit_0_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. 
> Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > package used to perform factorization: petsc > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > A01 > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=147 > total: nonzeros=2562, allocated nonzeros=2562 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=659, cols=659 > total: nonzeros=14482, allocated nonzeros=27543 > total number of mallocs used during MatSetValues calls=1309 > not using I-node routines > > > > On Oct 9 2020, at 2:17 am, Barry Smith wrote: > > When you get a huge change at restart this means something is seriously wrong with either the linear operator or the linear preconditioner. > > How are you doing the matrix vector product? Note both the operator and preconditioner must be linear operators for GMRES. > > FGMRES allows the preconditioner to be nonlinear. You can try > > -ksp_type fgmres -ksp_monitor_true_residual > > Barry > > > On Oct 8, 2020, at 2:43 AM, Yang Juntao > wrote: > > Hello, > > I?m working on a nonlinear solver with SNES with handcoded jacobian and function. Each linear solver is solved with KSP solver. > But sometimes I got issues with ksp solver convergence. I tried with finite difference approximated jacobian, but get the same error. > > From the iterations, the convergence seems ok at the beginning but suddenly diverged in the last iteration. > Hope anyone with experience on ksp solvers could direct me to a direction I can debug the problem. 
> > iter = 0, SNES Function norm 2.94934e-06 > iteration 0 KSP Residual norm 1.094600281831e-06 > iteration 1 KSP Residual norm 1.264284474186e-08 > iteration 2 KSP Residual norm 6.593269221816e-09 > iteration 3 KSP Residual norm 1.689570779457e-09 > iteration 4 KSP Residual norm 1.040661505932e-09 > iteration 5 KSP Residual norm 5.422761817348e-10 > iteration 6 KSP Residual norm 2.492867371369e-10 > iteration 7 KSP Residual norm 8.261522376775e-11 > iteration 8 KSP Residual norm 4.246401544245e-11 > iteration 9 KSP Residual norm 2.514366787388e-11 > iteration 10 KSP Residual norm 1.982940267051e-11 > iteration 11 KSP Residual norm 1.586470414676e-11 > iteration 12 KSP Residual norm 9.866392216207e-12 > iteration 13 KSP Residual norm 4.951342176999e-12 > iteration 14 KSP Residual norm 2.418292660318e-12 > iteration 15 KSP Residual norm 1.747418526086e-12 > iteration 16 KSP Residual norm 1.094150535809e-12 > iteration 17 KSP Residual norm 4.464287492066e-13 > iteration 18 KSP Residual norm 3.530090494462e-13 > iteration 19 KSP Residual norm 2.825698091454e-13 > iteration 20 KSP Residual norm 1.950568425807e-13 > iteration 21 KSP Residual norm 1.227898091813e-13 > iteration 22 KSP Residual norm 5.411106347374e-14 > iteration 23 KSP Residual norm 4.511115848564e-14 > iteration 24 KSP Residual norm 4.063546606691e-14 > iteration 25 KSP Residual norm 3.677694771949e-14 > iteration 26 KSP Residual norm 3.459244943466e-14 > iteration 27 KSP Residual norm 3.263954971093e-14 > iteration 28 KSP Residual norm 3.087344619079e-14 > iteration 29 KSP Residual norm 2.809426925625e-14 > iteration 30 KSP Residual norm 4.366149884754e-01 > Linear solve did not converge due to DIVERGED_DTOL iterations 30 > > > SNES Object: 1 MPI processes > type: newtonls > SNES has not been set up so information may be incomplete > maximum iterations=50, maximum function evaluations=10000 > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > total number of linear solver iterations=0 > total number of function evaluations=0 > norm schedule ALWAYS > SNESLineSearch Object: 1 MPI processes > type: bt > interpolation: cubic > alpha=1.000000e-04 > maxstep=1.000000e+08, minlambda=1.000000e-12 > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > maximum iterations=40 > KSP Object: 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > left preconditioning > using DEFAULT norm type for convergence test > PC Object: 1 MPI processes > type: fieldsplit > PC has not been set up so information may be incomplete > FieldSplit with Schur preconditioner, factorization FULL > Preconditioner for the Schur complement formed from S itself > Split info: > KSP solver for A00 block > not yet available > KSP solver for S = A11 - A10 inv(A00) A01 > not yet available > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=659, cols=659 > total: nonzeros=659, allocated nonzeros=7908 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > > Regards > Juntao > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pranayreddy865 at gmail.com Fri Oct 9 03:53:34 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Fri, 9 Oct 2020 01:53:34 -0700 Subject: [petsc-users] Regarding SNESSetFunction and SNESSetJacobian Message-ID: Hello, I have a couple of questions regarding how SNESSetFunction,SNESSetJacobian and SNESSolve work together. I am trying to solve a nonlinear system of the form A(x)x=b(x). I am using Fortran90. The way I intend to solve the above equation is as follows: Step 1: initialize x with an initial guess Step 2: Solve using SNESSolve for (x^i, i is the iteration number, i=1,2,3...) Step 3: Calculate the update and check if it is less than tolerance Step 4: If yes, end the loop. Else the jacobian matrix and function should be updated using x^(i) and go back to step 2. The part which is a little confusing to me is in understanding how to update the jacobian matrix and the function F (= A(x)x-b(x)). 1) Should I explicitly call the subroutines Form Function and FormJacobian by using x^i as the input argument or is this automatically taken care of when I go back to step 2 and call SNESSolve? 2) If the answer to the above question is yes, I do not fully understand the role played by the functions SNESSetFunction and SNESSetJacobian. I apologize if I am not clear in my explanation. I would be glad to elaborate on any section of my question. Please let me know if you need any further information from my side. Thank you, Sincerely, Pranay. ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Oct 9 06:38:24 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 9 Oct 2020 07:38:24 -0400 Subject: [petsc-users] Regarding SNESSetFunction and SNESSetJacobian In-Reply-To: References: Message-ID: On Fri, Oct 9, 2020 at 4:53 AM baikadi pranay wrote: > Hello, > I have a couple of questions regarding how SNESSetFunction,SNESSetJacobian > and SNESSolve work together. I am trying to solve a nonlinear system of the > form A(x)x=b(x). I am using Fortran90. The way I intend to solve the above > equation is as follows: > Step 1: initialize x with an initial guess > Step 2: Solve using SNESSolve for (x^i, i is the iteration number, > i=1,2,3...) > Step 3: Calculate the update and check if it is less than tolerance > Step 4: If yes, end the loop. Else the jacobian matrix and function should > be updated using x^(i) and go back to step 2. > You are describing the Picard iteration: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetPicard.html You can do this, but it will converge more slowly than Newton. We usually advise using Newton. > The part which is a little confusing to me is in understanding how to > update the jacobian matrix and the function F (= A(x)x-b(x)). > > 1) Should I explicitly call the subroutines Form Function and FormJacobian > by using x^i as the input argument or is this automatically taken care of > when I go back to step 2 and call SNESSolve? > No. SNES calls these automatically. Thanks, Matt > 2) If the answer to the above question is yes, I do not fully understand > the role played by the functions SNESSetFunction and SNESSetJacobian. > > I apologize if I am not clear in my explanation. I would be glad to > elaborate on any section of my question. Please let me know if you need any > further information from my side. > > Thank you, > Sincerely, > Pranay. > ? 
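(For concreteness, a minimal sketch of the Picard setup being recommended here, written in C for brevity; ComputeA and ComputeB are placeholder names for user routines that assemble A(x) and b(x), and the vector and matrix objects are assumed to be created with matching sizes elsewhere.)

#include <petscsnes.h>

/* Placeholder: assemble b(x) into bvec at the current iterate x. */
static PetscErrorCode ComputeB(SNES snes, Vec x, Vec bvec, void *ctx)
{
  PetscFunctionBegin;
  /* ... user assembly of b(x) goes here ... */
  PetscFunctionReturn(0);
}

/* Placeholder: assemble A(x) into Amat (and Pmat, if different) at the current iterate x. */
static PetscErrorCode ComputeA(SNES snes, Vec x, Mat Amat, Mat Pmat, void *ctx)
{
  PetscFunctionBegin;
  /* ... user assembly of A(x) goes here ... */
  PetscFunctionReturn(0);
}

/* Driver fragment: x, r and A are created elsewhere with matching sizes. */
static PetscErrorCode SolvePicard(Vec x, Vec r, Mat A)
{
  SNES           snes;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = SNESCreate(PETSC_COMM_WORLD, &snes);CHKERRQ(ierr);
  ierr = SNESSetPicard(snes, r, ComputeB, A, A, ComputeA, NULL);CHKERRQ(ierr);
  /* absolute, relative and solution-update tolerances plus iteration limits,
     matching the defaults printed by -snes_view in this thread */
  ierr = SNESSetTolerances(snes, 1e-50, 1e-8, 1e-8, 50, 10000);CHKERRQ(ierr);
  ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);
  ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);   /* SNES calls ComputeA/ComputeB as it needs them */
  ierr = SNESDestroy(&snes);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

A single SNESSolve() then runs the whole A(x^i) x^{i+1} = b(x^i) sweep to convergence; the assembly routines are called by SNES itself, so there is no need to re-invoke them by hand between iterations.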
> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Oct 9 09:43:46 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 9 Oct 2020 09:43:46 -0500 Subject: [petsc-users] Regarding SNESSetFunction and SNESSetJacobian In-Reply-To: References: Message-ID: To provide the functions to the Picard iteration you call SNESSetPicard() not SNESSetFunction() and SNESSetJacobian(), you provide code to compute A(x) and b(x). Note that in the Picard iteration the matrix A(x) is NOT the Jacobian of F(x) = A(x) x - b(x). The Jacobian of F(x) is the more complicated F'(x) = A(x) + A'(x)x + b'(x) Barry > On Oct 9, 2020, at 6:38 AM, Matthew Knepley wrote: > > On Fri, Oct 9, 2020 at 4:53 AM baikadi pranay > wrote: > Hello, > I have a couple of questions regarding how SNESSetFunction,SNESSetJacobian and SNESSolve work together. I am trying to solve a nonlinear system of the form A(x)x=b(x). I am using Fortran90. The way I intend to solve the above equation is as follows: > Step 1: initialize x with an initial guess > Step 2: Solve using SNESSolve for (x^i, i is the iteration number, i=1,2,3...) > Step 3: Calculate the update and check if it is less than tolerance > Step 4: If yes, end the loop. Else the jacobian matrix and function should be updated using x^(i) and go back to step 2. > > You are describing the Picard iteration: > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetPicard.html > > You can do this, but it will converge more slowly than Newton. We usually advise using Newton. > > The part which is a little confusing to me is in understanding how to update the jacobian matrix and the function F (= A(x)x-b(x)). > > 1) Should I explicitly call the subroutines Form Function and FormJacobian by using x^i as the input argument or is this automatically taken care of when I go back to step 2 and call SNESSolve? > > No. SNES calls these automatically. > > Thanks, > > Matt > > 2) If the answer to the above question is yes, I do not fully understand the role played by the functions SNESSetFunction and SNESSetJacobian. > > I apologize if I am not clear in my explanation. I would be glad to elaborate on any section of my question. Please let me know if you need any further information from my side. > > Thank you, > Sincerely, > Pranay. > ? > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Oct 9 11:10:53 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 9 Oct 2020 11:10:53 -0500 Subject: [petsc-users] Regarding SNESSetFunction and SNESSetJacobian In-Reply-To: References: Message-ID: <0BA3784A-6ECD-4FEB-8E64-5F05A89559CF@petsc.dev> I'm sorry, I made a small mistake in my previous email. It is F'(x) = A(x) + A'(x)x - b'(x) not F'(x) = A(x) + A'(x)x + b'(x) > On Oct 9, 2020, at 10:40 AM, baikadi pranay wrote: > > Thank you for the response. I have one more quick question. > Is the solution of A(x)x=b(x) obtained from Newton's method the final solution or is it the solution of A(x^i)x^(i+1)=b(x^i). 
In other words, do I need to use the solution obtained from Newton's method to update the Jacobian, use Newton method again and repeat the process? SNESSolve() with SNESSetPicard() continues the iteration calling your routine that computes A repeatedly until the system has converged. You can control the convergence criteria with SNESSetTolerances() (see also the manual pages that page links to). You never need call your routine that computes A from your code, PETSc calls it as it needs it. Also, and I apologize for being pedantic, but using the computation of A() and SNESSetPicard() is NOT doing Newton's method, it is a different algorithm called Picard. If you want to run Newton then you need to write a routine that computes the quantity A(x) + A'(x)x + b'(x) (not necessarily by using this exact product-rule formula). Computing just A() cannot give you Newton's method. For many problems Picard is good enough so people don't bother to code F'(x) and skip Newton's method and just use Picard. For some problems the extra effort of coding F'(x) gives a Newton that converges much faster than Picard. Barry > Best Regards, > Pranay. > ? > > On Fri, Oct 9, 2020 at 7:43 AM Barry Smith > wrote: > > To provide the functions to the Picard iteration you call SNESSetPicard() not SNESSetFunction() and SNESSetJacobian(), you provide code to compute A(x) and b(x). > > Note that in the Picard iteration the matrix A(x) is NOT the Jacobian of F(x) = A(x) x - b(x). The Jacobian of F(x) is the more complicated F'(x) = A(x) + A'(x)x + b'(x) > > Barry > > >> On Oct 9, 2020, at 6:38 AM, Matthew Knepley > wrote: >> >> On Fri, Oct 9, 2020 at 4:53 AM baikadi pranay > wrote: >> Hello, >> I have a couple of questions regarding how SNESSetFunction,SNESSetJacobian and SNESSolve work together. I am trying to solve a nonlinear system of the form A(x)x=b(x). I am using Fortran90. The way I intend to solve the above equation is as follows: >> Step 1: initialize x with an initial guess >> Step 2: Solve using SNESSolve for (x^i, i is the iteration number, i=1,2,3...) >> Step 3: Calculate the update and check if it is less than tolerance >> Step 4: If yes, end the loop. Else the jacobian matrix and function should be updated using x^(i) and go back to step 2. >> >> You are describing the Picard iteration: >> >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetPicard.html >> >> You can do this, but it will converge more slowly than Newton. We usually advise using Newton. >> >> The part which is a little confusing to me is in understanding how to update the jacobian matrix and the function F (= A(x)x-b(x)). >> >> 1) Should I explicitly call the subroutines Form Function and FormJacobian by using x^i as the input argument or is this automatically taken care of when I go back to step 2 and call SNESSolve? >> >> No. SNES calls these automatically. >> >> Thanks, >> >> Matt >> >> 2) If the answer to the above question is yes, I do not fully understand the role played by the functions SNESSetFunction and SNESSetJacobian. >> >> I apologize if I am not clear in my explanation. I would be glad to elaborate on any section of my question. Please let me know if you need any further information from my side. >> >> Thank you, >> Sincerely, >> Pranay. >> ? >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Sat Oct 10 04:31:31 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Sat, 10 Oct 2020 02:31:31 -0700 Subject: [petsc-users] MAT_COPY_VALUES not allowed for unassembled matrix Message-ID: Hello, I am using the MatDuplicate routine so that I use the Jacobian matrix as a preconditioning matrix as well. However, I get the error "MAT_COPY_VALUES not allowed for unassembled matrix". The exact command I use is the following: *call MatDuplicate(jac,MAT_COPY_VALUES,prec,ierr)* I am attaching you the error output in a text file for your reference. Could you please let me know how to solve this problem?. Thank you in advance. Best Regards, Pranay. ? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: MAT_COPY_VALUES not allowed for unassembled matrix [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 [0]PETSC ERROR: ./a.out on a linux-gnu-c-debug named cg17-6.agave.rc.asu.edu by pbaikadi Sat Oct 10 02:25:11 2020 [0]PETSC ERROR: Configure options [0]PETSC ERROR: #1 MatDuplicate() line 4606 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/mat/interface/matrix.c [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Not for unassembled matrix [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 [0]PETSC ERROR: ./a.out on a linux-gnu-c-debug named cg17-6.agave.rc.asu.edu by pbaikadi Sat Oct 10 02:25:11 2020 [0]PETSC ERROR: Configure options [0]PETSC ERROR: #2 MatGetOrdering() line 180 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/mat/order/sorder.c [0]PETSC ERROR: #3 PCSetUp_ILU() line 134 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/pc/impls/factor/ilu/ilu.c [0]PETSC ERROR: #4 PCSetUp() line 932 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #5 KSPSetUp() line 391 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #6 KSPSolve() line 725 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #7 SNESSolve_NEWTONLS() line 225 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/snes/impls/ls/ls.c [0]PETSC ERROR: #8 SNESSolve() line 4560 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/snes/interface/snes.c From knepley at gmail.com Sat Oct 10 12:25:29 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 10 Oct 2020 13:25:29 -0400 Subject: [petsc-users] MAT_COPY_VALUES not allowed for unassembled matrix In-Reply-To: References: Message-ID: On Sat, Oct 10, 2020 at 5:31 AM baikadi pranay wrote: > Hello, > I am using the MatDuplicate routine so that I use the Jacobian matrix as a > preconditioning matrix as well. However, I get the error "MAT_COPY_VALUES > not allowed for unassembled matrix". The exact command I use is the > following: > *call MatDuplicate(jac,MAT_COPY_VALUES,prec,ierr)* > I am attaching you the error output in a text file for your reference. 
> Could you please let me know how to solve this problem?. > 1) You should be using CHKERRQ(ierr) after the call 2) You need to assemble the matrix, MatAssemblyBegin/End(), before calling MatDuplicate() 3) You do not need to duplicate the matrix, just pass the same matrix twice Thanks, Matt > Thank you in advance. > Best Regards, > Pranay. > > ? > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Sat Oct 10 14:45:16 2020 From: jed at jedbrown.org (Jed Brown) Date: Sat, 10 Oct 2020 13:45:16 -0600 Subject: [petsc-users] Regarding SNESSetFunction and SNESSetJacobian In-Reply-To: <0BA3784A-6ECD-4FEB-8E64-5F05A89559CF@petsc.dev> References: <0BA3784A-6ECD-4FEB-8E64-5F05A89559CF@petsc.dev> Message-ID: <87ft6l50hf.fsf@jedbrown.org> Barry Smith writes: > I'm sorry, I made a small mistake in my previous email. It is > > F'(x) = A(x) + A'(x)x - b'(x) not F'(x) = A(x) + A'(x)x + b'(x) I find this much easier to write in variational notation: F(x) = A(x) x - b(x) F'(x) dx = A(x) dx + (A'(x) dx) x - b'(x) dx Note that A'(x) is a third order tensor so A'(x) dx is a second order tensor (i.e., a matrix). As such, one never wants to represent A'(x) on its own, or even A'(x) dx for that matter. This is one reason I dislike this notation. For any given example, it's often possible to write the operator dx \mapsto (A'(x) dx) x in an intuitive way, but it can take thought and this tends to be more circuitous than working with F(x) directly. From y.juntao at hotmail.com Sun Oct 11 01:10:37 2020 From: y.juntao at hotmail.com (Karl Yang) Date: Sun, 11 Oct 2020 14:10:37 +0800 Subject: [petsc-users] Convergence Error Debugging with KSP solvers in SNES In-Reply-To: <9AA79324-0653-4088-8A1E-72FEE8CD1631@petsc.dev> References: <9AA79324-0653-4088-8A1E-72FEE8CD1631@petsc.dev> Message-ID: Hi, Barray, Thank you for helping. I've identified the divergence took place at fieldsplit_1_inner. It is singular for "pressure field" because I used periodic boundary condition. But doesn't ksp solver by default took care of constant null space? And if it is necessary, could you give me some help on what's the best location to call MatNullSpaceRemove as the ksp solver is hiding after SNES solver and FIELDSPLIT_PC. Regards Juntao On Oct 9 2020, at 2:25 pm, Barry Smith wrote: > > Before you do any investigation I would run with one SNES solve > > -snes_max_it 1 -snes_view > > It will print out exactly what solver configuration you are using. > > ------ > > [0]PETSC ERROR: KSPSolve has not converged, reason DIVERGED_DTOL > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html (https://link.getmailspring.com/link/FB54F843-9802-454D-9332-7D5C41E03F86 at getmailspring.com/0?redirect=https%3A%2F%2Fwww.mcs.anl.gov%2Fpetsc%2Fdocumentation%2Ffaq.html&recipient=cGV0c2MtdXNlcnNAbWNzLmFubC5nb3Y%3D) for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.12.5, Mar, 29, 2020 > [0]PETSC ERROR: ./stokeTutorial on a arch-linux2-c-debug named a2aa8f1c96aa by Unknown Fri Oct 9 13:43:28 2020 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-cuda > [0]PETSC ERROR: #1 KSPSolve() line 832 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c > > Ok, one of the inner solvers went crazy and diverged, that is the norm of the residual for that inner solver exploded. > > Base on the line number > > [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c > > you can look at that file and see which of the inner solvers failed. From the info from -snes_view you will know what the KSP and PC is for that inner solve and the KSP options prefix. With that you can run the failing case with the addition option > > -xxx_ksp_monitor_true_residual > > and watch the inner solve explode. > > This inner solver behaving badly can also explain the need for -ksp_type fgmres. Normally PCFIELDSPLIT is a linear operator and so you can use -ksp_type gmres but there is some issue with the inner solver. Could it possible have a null space, that is be singular? Do you provide your own custom inner solver or just select from the options database? Does -pc_type lu make everything work fine? > > Barry > > > > > > > On Oct 9, 2020, at 12:53 AM, Karl Yang wrote: > > Hi, Barry, > > Thanks for your reply. Yes, I should have used fgmres. But after switching to fgmres I'm still facing the same convergence issue. > > Seems like the reason is due to DIVERGED_PC_FAILED. But I simply used FD jacobian, and fieldsplitPC. I am a bit lost on whether I made some mistakes somewhere in the FormFunction or I did not setup the solver correctly. 
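(A possible option set for the diagnosis suggested above, assuming the fieldsplit_0_/fieldsplit_1_ prefixes that -snes_view reports further down in this thread; adjust the prefixes to whatever your own -snes_view prints:)

  -snes_max_it 1 -snes_view -ksp_monitor_true_residual -ksp_converged_reason
  -fieldsplit_0_ksp_monitor_true_residual -fieldsplit_1_ksp_monitor_true_residual
  -fieldsplit_0_ksp_converged_reason -fieldsplit_1_ksp_converged_reason
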
> > ///////code/////// > > SNESSetFunction(snes, r, FormFunctionStatic, this); > > // SNESSetJacobian(snes, J, J, FormJacobianStatic, this); > > SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault, this); > > SNESMonitorSet(snes, MySNESMonitor, NULL, NULL); > > > > SNESGetKSP(snes, &ksp); > > KSPGetPC(ksp, &pc); > > PCSetType(pc, PCFIELDSPLIT); > > PCFieldSplitSetDetectSaddlePoint(pc, PETSC_TRUE); > > PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELF, NULL); > > KSPMonitorSet(ksp, MyKSPMonitor, NULL, 0); > > KSPSetTolerances(ksp, 1e-8, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); > > SNESSetFromOptions(snes); > > //////end///////// > > > > Output from SNES/KSP solver > > ################# step 1 ################# > > iter = 0, SNES Function norm 0.0430713 > > iteration 0 KSP Residual norm 4.307133784528e-02 > > 0 KSP unpreconditioned resid norm 4.307133784528e-02 true resid norm 4.307133784528e-02 ||r(i)||/||b|| 1.000000000000e+00 > > iteration 1 KSP Residual norm 4.451434065870e-07 > > 1 KSP unpreconditioned resid norm 4.451434065870e-07 true resid norm 4.451434065902e-07 ||r(i)||/||b|| 1.033502623460e-05 > > iteration 2 KSP Residual norm 1.079756105012e-12 > > 2 KSP unpreconditioned resid norm 1.079756105012e-12 true resid norm 1.079754870815e-12 ||r(i)||/||b|| 2.506898844643e-11 > > Linear solve converged due to CONVERGED_RTOL iterations 2 > > iter = 1, SNES Function norm 2.40846e-05 > > iteration 0 KSP Residual norm 2.408462930023e-05 > > 0 KSP unpreconditioned resid norm 2.408462930023e-05 true resid norm 2.408462930023e-05 ||r(i)||/||b|| 1.000000000000e+00 > > iteration 1 KSP Residual norm 1.096958085415e-11 > > 1 KSP unpreconditioned resid norm 1.096958085415e-11 true resid norm 1.096958085425e-11 ||r(i)||/||b|| 4.554598170270e-07 > > iteration 2 KSP Residual norm 5.909523288165e-16 > > 2 KSP unpreconditioned resid norm 5.909523288165e-16 true resid norm 5.909519599233e-16 ||r(i)||/||b|| 2.453647729249e-11 > > Linear solve converged due to CONVERGED_RTOL iterations 2 > > iter = 2, SNES Function norm 1.19684e-14 > > ################# step 2 ################# > > iter = 0, SNES Function norm 0.00391662 > > iteration 0 KSP Residual norm 3.916615614134e-03 > > 0 KSP unpreconditioned resid norm 3.916615614134e-03 true resid norm 3.916615614134e-03 ||r(i)||/||b|| 1.000000000000e+00 > > iteration 1 KSP Residual norm 4.068800385009e-08 > > 1 KSP unpreconditioned resid norm 4.068800385009e-08 true resid norm 4.068800384986e-08 ||r(i)||/||b|| 1.038856192653e-05 > > iteration 2 KSP Residual norm 8.427513055511e-14 > > 2 KSP unpreconditioned resid norm 8.427513055511e-14 true resid norm 8.427497502034e-14 ||r(i)||/||b|| 2.151729537007e-11 > > Linear solve converged due to CONVERGED_RTOL iterations 2 > > iter = 1, SNES Function norm 1.99152e-07 > > iteration 0 KSP Residual norm 1.991523558528e-07 > > 0 KSP unpreconditioned resid norm 1.991523558528e-07 true resid norm 1.991523558528e-07 ||r(i)||/||b|| 1.000000000000e+00 > > iteration 1 KSP Residual norm 1.413505562549e-13 > > 1 KSP unpreconditioned resid norm 1.413505562549e-13 true resid norm 1.413505562550e-13 ||r(i)||/||b|| 7.097609046588e-07 > > iteration 2 KSP Residual norm 5.165934822520e-18 > > 2 KSP unpreconditioned resid norm 5.165934822520e-18 true resid norm 5.165932973227e-18 ||r(i)||/||b|| 2.593960262787e-11 > > Linear solve converged due to CONVERGED_RTOL iterations 2 > > iter = 2, SNES Function norm 1.69561e-16 > > ################# step 3 ################# > > iter = 0, SNES Function norm 0.00035615 > > iteration 0 KSP 
Residual norm 3.561504844171e-04 > > 0 KSP unpreconditioned resid norm 3.561504844171e-04 true resid norm 3.561504844171e-04 ||r(i)||/||b|| 1.000000000000e+00 > > iteration 1 KSP Residual norm 3.701591890269e-09 > > 1 KSP unpreconditioned resid norm 3.701591890269e-09 true resid norm 3.701591890274e-09 ||r(i)||/||b|| 1.039333667153e-05 > > iteration 2 KSP Residual norm 7.832821034843e-15 > > 2 KSP unpreconditioned resid norm 7.832821034843e-15 true resid norm 7.832856926692e-15 ||r(i)||/||b|| 2.199311041093e-11 > > Linear solve converged due to CONVERGED_RTOL iterations 2 > > iter = 1, SNES Function norm 1.64671e-09 > > iteration 0 KSP Residual norm 1.646709543241e-09 > > 0 KSP unpreconditioned resid norm 1.646709543241e-09 true resid norm 1.646709543241e-09 ||r(i)||/||b|| 1.000000000000e+00 > > iteration 1 KSP Residual norm 1.043230469512e-15 > > 1 KSP unpreconditioned resid norm 1.043230469512e-15 true resid norm 1.043230469512e-15 ||r(i)||/||b|| 6.335242749968e-07 > > iteration 1 KSP Residual norm 0.000000000000e+00 > > 1 KSP unpreconditioned resid norm 0.000000000000e+00 true resid norm -nan ||r(i)||/||b|| -nan > > Linear solve did not converge due to DIVERGED_PC_FAILED iterations 1 > > PC_FAILED due to SUBPC_ERROR > > > > > > More information from -ksp_error_if_not_converged -info > > [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 3.303168180659e-07 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 7.795816360977e-02 at iteration 12 > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] 
PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.227610512466e+00 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 5.453050347652e+05 at iteration 12 > > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > > [0] KSPConvergedDefault(): Linear solver is diverging. Initial right hand size norm 9.501675075823e-01, current residual norm 4.894880836662e+04 at iteration 210 > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [0]PETSC ERROR: > > [0]PETSC ERROR: KSPSolve has not converged, reason DIVERGED_DTOL > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html (https://link.getmailspring.com/link/FB54F843-9802-454D-9332-7D5C41E03F86 at getmailspring.com/2?redirect=https%3A%2F%2Fwww.mcs.anl.gov%2Fpetsc%2Fdocumentation%2Ffaq.html&recipient=cGV0c2MtdXNlcnNAbWNzLmFubC5nb3Y%3D) for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.12.5, Mar, 29, 2020 > > [0]PETSC ERROR: ./stokeTutorial on a arch-linux2-c-debug named a2aa8f1c96aa by Unknown Fri Oct 9 13:43:28 2020 > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-cuda > > [0]PETSC ERROR: #1 KSPSolve() line 832 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c > > [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c > > [0]PETSC ERROR: #3 PCApply() line 444 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/interface/precon.c > > [0]PETSC ERROR: #4 KSP_PCApply() line 281 in /usr/local/petsc/petsc-3.12.5/include/petsc/private/kspimpl.h > > [0]PETSC ERROR: #5 KSPFGMRESCycle() line 166 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > > [0]PETSC ERROR: #6 KSPSolve_FGMRES() line 291 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > > [0]PETSC ERROR: #7 KSPSolve() line 760 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c > > [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /usr/local/petsc/petsc-3.12.5/src/snes/impls/ls/ls.c > > [0]PETSC ERROR: #9 SNESSolve() line 4482 in /usr/local/petsc/petsc-3.12.5/src/snes/interface/snes.c > > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374783 > > SNES Object: 1 MPI processes > > type: newtonls > > maximum iterations=50, maximum function evaluations=10000 > > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > > total number of linear solver iterations=2 > > total number of function evaluations=1322 > > norm schedule ALWAYS > > Jacobian is built using finite differences one column at a time > > SNESLineSearch Object: 1 MPI processes > > type: bt > > interpolation: cubic > > alpha=1.000000e-04 > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > > maximum iterations=40 > > KSP Object: 1 MPI processes > > type: fgmres > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=10000, initial guess is zero > > 
tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > > right preconditioning > > using UNPRECONDITIONED norm type for convergence test > > PC Object: 1 MPI processes > > type: fieldsplit > > FieldSplit with Schur preconditioner, blocksize = 1, factorization FULL > > Preconditioner for the Schur complement formed from S itself > > Split info: > > Split number 0 Defined by IS > > Split number 1 Defined by IS > > KSP solver for A00 block > > KSP Object: (fieldsplit_0_) 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: (fieldsplit_0_) 1 MPI processes > > type: ilu > > out-of-place factorization > > 0 levels of fill > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 1., needed 1. > > Factored matrix follows: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=512, cols=512 > > package used to perform factorization: petsc > > total: nonzeros=9213, allocated nonzeros=9213 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=512, cols=512 > > total: nonzeros=9213, allocated nonzeros=9213 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > KSP solver for S = A11 - A10 inv(A00) A01 > > KSP Object: (fieldsplit_1_) 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: (fieldsplit_1_) 1 MPI processes > > type: none > > linear system matrix = precond matrix: > > Mat Object: (fieldsplit_1_) 1 MPI processes > > type: schurcomplement > > rows=147, cols=147 > > Schur complement A11 - A10 inv(A00) A01 > > A11 > > Mat Object: 1 MPI processes > > type: seqaij > > rows=147, cols=147 > > total: nonzeros=147, allocated nonzeros=147 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > A10 > > Mat Object: 1 MPI processes > > type: seqaij > > rows=147, cols=512 > > total: nonzeros=2560, allocated nonzeros=2560 > > total number of mallocs used during MatSetValues calls=0 > > using I-node routines: found 87 nodes, limit used is 5 > > KSP of A00 > > KSP Object: (fieldsplit_0_) 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: (fieldsplit_0_) 1 MPI processes > > type: ilu > > out-of-place factorization > > 0 levels of fill > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 1., needed 1. 
> > Factored matrix follows: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=512, cols=512 > > package used to perform factorization: petsc > > total: nonzeros=9213, allocated nonzeros=9213 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=512, cols=512 > > total: nonzeros=9213, allocated nonzeros=9213 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > A01 > > Mat Object: 1 MPI processes > > type: seqaij > > rows=512, cols=147 > > total: nonzeros=2562, allocated nonzeros=2562 > > total number of mallocs used during MatSetValues calls=0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=659, cols=659 > > total: nonzeros=14482, allocated nonzeros=27543 > > total number of mallocs used during MatSetValues calls=1309 > > not using I-node routines > > > > > > > > On Oct 9 2020, at 2:17 am, Barry Smith wrote: > > > > > > When you get a huge change at restart this means something is seriously wrong with either the linear operator or the linear preconditioner. > > > > > > How are you doing the matrix vector product? Note both the operator and preconditioner must be linear operators for GMRES. > > > > > > FGMRES allows the preconditioner to be nonlinear. You can try > > > > > > -ksp_type fgmres -ksp_monitor_true_residual > > > > > > Barry > > > > > > > > > > On Oct 8, 2020, at 2:43 AM, Yang Juntao wrote: > > > > Hello, > > > > > > > > I?m working on a nonlinear solver with SNES with handcoded jacobian and function. Each linear solver is solved with KSP solver. > > > > But sometimes I got issues with ksp solver convergence. I tried with finite difference approximated jacobian, but get the same error. > > > > > > > > From the iterations, the convergence seems ok at the beginning but suddenly diverged in the last iteration. > > > > Hope anyone with experience on ksp solvers could direct me to a direction I can debug the problem. 
> > > > > > > > iter = 0, SNES Function norm 2.94934e-06 > > > > iteration 0 KSP Residual norm 1.094600281831e-06 > > > > iteration 1 KSP Residual norm 1.264284474186e-08 > > > > iteration 2 KSP Residual norm 6.593269221816e-09 > > > > iteration 3 KSP Residual norm 1.689570779457e-09 > > > > iteration 4 KSP Residual norm 1.040661505932e-09 > > > > iteration 5 KSP Residual norm 5.422761817348e-10 > > > > iteration 6 KSP Residual norm 2.492867371369e-10 > > > > iteration 7 KSP Residual norm 8.261522376775e-11 > > > > iteration 8 KSP Residual norm 4.246401544245e-11 > > > > iteration 9 KSP Residual norm 2.514366787388e-11 > > > > iteration 10 KSP Residual norm 1.982940267051e-11 > > > > iteration 11 KSP Residual norm 1.586470414676e-11 > > > > iteration 12 KSP Residual norm 9.866392216207e-12 > > > > iteration 13 KSP Residual norm 4.951342176999e-12 > > > > iteration 14 KSP Residual norm 2.418292660318e-12 > > > > iteration 15 KSP Residual norm 1.747418526086e-12 > > > > iteration 16 KSP Residual norm 1.094150535809e-12 > > > > iteration 17 KSP Residual norm 4.464287492066e-13 > > > > iteration 18 KSP Residual norm 3.530090494462e-13 > > > > iteration 19 KSP Residual norm 2.825698091454e-13 > > > > iteration 20 KSP Residual norm 1.950568425807e-13 > > > > iteration 21 KSP Residual norm 1.227898091813e-13 > > > > iteration 22 KSP Residual norm 5.411106347374e-14 > > > > iteration 23 KSP Residual norm 4.511115848564e-14 > > > > iteration 24 KSP Residual norm 4.063546606691e-14 > > > > iteration 25 KSP Residual norm 3.677694771949e-14 > > > > iteration 26 KSP Residual norm 3.459244943466e-14 > > > > iteration 27 KSP Residual norm 3.263954971093e-14 > > > > iteration 28 KSP Residual norm 3.087344619079e-14 > > > > iteration 29 KSP Residual norm 2.809426925625e-14 > > > > iteration 30 KSP Residual norm 4.366149884754e-01 > > > > Linear solve did not converge due to DIVERGED_DTOL iterations 30 > > > > > > > > > > > > > > > > SNES Object: 1 MPI processes > > > > type: newtonls > > > > SNES has not been set up so information may be incomplete > > > > maximum iterations=50, maximum function evaluations=10000 > > > > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > > > > total number of linear solver iterations=0 > > > > total number of function evaluations=0 > > > > norm schedule ALWAYS > > > > SNESLineSearch Object: 1 MPI processes > > > > type: bt > > > > interpolation: cubic > > > > alpha=1.000000e-04 > > > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > > > > maximum iterations=40 > > > > KSP Object: 1 MPI processes > > > > type: gmres > > > > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > > > happy breakdown tolerance 1e-30 > > > > maximum iterations=10000, initial guess is zero > > > > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. 
> > > > left preconditioning > > > > using DEFAULT norm type for convergence test > > > > PC Object: 1 MPI processes > > > > type: fieldsplit > > > > PC has not been set up so information may be incomplete > > > > FieldSplit with Schur preconditioner, factorization FULL > > > > Preconditioner for the Schur complement formed from S itself > > > > Split info: > > > > KSP solver for A00 block > > > > not yet available > > > > KSP solver for S = A11 - A10 inv(A00) A01 > > > > not yet available > > > > linear system matrix = precond matrix: > > > > Mat Object: 1 MPI processes > > > > type: seqaij > > > > rows=659, cols=659 > > > > total: nonzeros=659, allocated nonzeros=7908 > > > > total number of mallocs used during MatSetValues calls=0 > > > > not using I-node routines > > > > > > > > Regards > > > > Juntao > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sun Oct 11 01:53:50 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 11 Oct 2020 01:53:50 -0500 Subject: [petsc-users] Convergence Error Debugging with KSP solvers in SNES In-Reply-To: References: <9AA79324-0653-4088-8A1E-72FEE8CD1631@petsc.dev> Message-ID: <76AC38BE-D6D9-415D-8A02-68466455F874@petsc.dev> Juntao, Are you providing the null space to the pressure field solver? Since it is only the constants it is possible to provide the null space from the command line using -prefix_ksp_constant_null_space where prefix is the prefix from the particular sub KSP you wish to solve. This is printed next to each inner KSP in the output from -ksp_view. In your case I am guessing you need -fieldsplit_1_inner_ksp_constant_null_space If you ran before with -fieldsplit_1_inner_ksp_monitor you can still use this option and hopefully will not see the pressure solve explode by adding the constant command line option. It is also possible to attach the null space within the program but that is cumbersome so best to get everything working and worry about that superficial change later. Barry BTW: I have a new git branch barry/2020-10-09/all-ksp-monitor that I wrote to help make it easier to understand the convergence of PCFIELDSPLIT. You just use the option -all_ksp_monitor and it will print the convergence history for ALL the KSP solves, inner and outer. So one can easily see which inner ones are working well and which are not. > On Oct 11, 2020, at 1:10 AM, Karl Yang wrote: > > Hi, Barray, > > Thank you for helping. I've identified the divergence took place at fieldsplit_1_inner. > It is singular for "pressure field" because I used periodic boundary condition. But doesn't ksp solver by default took care of constant null space? > > And if it is necessary, could you give me some help on what's the best location to call MatNullSpaceRemove as the ksp solver is hiding after SNES solver and FIELDSPLIT_PC. > > Regards > Juntao > > > On Oct 9 2020, at 2:25 pm, Barry Smith wrote: > > Before you do any investigation I would run with one SNES solve > > -snes_max_it 1 -snes_view > > It will print out exactly what solver configuration you are using. > > ------ > > [0]PETSC ERROR: KSPSolve has not converged, reason DIVERGED_DTOL > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.12.5, Mar, 29, 2020 > [0]PETSC ERROR: ./stokeTutorial on a arch-linux2-c-debug named a2aa8f1c96aa by Unknown Fri Oct 9 13:43:28 2020 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-cuda > [0]PETSC ERROR: #1 KSPSolve() line 832 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c > > Ok, one of the inner solvers went crazy and diverged, that is the norm of the residual for that inner solver exploded. > > Base on the line number > > [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c > > you can look at that file and see which of the inner solvers failed. From the info from -snes_view you will know what the KSP and PC is for that inner solve and the KSP options prefix. With that you can run the failing case with the addition option > > -xxx_ksp_monitor_true_residual > > and watch the inner solve explode. > > This inner solver behaving badly can also explain the need for -ksp_type fgmres. Normally PCFIELDSPLIT is a linear operator and so you can use -ksp_type gmres but there is some issue with the inner solver. Could it possible have a null space, that is be singular? Do you provide your own custom inner solver or just select from the options database? Does -pc_type lu make everything work fine? > > Barry > > > > > > > On Oct 9, 2020, at 12:53 AM, Karl Yang > wrote: > > Hi, Barry, > > Thanks for your reply. Yes, I should have used fgmres. But after switching to fgmres I'm still facing the same convergence issue. > > Seems like the reason is due to DIVERGED_PC_FAILED. But I simply used FD jacobian, and fieldsplitPC. I am a bit lost on whether I made some mistakes somewhere in the FormFunction or I did not setup the solver correctly. 
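(A minimal sketch of attaching the constant null space in code, as an alternative to the command-line option suggested above; here pmat is a placeholder for the operator of whichever inner solve is singular, not a name taken from this thread:)

  MatNullSpace nsp;
  MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nsp); /* constants only */
  MatSetNullSpace(pmat, nsp);       /* the KSP that uses pmat then projects it out */
  MatNullSpaceDestroy(&nsp);
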
> > ///////code/////// > SNESSetFunction(snes, r, FormFunctionStatic, this); > // SNESSetJacobian(snes, J, J, FormJacobianStatic, this); > SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault, this); > SNESMonitorSet(snes, MySNESMonitor, NULL, NULL); > > SNESGetKSP(snes, &ksp); > KSPGetPC(ksp, &pc); > PCSetType(pc, PCFIELDSPLIT); > PCFieldSplitSetDetectSaddlePoint(pc, PETSC_TRUE); > PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELF, NULL); > KSPMonitorSet(ksp, MyKSPMonitor, NULL, 0); > KSPSetTolerances(ksp, 1e-8, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); > SNESSetFromOptions(snes); > //////end///////// > > Output from SNES/KSP solver > ################# step 1 ################# > iter = 0, SNES Function norm 0.0430713 > iteration 0 KSP Residual norm 4.307133784528e-02 > 0 KSP unpreconditioned resid norm 4.307133784528e-02 true resid norm 4.307133784528e-02 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 4.451434065870e-07 > 1 KSP unpreconditioned resid norm 4.451434065870e-07 true resid norm 4.451434065902e-07 ||r(i)||/||b|| 1.033502623460e-05 > iteration 2 KSP Residual norm 1.079756105012e-12 > 2 KSP unpreconditioned resid norm 1.079756105012e-12 true resid norm 1.079754870815e-12 ||r(i)||/||b|| 2.506898844643e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 1, SNES Function norm 2.40846e-05 > iteration 0 KSP Residual norm 2.408462930023e-05 > 0 KSP unpreconditioned resid norm 2.408462930023e-05 true resid norm 2.408462930023e-05 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 1.096958085415e-11 > 1 KSP unpreconditioned resid norm 1.096958085415e-11 true resid norm 1.096958085425e-11 ||r(i)||/||b|| 4.554598170270e-07 > iteration 2 KSP Residual norm 5.909523288165e-16 > 2 KSP unpreconditioned resid norm 5.909523288165e-16 true resid norm 5.909519599233e-16 ||r(i)||/||b|| 2.453647729249e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 2, SNES Function norm 1.19684e-14 > ################# step 2 ################# > iter = 0, SNES Function norm 0.00391662 > iteration 0 KSP Residual norm 3.916615614134e-03 > 0 KSP unpreconditioned resid norm 3.916615614134e-03 true resid norm 3.916615614134e-03 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 4.068800385009e-08 > 1 KSP unpreconditioned resid norm 4.068800385009e-08 true resid norm 4.068800384986e-08 ||r(i)||/||b|| 1.038856192653e-05 > iteration 2 KSP Residual norm 8.427513055511e-14 > 2 KSP unpreconditioned resid norm 8.427513055511e-14 true resid norm 8.427497502034e-14 ||r(i)||/||b|| 2.151729537007e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 1, SNES Function norm 1.99152e-07 > iteration 0 KSP Residual norm 1.991523558528e-07 > 0 KSP unpreconditioned resid norm 1.991523558528e-07 true resid norm 1.991523558528e-07 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 1.413505562549e-13 > 1 KSP unpreconditioned resid norm 1.413505562549e-13 true resid norm 1.413505562550e-13 ||r(i)||/||b|| 7.097609046588e-07 > iteration 2 KSP Residual norm 5.165934822520e-18 > 2 KSP unpreconditioned resid norm 5.165934822520e-18 true resid norm 5.165932973227e-18 ||r(i)||/||b|| 2.593960262787e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 2, SNES Function norm 1.69561e-16 > ################# step 3 ################# > iter = 0, SNES Function norm 0.00035615 > iteration 0 KSP Residual norm 3.561504844171e-04 > 0 KSP unpreconditioned resid norm 3.561504844171e-04 true resid norm 
3.561504844171e-04 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 3.701591890269e-09 > 1 KSP unpreconditioned resid norm 3.701591890269e-09 true resid norm 3.701591890274e-09 ||r(i)||/||b|| 1.039333667153e-05 > iteration 2 KSP Residual norm 7.832821034843e-15 > 2 KSP unpreconditioned resid norm 7.832821034843e-15 true resid norm 7.832856926692e-15 ||r(i)||/||b|| 2.199311041093e-11 > Linear solve converged due to CONVERGED_RTOL iterations 2 > iter = 1, SNES Function norm 1.64671e-09 > iteration 0 KSP Residual norm 1.646709543241e-09 > 0 KSP unpreconditioned resid norm 1.646709543241e-09 true resid norm 1.646709543241e-09 ||r(i)||/||b|| 1.000000000000e+00 > iteration 1 KSP Residual norm 1.043230469512e-15 > 1 KSP unpreconditioned resid norm 1.043230469512e-15 true resid norm 1.043230469512e-15 ||r(i)||/||b|| 6.335242749968e-07 > iteration 1 KSP Residual norm 0.000000000000e+00 > 1 KSP unpreconditioned resid norm 0.000000000000e+00 true resid norm -nan ||r(i)||/||b|| -nan > Linear solve did not converge due to DIVERGED_PC_FAILED iterations 1 > PC_FAILED due to SUBPC_ERROR > > > More information from -ksp_error_if_not_converged -info > > [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 3.303168180659e-07 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 7.795816360977e-02 at iteration 12 > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] PCSetUp(): 
Leaving PC with identical preconditioner since operator is unchanged > [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.227610512466e+00 is less than relative tolerance 1.000000000000e-05 times initial right hand side norm 5.453050347652e+05 at iteration 12 > [0] PCSetUp(): Leaving PC with identical preconditioner since operator is unchanged > [0] KSPConvergedDefault(): Linear solver is diverging. Initial right hand size norm 9.501675075823e-01, current residual norm 4.894880836662e+04 at iteration 210 > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: > [0]PETSC ERROR: KSPSolve has not converged, reason DIVERGED_DTOL > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.12.5, Mar, 29, 2020 > [0]PETSC ERROR: ./stokeTutorial on a arch-linux2-c-debug named a2aa8f1c96aa by Unknown Fri Oct 9 13:43:28 2020 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-cuda > [0]PETSC ERROR: #1 KSPSolve() line 832 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #2 PCApply_FieldSplit_Schur() line 1189 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/impls/fieldsplit/fieldsplit.c > [0]PETSC ERROR: #3 PCApply() line 444 in /usr/local/petsc/petsc-3.12.5/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #4 KSP_PCApply() line 281 in /usr/local/petsc/petsc-3.12.5/include/petsc/private/kspimpl.h > [0]PETSC ERROR: #5 KSPFGMRESCycle() line 166 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [0]PETSC ERROR: #6 KSPSolve_FGMRES() line 291 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [0]PETSC ERROR: #7 KSPSolve() line 760 in /usr/local/petsc/petsc-3.12.5/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /usr/local/petsc/petsc-3.12.5/src/snes/impls/ls/ls.c > [0]PETSC ERROR: #9 SNESSolve() line 4482 in /usr/local/petsc/petsc-3.12.5/src/snes/interface/snes.c > [0] PetscCommDuplicate(): Using internal PETSc communicator 1140850688 -2080374783 > SNES Object: 1 MPI processes > type: newtonls > maximum iterations=50, maximum function evaluations=10000 > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > total number of linear solver iterations=2 > total number of function evaluations=1322 > norm schedule ALWAYS > Jacobian is built using finite differences one column at a time > SNESLineSearch Object: 1 MPI processes > type: bt > interpolation: cubic > alpha=1.000000e-04 > maxstep=1.000000e+08, minlambda=1.000000e-12 > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > maximum iterations=40 > KSP Object: 1 MPI processes > type: fgmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. 
> right preconditioning > using UNPRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: fieldsplit > FieldSplit with Schur preconditioner, blocksize = 1, factorization FULL > Preconditioner for the Schur complement formed from S itself > Split info: > Split number 0 Defined by IS > Split number 1 Defined by IS > KSP solver for A00 block > KSP Object: (fieldsplit_0_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > package used to perform factorization: petsc > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > KSP solver for S = A11 - A10 inv(A00) A01 > KSP Object: (fieldsplit_1_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_1_) 1 MPI processes > type: none > linear system matrix = precond matrix: > Mat Object: (fieldsplit_1_) 1 MPI processes > type: schurcomplement > rows=147, cols=147 > Schur complement A11 - A10 inv(A00) A01 > A11 > Mat Object: 1 MPI processes > type: seqaij > rows=147, cols=147 > total: nonzeros=147, allocated nonzeros=147 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > A10 > Mat Object: 1 MPI processes > type: seqaij > rows=147, cols=512 > total: nonzeros=2560, allocated nonzeros=2560 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 87 nodes, limit used is 5 > KSP of A00 > KSP Object: (fieldsplit_0_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. 
> Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > package used to perform factorization: petsc > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=512 > total: nonzeros=9213, allocated nonzeros=9213 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > A01 > Mat Object: 1 MPI processes > type: seqaij > rows=512, cols=147 > total: nonzeros=2562, allocated nonzeros=2562 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=659, cols=659 > total: nonzeros=14482, allocated nonzeros=27543 > total number of mallocs used during MatSetValues calls=1309 > not using I-node routines > > > > On Oct 9 2020, at 2:17 am, Barry Smith > wrote: > > When you get a huge change at restart this means something is seriously wrong with either the linear operator or the linear preconditioner. > > How are you doing the matrix vector product? Note both the operator and preconditioner must be linear operators for GMRES. > > FGMRES allows the preconditioner to be nonlinear. You can try > > -ksp_type fgmres -ksp_monitor_true_residual > > Barry > > > On Oct 8, 2020, at 2:43 AM, Yang Juntao > wrote: > > Hello, > > I?m working on a nonlinear solver with SNES with handcoded jacobian and function. Each linear solver is solved with KSP solver. > But sometimes I got issues with ksp solver convergence. I tried with finite difference approximated jacobian, but get the same error. > > From the iterations, the convergence seems ok at the beginning but suddenly diverged in the last iteration. > Hope anyone with experience on ksp solvers could direct me to a direction I can debug the problem. 
> > iter = 0, SNES Function norm 2.94934e-06 > iteration 0 KSP Residual norm 1.094600281831e-06 > iteration 1 KSP Residual norm 1.264284474186e-08 > iteration 2 KSP Residual norm 6.593269221816e-09 > iteration 3 KSP Residual norm 1.689570779457e-09 > iteration 4 KSP Residual norm 1.040661505932e-09 > iteration 5 KSP Residual norm 5.422761817348e-10 > iteration 6 KSP Residual norm 2.492867371369e-10 > iteration 7 KSP Residual norm 8.261522376775e-11 > iteration 8 KSP Residual norm 4.246401544245e-11 > iteration 9 KSP Residual norm 2.514366787388e-11 > iteration 10 KSP Residual norm 1.982940267051e-11 > iteration 11 KSP Residual norm 1.586470414676e-11 > iteration 12 KSP Residual norm 9.866392216207e-12 > iteration 13 KSP Residual norm 4.951342176999e-12 > iteration 14 KSP Residual norm 2.418292660318e-12 > iteration 15 KSP Residual norm 1.747418526086e-12 > iteration 16 KSP Residual norm 1.094150535809e-12 > iteration 17 KSP Residual norm 4.464287492066e-13 > iteration 18 KSP Residual norm 3.530090494462e-13 > iteration 19 KSP Residual norm 2.825698091454e-13 > iteration 20 KSP Residual norm 1.950568425807e-13 > iteration 21 KSP Residual norm 1.227898091813e-13 > iteration 22 KSP Residual norm 5.411106347374e-14 > iteration 23 KSP Residual norm 4.511115848564e-14 > iteration 24 KSP Residual norm 4.063546606691e-14 > iteration 25 KSP Residual norm 3.677694771949e-14 > iteration 26 KSP Residual norm 3.459244943466e-14 > iteration 27 KSP Residual norm 3.263954971093e-14 > iteration 28 KSP Residual norm 3.087344619079e-14 > iteration 29 KSP Residual norm 2.809426925625e-14 > iteration 30 KSP Residual norm 4.366149884754e-01 > Linear solve did not converge due to DIVERGED_DTOL iterations 30 > > > SNES Object: 1 MPI processes > type: newtonls > SNES has not been set up so information may be incomplete > maximum iterations=50, maximum function evaluations=10000 > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > total number of linear solver iterations=0 > total number of function evaluations=0 > norm schedule ALWAYS > SNESLineSearch Object: 1 MPI processes > type: bt > interpolation: cubic > alpha=1.000000e-04 > maxstep=1.000000e+08, minlambda=1.000000e-12 > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > maximum iterations=40 > KSP Object: 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > left preconditioning > using DEFAULT norm type for convergence test > PC Object: 1 MPI processes > type: fieldsplit > PC has not been set up so information may be incomplete > FieldSplit with Schur preconditioner, factorization FULL > Preconditioner for the Schur complement formed from S itself > Split info: > KSP solver for A00 block > not yet available > KSP solver for S = A11 - A10 inv(A00) A01 > not yet available > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=659, cols=659 > total: nonzeros=659, allocated nonzeros=7908 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > > Regards > Juntao > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olivier.jamond at cea.fr Mon Oct 12 06:10:02 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Mon, 12 Oct 2020 13:10:02 +0200 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: <218E7696-2A50-42A3-8CF2-D58FCC17B855@petsc.dev> References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> <87tuv48osv.fsf@jedbrown.org> <2B8B302F-D823-4160-B674-B3DAE78E6363@petsc.dev> <218E7696-2A50-42A3-8CF2-D58FCC17B855@petsc.dev> Message-ID: Hi Barry, Thanks for this work! I tried this branch with my code and sequential matrices on a small case: it does work! Thanks a lot, Olivier On 09/10/2020 03:50, Barry Smith wrote: > > ? Olivier, > > ? ? The branch *barry/2020-10-08/invert-block-diagonal-aij*?contains > an example src/mat/tests/ex178.c that shows how to compute inv(CC'). > It works for SeqAIJ matrices. > > ? ? Please let us know if it works for you and then I will implement > the parallel version. > > ? Barry > > >> On Oct 8, 2020, at 3:59 PM, Barry Smith > > wrote: >> >> >> ?Olivier >> >> ?I am working on extending the routines now and hopefully push a >> branch you can try fairly soon. >> >> ?Barry >> >> >>> On Oct 8, 2020, at 3:07 PM, Jed Brown >> > wrote: >>> >>> Olivier Jamond >> > writes: >>> >>>>> ??Given the structure of C it seems you should just explicitly >>>>> construct Sp and use GAMG (or other preconditioners, even a direct >>>>> solver) directly on Sp. Trying to avoid explicitly forming Sp will >>>>> give you a much slower performing solving for what benefit? If C >>>>> was just some generic monster than forming Sp might be unrealistic >>>>> but in your case CCt is is block diagonal with tiny blocks which >>>>> means (C*Ct)^(-1) is block diagonal with tiny blocks (the blocks >>>>> are the inverses of the blocks of (C*Ct)). >>>>> >>>>> ???Sp = Ct*C ?+ Qt * S * Q = Ct*C ?+ ?[I - Ct * (C*Ct)^(-1)*C] S >>>>> [I - Ct * (C*Ct)^(-1)*C] >>>>> >>>>> [Ct * (C*Ct)^(-1)*C] will again be block diagonal with slightly >>>>> larger blocks. >>>>> >>>>> You can do D = (C*Ct) with MatMatMult() then write custom code >>>>> that zips through the diagonal blocks of D inverting all of them >>>>> to get iD then use MatPtAP applied to C and iD to get Ct * >>>>> (C*Ct)^(-1)*C then MatShift() to include the I then MatPtAP or >>>>> MatRAR to get [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] >>>>> ?then finally MatAXPY() to get Sp. The complexity of each of the >>>>> Mat operations is very low because of the absurdly simple >>>>> structure of C and its descendants. ??You might even be able to >>>>> just use MUMPS to give you the explicit inv(C*Ct) without writing >>>>> custom code to get iD. >>>> >>>> At this time, I didn't manage to compute iD=inv(C*Ct) without using >>>> dense matrices, what may be a shame because all matrices are sparse >>>> . Is >>>> it possible? >>>> >>>> And I get no idea of how to write code to manually zip through the >>>> diagonal blocks of D to invert them... >>> >>> You could use MatInvertVariableBlockDiagonal(), which should perhaps >>> return a Mat instead of a raw array. >>> >>> If you have constant block sizes, MatInvertBlockDiagonalMat will >>> return a Mat. >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Mon Oct 12 09:49:12 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 12 Oct 2020 09:49:12 -0500 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> <87tuv48osv.fsf@jedbrown.org> <2B8B302F-D823-4160-B674-B3DAE78E6363@petsc.dev> <218E7696-2A50-42A3-8CF2-D58FCC17B855@petsc.dev> Message-ID: > On Oct 12, 2020, at 6:10 AM, Olivier Jamond wrote: > > Hi Barry, > > Thanks for this work! I tried this branch with my code and sequential matrices on a small case: it does work! > > Excellant. I will extend it to the parallel case and get it into our master release. We'd be interested in hearing about your convergence and timing experiences when you run largish jobs (even sequentially) since this type of problem comes up relatively frequently and we do need a variety of solvers that can handle it while currently we do not have great tools for it. Barry > Thanks a lot, > Olivier > > On 09/10/2020 03:50, Barry Smith wrote: >> >> Olivier, >> >> The branch barry/2020-10-08/invert-block-diagonal-aij contains an example src/mat/tests/ex178.c that shows how to compute inv(CC'). It works for SeqAIJ matrices. >> >> Please let us know if it works for you and then I will implement the parallel version. >> >> Barry >> >> >>> On Oct 8, 2020, at 3:59 PM, Barry Smith > wrote: >>> >>> >>> Olivier >>> >>> I am working on extending the routines now and hopefully push a branch you can try fairly soon. >>> >>> Barry >>> >>> >>>> On Oct 8, 2020, at 3:07 PM, Jed Brown > wrote: >>>> >>>> Olivier Jamond > writes: >>>> >>>>>> Given the structure of C it seems you should just explicitly construct Sp and use GAMG (or other preconditioners, even a direct solver) directly on Sp. Trying to avoid explicitly forming Sp will give you a much slower performing solving for what benefit? If C was just some generic monster than forming Sp might be unrealistic but in your case CCt is is block diagonal with tiny blocks which means (C*Ct)^(-1) is block diagonal with tiny blocks (the blocks are the inverses of the blocks of (C*Ct)). >>>>>> >>>>>> Sp = Ct*C + Qt * S * Q = Ct*C + [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] >>>>>> >>>>>> [Ct * (C*Ct)^(-1)*C] will again be block diagonal with slightly larger blocks. >>>>>> >>>>>> You can do D = (C*Ct) with MatMatMult() then write custom code that zips through the diagonal blocks of D inverting all of them to get iD then use MatPtAP applied to C and iD to get Ct * (C*Ct)^(-1)*C then MatShift() to include the I then MatPtAP or MatRAR to get [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] then finally MatAXPY() to get Sp. The complexity of each of the Mat operations is very low because of the absurdly simple structure of C and its descendants. You might even be able to just use MUMPS to give you the explicit inv(C*Ct) without writing custom code to get iD. >>>>> >>>>> At this time, I didn't manage to compute iD=inv(C*Ct) without using >>>>> dense matrices, what may be a shame because all matrices are sparse . Is >>>>> it possible? >>>>> >>>>> And I get no idea of how to write code to manually zip through the >>>>> diagonal blocks of D to invert them... 
>>>> >>>> You could use MatInvertVariableBlockDiagonal(), which should perhaps return a Mat instead of a raw array. >>>> >>>> If you have constant block sizes, MatInvertBlockDiagonalMat will return a Mat. >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From t.appel17 at imperial.ac.uk Tue Oct 13 07:47:57 2020 From: t.appel17 at imperial.ac.uk (Thibaut Appel) Date: Tue, 13 Oct 2020 14:47:57 +0200 Subject: [petsc-users] About MAT_NEW_NONZERO_LOCATION[] Message-ID: <0dd90b52-ca23-8197-3ac4-84844054fdda@imperial.ac.uk> Hi there, just a quick question: It seems MAT_NEW_NONZERO_LOCATION_ERR set to PETSC_TRUE has kind of the same purpose as MAT_NEW_NONZERO_LOCATIONS set to PETSC_FALSE, the difference being if an additional entry is there, the former produces an error whereas in the latter it is simply ignored. However the manual states: 'If one wishes to repeatedly assemble matrices that retain the same nonzero pattern (such as within a nonlinear or time-dependent problem), the option MatSetOption(MatA,*MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE*); should be specified after the first matrix has been fully assembled. This option ensures that certain data structures and communication information will be reused (instead of regenerated) during successive steps, thereby increasing efficiency' If I only declare: ??? CALL MatSetOption(MatA,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr) Would the data structures still be reused in later matrix assemblies? Or does it rather make sense to use conjointly: ??? CALL MatSetOption(MatA,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr) ??? CALL MatSetOption(MatA,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ierr) Thank you, Thibaut -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Oct 13 09:41:37 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 13 Oct 2020 09:41:37 -0500 Subject: [petsc-users] About MAT_NEW_NONZERO_LOCATION[] In-Reply-To: <0dd90b52-ca23-8197-3ac4-84844054fdda@imperial.ac.uk> References: <0dd90b52-ca23-8197-3ac4-84844054fdda@imperial.ac.uk> Message-ID: <57AB9DEE-AF44-43DD-8F76-A9205E9D418A@petsc.dev> You only need to provide one of the options. The docs are slightly misleading.The flags only tells the matrix what to do with new nonzero locations, preventing new ones. The Mat actually tracks if new non-zeros locations are actually entered independent of the flags. So, for example even if you did not supply any new flags AND your code did not insert new locations then the structure would be reused. Barry > On Oct 13, 2020, at 7:47 AM, Thibaut Appel wrote: > > Hi there, just a quick question: > > It seems MAT_NEW_NONZERO_LOCATION_ERR set to PETSC_TRUE has kind of the same purpose as MAT_NEW_NONZERO_LOCATIONS set to PETSC_FALSE, the difference being if an additional entry is there, the former produces an error whereas in the latter it is simply ignored. > > However the manual states: > > 'If one wishes to repeatedly assemble matrices that retain the same nonzero pattern (such as within a nonlinear or time-dependent problem), the option MatSetOption(MatA,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); should be specified after the first matrix has been fully assembled. 
This option ensures that certain data structures and communication information will be reused (instead of regenerated) during successive steps, thereby increasing efficiency' > > If I only declare: > > CALL MatSetOption(MatA,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr) > > Would the data structures still be reused in later matrix assemblies? > > Or does it rather make sense to use conjointly: > > CALL MatSetOption(MatA,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr) > CALL MatSetOption(MatA,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE,ierr) > > Thank you, > > > > Thibaut > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicolas.barral at math.u-bordeaux.fr Wed Oct 14 05:00:34 2020 From: nicolas.barral at math.u-bordeaux.fr (Nicolas Barral) Date: Wed, 14 Oct 2020 12:00:34 +0200 Subject: [petsc-users] Python version needed for internal scripts Message-ID: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> Hi all, Apologies if the question has already been asked, but the ML archive search seems to be broken (or has it never worked ?). Many petsc scripts require a 'python' executable, which python should that be ? For now, python3 seems to have worked with the configure scripts and petsc_gen_xdmf scripts, but can I safely assume it will always be the case ? 'python' is usually an alias for python2, so making it point at python3 seems a bit dangerous. Yet, python2 was removed from recent Ubuntus and maybe others, and if I have no python2 installed, and no 'python' alias, I have to manually edit all the scripts. Thanks -- Nicolas From knepley at gmail.com Wed Oct 14 05:20:13 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 14 Oct 2020 06:20:13 -0400 Subject: [petsc-users] Python version needed for internal scripts In-Reply-To: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> References: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> Message-ID: On Wed, Oct 14, 2020 at 6:01 AM Nicolas Barral < nicolas.barral at math.u-bordeaux.fr> wrote: > Hi all, > > Apologies if the question has already been asked, but the ML archive > search seems to be broken (or has it never worked ?). > > Many petsc scripts require a 'python' executable, which python should > that be ? For now, python3 seems to have worked with the configure > scripts and petsc_gen_xdmf scripts, but can I safely assume it will > always be the case ? > > 'python' is usually an alias for python2, so making it point at python3 > seems a bit dangerous. Yet, python2 was removed from recent Ubuntus and > maybe others, and if I have no python2 installed, and no 'python' alias, > I have to manually edit all the scripts. > Right now, PETSc works with both Python2 and Python3. I am not sure how long we can support Python2, but the aim is to support it until End of Life, probably on Red Hat since they change the slowest I think. Thanks, Matt > Thanks > > -- > Nicolas > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pierre at joliv.et Wed Oct 14 05:26:26 2020 From: pierre at joliv.et (Pierre Jolivet) Date: Wed, 14 Oct 2020 12:26:26 +0200 Subject: [petsc-users] Python version needed for internal scripts In-Reply-To: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> References: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> Message-ID: Hello Nicolas, > On 14 Oct 2020, at 12:00 PM, Nicolas Barral wrote: > > Hi all, > > Apologies if the question has already been asked, but the ML archive search seems to be broken (or has it never worked ?). What do you mean broken? Can?t you access this URL https://lists.mcs.anl.gov/pipermail/petsc-dev/ ? I know that some French ISP are getting blocked by the ANL firewall, but I don?t think I've ever had this issue with the ML. > Many petsc scripts require a 'python' executable, which python should that be ? For now, python3 seems to have worked with the configure scripts and petsc_gen_xdmf scripts, but can I safely assume it will always be the case ? This was discussed in great length in the ML when 3.13.5 (or was it 3.13.4?) was released (because it was kind of broken at first), see https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2020-April/025879.html + https://gitlab.com/petsc/petsc/-/merge_requests/2818 This was then patched some more https://gitlab.com/petsc/petsc/-/merge_requests/2831 because some Python versions are not shipping distutils.sysconfig which is mandatory for the configure to go through. I think you can assume that a correct Python version will be picked up for you by configure. If this is not the case, we will need to fix this (by adding another check?). Thanks, Pierre > 'python' is usually an alias for python2, so making it point at python3 seems a bit dangerous. Yet, python2 was removed from recent Ubuntus and maybe others, and if I have no python2 installed, and no 'python' alias, I have to manually edit all the scripts. > > Thanks > > -- > Nicolas -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Wed Oct 14 06:12:02 2020 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 14 Oct 2020 14:12:02 +0300 Subject: [petsc-users] Python version needed for internal scripts In-Reply-To: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> References: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> Message-ID: On Wed, 14 Oct 2020 at 13:01, Nicolas Barral < nicolas.barral at math.u-bordeaux.fr> wrote: > > 'python' is usually an alias for python2, so making it point at python3 > seems a bit dangerous. Yet, python2 was removed from recent Ubuntus and > maybe others, and if I have no python2 installed, and no 'python' alias, > I have to manually edit all the scripts. > > apt install python-is-python3 and you should get the alias python -> python3 in /usr/bin Disclaimer: I'm not an Ubuntu user. Google and inform yourself. -- Lisandro Dalcin ============ Senior Research Scientist Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Wed Oct 14 06:58:42 2020 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 14 Oct 2020 14:58:42 +0300 Subject: [petsc-users] Python version needed for internal scripts In-Reply-To: References: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> Message-ID: On Wed, 14 Oct 2020 at 14:12, Lisandro Dalcin wrote: > Disclaimer: I'm not an Ubuntu user. 
Google and inform yourself. > Just in case... What I'm trying to say is that I do not know Ubuntu very well, much less its Py2 -> Py3 transition details, and then you should try to find a more authoritative source than my occasional comments about the proper way to do things. -- Lisandro Dalcin ============ Senior Research Scientist Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Oct 14 08:40:43 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 14 Oct 2020 07:40:43 -0600 Subject: [petsc-users] Python version needed for internal scripts In-Reply-To: References: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> Message-ID: <87h7qxc4dg.fsf@jedbrown.org> Lisandro Dalcin writes: > On Wed, 14 Oct 2020 at 13:01, Nicolas Barral < > nicolas.barral at math.u-bordeaux.fr> wrote: > >> >> 'python' is usually an alias for python2, so making it point at python3 >> seems a bit dangerous. Yet, python2 was removed from recent Ubuntus and >> maybe others, and if I have no python2 installed, and no 'python' alias, >> I have to manually edit all the scripts. > > apt install python-is-python3 > > and you should get the alias python -> python3 in /usr/bin Most scripts are called through the make system, which uses $(PYTHON). You can always do that: /your/preferred/python petsc_gen_xdmf.py thefile.h5 configure is actually a polyglot shell script that figures out how to call Python on itself. We could do that for other essential scripts. I hate the workflow of petsc_gen_xdmf.py, but it could be done this way. My /bin/python (on Arch) has been Python-3 since 2011. From pranayreddy865 at gmail.com Wed Oct 14 13:59:02 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Wed, 14 Oct 2020 11:59:02 -0700 Subject: [petsc-users] Object is in wrong state, Matrix is missing diagonal entry 0 Message-ID: Hello everyone, I am trying to solve the Poisson equation using SNES class. I am running into a problem where PETSc complains that an object is in wrong state. I opened the matrix object in matlab to see if any diagonal entry is missing but I see that it is not the case. Could you please let me know what I am missing here? The complete error is as follows: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry 0 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 [0]PETSC ERROR: ./a.out on a linux-gnu-c-debug named cg17-9.agave.rc.asu.edu by pbaikadi Wed Oct 14 11:33:45 2020 [0]PETSC ERROR: Configure options [0]PETSC ERROR: #1 MatILUFactorSymbolic_SeqAIJ() line 1687 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/mat/impls/aij/seq/aijfact.c [0]PETSC ERROR: #2 MatILUFactorSymbolic() line 6737 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/mat/interface/matrix.c [0]PETSC ERROR: #3 PCSetUp_ILU() line 144 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/pc/impls/factor/ilu/ilu.c [0]PETSC ERROR: #4 PCSetUp() line 932 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #5 KSPSetUp() line 391 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #6 KSPSolve() line 725 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #7 SNESSolve_NEWTONLS() line 225 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/snes/impls/ls/ls.c [0]PETSC ERROR: #8 SNESSolve() line 4560 in /packages/7x/petsc/3.11.1/petsc-3.11.1/src/snes/interface/snes.c [0]PETSC ERROR: #9 User provided function() line 0 in User file On a different note, I have two more questions. 1) When a matrix is being filled using MatSetValues, does the nnz vector also have a 0-based indexing? 2) If I explicitly have nnz 1-based indexing, then does nnz(i) indicate the number of nonzeros in the (i-1)^th row or the i^th row? Thank you in advance for your help. Best Regards, Pranay. ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 14 14:53:47 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 14 Oct 2020 15:53:47 -0400 Subject: [petsc-users] Object is in wrong state, Matrix is missing diagonal entry 0 In-Reply-To: References: Message-ID: On Wed, Oct 14, 2020 at 2:59 PM baikadi pranay wrote: > Hello everyone, > I am trying to solve the Poisson equation using SNES class. I am running > into a problem where PETSc complains that an object is in wrong state. I > opened the matrix object in matlab to see if any diagonal entry is missing > but I see that it is not the case. Could you please let me know what I am > missing here? The complete error is as follows: > You are missing the diagonal entry. You can look at the entire structure using MatView(A, PETSC_VIEWER_STDOUT_WORLD); > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry 0 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 > [0]PETSC ERROR: ./a.out on a linux-gnu-c-debug named > cg17-9.agave.rc.asu.edu by pbaikadi Wed Oct 14 11:33:45 2020 > [0]PETSC ERROR: Configure options > [0]PETSC ERROR: #1 MatILUFactorSymbolic_SeqAIJ() line 1687 in > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/mat/impls/aij/seq/aijfact.c > [0]PETSC ERROR: #2 MatILUFactorSymbolic() line 6737 in > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/mat/interface/matrix.c > [0]PETSC ERROR: #3 PCSetUp_ILU() line 144 in > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/pc/impls/factor/ilu/ilu.c > [0]PETSC ERROR: #4 PCSetUp() line 932 in > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #5 KSPSetUp() line 391 in > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #6 KSPSolve() line 725 in > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #7 SNESSolve_NEWTONLS() line 225 in > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/snes/impls/ls/ls.c > [0]PETSC ERROR: #8 SNESSolve() line 4560 in > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/snes/interface/snes.c > [0]PETSC ERROR: #9 User provided function() line 0 in User file > > On a different note, I have two more questions. > 1) When a matrix is being filled using MatSetValues, does the nnz vector > also have a 0-based indexing? > We always use 0-based indices. However, MatSetValues() does not take an nnz vector. > 2) If I explicitly have nnz 1-based indexing, then does nnz(i) indicate > the number of nonzeros in the (i-1)^th row or the i^th row? > I do not understand. Thanks, Matt > Thank you in advance for your help. > Best Regards, > Pranay. > ? > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicolas.barral at math.u-bordeaux.fr Wed Oct 14 16:14:19 2020 From: nicolas.barral at math.u-bordeaux.fr (Nicolas Barral) Date: Wed, 14 Oct 2020 23:14:19 +0200 Subject: [petsc-users] Python version needed for internal scripts In-Reply-To: References: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> Message-ID: <40342694-dbaf-0e2a-5b2a-3a4858cede09@math.u-bordeaux.fr> Thanks Matt, Pierre, Lisandro and Jed for your help. Does the python version chosen to call the configure script impact other petsc scripts ? For now keeping python as an alias for python2 seems safer (until proven otherwise) due to other codes. @Pierre: I meant the search button in https://lists.mcs.anl.gov/... wouldn't return anything, even on as obvious as queries as "petsc". It does work now, so not sure what happened. Thanks, -- Nicolas On 14/10/2020 12:20, Matthew Knepley wrote: > On Wed, Oct 14, 2020 at 6:01 AM Nicolas Barral > > wrote: > > Hi all, > > Apologies if the question has already been asked, but the ML archive > search seems to be broken (or has it never worked ?). > > Many petsc scripts require a 'python' executable, which python should > that be ? For now, python3 seems to have worked with the configure > scripts and petsc_gen_xdmf scripts, but can I safely assume it will > always be the case ? > > 'python' is usually an alias for python2, so making it point at python3 > seems a bit dangerous. 
Yet, python2 was removed from recent Ubuntus and > maybe others, and if I have no python2 installed, and no 'python' > alias, > I have to manually edit all the scripts. > > > Right now, PETSc works with both Python2 and Python3. I am not sure how > long we can support Python2, > but the aim is to support it until End of Life, probably on Red Hat > since they change the slowest I think. > > ? Thanks, > > ? ? Matt > > Thanks > > -- > Nicolas > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From jed at jedbrown.org Wed Oct 14 17:31:01 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 14 Oct 2020 16:31:01 -0600 Subject: [petsc-users] Python version needed for internal scripts In-Reply-To: <40342694-dbaf-0e2a-5b2a-3a4858cede09@math.u-bordeaux.fr> References: <2ac76631-27fa-1870-322d-8a7ab23af1e3@math.u-bordeaux.fr> <40342694-dbaf-0e2a-5b2a-3a4858cede09@math.u-bordeaux.fr> Message-ID: <87wnzsbftm.fsf@jedbrown.org> Nicolas Barral writes: > Thanks Matt, Pierre, Lisandro and Jed for your help. > > Does the python version chosen to call the configure script impact other > petsc scripts ? It affects everything called through make targets, which use $(PYTHON) and includes most things a user would care about. petsc_gen_xdmf.py is special and I hate it, but proper Python "install" of this script is a hassle. > For now keeping python as an alias for python2 seems safer (until proven > otherwise) due to other codes. There is no reason for people to have python2 installed unless they have to work with legacy Python that still hasn't migrated after EOL. We test with Python-3 and folks should be encouraged to use it. From olivier.jamond at cea.fr Thu Oct 15 03:26:09 2020 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Thu, 15 Oct 2020 10:26:09 +0200 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> <87tuv48osv.fsf@jedbrown.org> <2B8B302F-D823-4160-B674-B3DAE78E6363@petsc.dev> <218E7696-2A50-42A3-8CF2-D58FCC17B855@petsc.dev> Message-ID: <5d2869c9-3f2b-bb40-c99a-7c47683a1420@cea.fr> > ? We'd be interested in hearing about your convergence and timing > experiences when you run largish jobs (even sequentially) since this > type of problem comes up relatively frequently and we do need a > variety of solvers that can handle it while currently we do not have > great tools for it. Of course I will keep you in touch with the results of this 'ainsworht' approach! I also plan to compare it with the new 'GKB' approach of PCFIELDSPLIT. Many thanks, Olivier From hecbarcab at gmail.com Thu Oct 15 04:16:46 2020 From: hecbarcab at gmail.com (=?UTF-8?Q?H=C3=A9ctor_Barreiro_Cabrera?=) Date: Thu, 15 Oct 2020 11:16:46 +0200 Subject: [petsc-users] Eisenstat-Walker method with GPU assembled matrices Message-ID: Hello fellow PETSc users, Following up my previous email , I managed to feed the entry data to a SeqAICUSPARSE matrix through a CUDA kernel using the new MatCUSPARSEGetDeviceMatWrite function (thanks Barry Smith and Mark Adams!). 
However, I am now facing problems when trying to use this matrix within a SNES solver with the Eisenstat-Walker method enabled. According to PETSc's error log, the preconditioner is failing to invert the matrix diagonal. Specifically it says that: [0]PETSC ERROR: Arguments are incompatible [0]PETSC ERROR: Zero diagonal on row 0 [0]PETSC ERROR: Configure options PETSC_ARCH=win64_vs2019_release --with-cc="win32fe cl" --with-cxx="win32fe cl" --with-clanguage=C++ --with-fc=0 --with-mpi=0 --with-cuda=1 --with-cudac="win32fe nvcc" --with-cuda-dir=~/cuda --download-f2cblaslapack=1 --with-precision=single --with-64-bit-indices=0 --with-single-library=1 --with-endian=little --with-debugging=0 --with-x=0 --with-windows-graphics=0 --with-shared-libraries=1 --CUDAOPTFLAGS=-O2 The stack trace leads to the diagonal inversion routine: [0]PETSC ERROR: #1 MatInvertDiagonal_SeqAIJ() line 1913 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c [0]PETSC ERROR: #2 MatSOR_SeqAIJ() line 1944 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c [0]PETSC ERROR: #3 MatSOR() line 4005 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\INTERF~1\matrix.c [0]PETSC ERROR: #4 PCPreSolve_Eisenstat() line 79 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\impls\eisens\eisen.c [0]PETSC ERROR: #5 PCPreSolve() line 1549 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\INTERF~1\precon.c [0]PETSC ERROR: #6 KSPSolve_Private() line 686 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c [0]PETSC ERROR: #7 KSPSolve() line 889 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\impls\ls\ls.c [0]PETSC ERROR: #9 SNESSolve() line 4567 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\INTERF~1\snes.c I am 100% positive that the diagonal does not contain a zero entry, so my suspicions are either that this operation is not supported on the GPU at all (MatInvertDiagonal_SeqAIJ seems to access host-side memory) or that I am missing some setting to make this work on the GPU. Is this correct? Thanks! Cheers, H?ctor -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Oct 15 08:10:15 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 15 Oct 2020 07:10:15 -0600 Subject: [petsc-users] Eisenstat-Walker method with GPU assembled matrices In-Reply-To: References: Message-ID: <87r1pzbpoo.fsf@jedbrown.org> H?ctor Barreiro Cabrera writes: > Hello fellow PETSc users, > > Following up my previous email > , > I managed to feed the entry data to a SeqAICUSPARSE matrix through a CUDA > kernel using the new MatCUSPARSEGetDeviceMatWrite function (thanks Barry > Smith and Mark Adams!). However, I am now facing problems when trying to > use this matrix within a SNES solver with the Eisenstat-Walker method > enabled. Before going further, the error message makes this look like you're using -pc_type eisenstat (a "trick" to reduce the cost of Gauss-Seidel with Krylov) rather than -snes_ksp_ew (the Eisenstat-Walker method for tuning linear solver tolerances within SNES). Is this what you intend? > According to PETSc's error log, the preconditioner is failing to invert the > matrix diagonal. 
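For reference, the two are selected quite differently. A minimal sketch using only standard PETSc options; "snes" below is whatever SNES object the application already creates:

    -snes_ksp_ew          Eisenstat-Walker: adapt the inner KSP relative tolerance at each Newton step
    -pc_type eisenstat    PCEISENSTAT: SSOR preconditioning with Eisenstat's trick, unrelated to the above

or, from code,

    SNESKSPSetUseEW(snes, PETSC_TRUE);   /* same effect as -snes_ksp_ew */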
Specifically it says that: > [0]PETSC ERROR: Arguments are incompatible > [0]PETSC ERROR: Zero diagonal on row 0 > [0]PETSC ERROR: Configure options PETSC_ARCH=win64_vs2019_release > --with-cc="win32fe cl" --with-cxx="win32fe cl" --with-clanguage=C++ > --with-fc=0 --with-mpi=0 --with-cuda=1 --with-cudac="win32fe nvcc" > --with-cuda-dir=~/cuda --download-f2cblaslapack=1 --with-precision=single > --with-64-bit-indices=0 --with-single-library=1 --with-endian=little > --with-debugging=0 --with-x=0 --with-windows-graphics=0 > --with-shared-libraries=1 --CUDAOPTFLAGS=-O2 > > The stack trace leads to the diagonal inversion routine: > [0]PETSC ERROR: #1 MatInvertDiagonal_SeqAIJ() line 1913 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c > [0]PETSC ERROR: #2 MatSOR_SeqAIJ() line 1944 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c > [0]PETSC ERROR: #3 MatSOR() line 4005 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\INTERF~1\matrix.c > [0]PETSC ERROR: #4 PCPreSolve_Eisenstat() line 79 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\impls\eisens\eisen.c > [0]PETSC ERROR: #5 PCPreSolve() line 1549 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\INTERF~1\precon.c > [0]PETSC ERROR: #6 KSPSolve_Private() line 686 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c > [0]PETSC ERROR: #7 KSPSolve() line 889 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c > [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\impls\ls\ls.c > [0]PETSC ERROR: #9 SNESSolve() line 4567 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\INTERF~1\snes.c > > I am 100% positive that the diagonal does not contain a zero entry, so my > suspicions are either that this operation is not supported on the GPU at > all (MatInvertDiagonal_SeqAIJ seems to access host-side memory) or that I > am missing some setting to make this work on the GPU. Is this correct? > > Thanks! > > Cheers, > H?ctor From hecbarcab at gmail.com Thu Oct 15 10:06:00 2020 From: hecbarcab at gmail.com (=?UTF-8?Q?H=C3=A9ctor_Barreiro_Cabrera?=) Date: Thu, 15 Oct 2020 17:06:00 +0200 Subject: [petsc-users] Eisenstat-Walker method with GPU assembled matrices In-Reply-To: <87r1pzbpoo.fsf@jedbrown.org> References: <87r1pzbpoo.fsf@jedbrown.org> Message-ID: Apologies, I realized that minutes after I hit the send button. My understanding was that both needed to be user together, but from your email it's clear that's not the case. After testing the solver without the preconditioner, but with EW enabled within SNES, everything's working as expected. Thank you, and sorry for the noise! El jue., 15 oct. 2020 a las 15:10, Jed Brown () escribi?: > H?ctor Barreiro Cabrera writes: > > > Hello fellow PETSc users, > > > > Following up my previous email > > < > https://lists.mcs.anl.gov/pipermail/petsc-users/2020-September/042511.html > >, > > I managed to feed the entry data to a SeqAICUSPARSE matrix through a CUDA > > kernel using the new MatCUSPARSEGetDeviceMatWrite function (thanks Barry > > Smith and Mark Adams!). However, I am now facing problems when trying to > > use this matrix within a SNES solver with the Eisenstat-Walker method > > enabled. > > Before going further, the error message makes this look like you're using > -pc_type eisenstat (a "trick" to reduce the cost of Gauss-Seidel with > Krylov) rather than -snes_ksp_ew (the Eisenstat-Walker method for tuning > linear solver tolerances within SNES). Is this what you intend? 
> > > According to PETSc's error log, the preconditioner is failing to invert > the > > matrix diagonal. Specifically it says that: > > [0]PETSC ERROR: Arguments are incompatible > > [0]PETSC ERROR: Zero diagonal on row 0 > > [0]PETSC ERROR: Configure options PETSC_ARCH=win64_vs2019_release > > --with-cc="win32fe cl" --with-cxx="win32fe cl" --with-clanguage=C++ > > --with-fc=0 --with-mpi=0 --with-cuda=1 --with-cudac="win32fe nvcc" > > --with-cuda-dir=~/cuda --download-f2cblaslapack=1 --with-precision=single > > --with-64-bit-indices=0 --with-single-library=1 --with-endian=little > > --with-debugging=0 --with-x=0 --with-windows-graphics=0 > > --with-shared-libraries=1 --CUDAOPTFLAGS=-O2 > > > > The stack trace leads to the diagonal inversion routine: > > [0]PETSC ERROR: #1 MatInvertDiagonal_SeqAIJ() line 1913 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c > > [0]PETSC ERROR: #2 MatSOR_SeqAIJ() line 1944 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c > > [0]PETSC ERROR: #3 MatSOR() line 4005 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\INTERF~1\matrix.c > > [0]PETSC ERROR: #4 PCPreSolve_Eisenstat() line 79 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\impls\eisens\eisen.c > > [0]PETSC ERROR: #5 PCPreSolve() line 1549 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\INTERF~1\precon.c > > [0]PETSC ERROR: #6 KSPSolve_Private() line 686 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c > > [0]PETSC ERROR: #7 KSPSolve() line 889 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c > > [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\impls\ls\ls.c > > [0]PETSC ERROR: #9 SNESSolve() line 4567 in > > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\INTERF~1\snes.c > > > > I am 100% positive that the diagonal does not contain a zero entry, so my > > suspicions are either that this operation is not supported on the GPU at > > all (MatInvertDiagonal_SeqAIJ seems to access host-side memory) or that I > > am missing some setting to make this work on the GPU. Is this correct? > > > > Thanks! > > > > Cheers, > > H?ctor > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Oct 15 16:32:11 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 15 Oct 2020 16:32:11 -0500 Subject: [petsc-users] Eisenstat-Walker method with GPU assembled matrices In-Reply-To: References: Message-ID: We still have the assumption the AIJ matrix always has a copy on the GPU. How did you fill up the matrix on the GPU while not having its copy on the CPU? Barry When we remove this assumption we have to add a bunch more code for CPU only things to make sure they properly get the data from the GPU. > On Oct 15, 2020, at 4:16 AM, H?ctor Barreiro Cabrera wrote: > > Hello fellow PETSc users, > > Following up my previous email , I managed to feed the entry data to a SeqAICUSPARSE matrix through a CUDA kernel using the new MatCUSPARSEGetDeviceMatWrite function (thanks Barry Smith and Mark Adams!). However, I am now facing problems when trying to use this matrix within a SNES solver with the Eisenstat-Walker method enabled. > > According to PETSc's error log, the preconditioner is failing to invert the matrix diagonal. 
Specifically it says that: > [0]PETSC ERROR: Arguments are incompatible > [0]PETSC ERROR: Zero diagonal on row 0 > [0]PETSC ERROR: Configure options PETSC_ARCH=win64_vs2019_release --with-cc="win32fe cl" --with-cxx="win32fe cl" --with-clanguage=C++ --with-fc=0 --with-mpi=0 --with-cuda=1 --with-cudac="win32fe nvcc" --with-cuda-dir=~/cuda --download-f2cblaslapack=1 --with-precision=single --with-64-bit-indices=0 --with-single-library=1 --with-endian=little --with-debugging=0 --with-x=0 --with-windows-graphics=0 --with-shared-libraries=1 --CUDAOPTFLAGS=-O2 > > The stack trace leads to the diagonal inversion routine: > [0]PETSC ERROR: #1 MatInvertDiagonal_SeqAIJ() line 1913 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c > [0]PETSC ERROR: #2 MatSOR_SeqAIJ() line 1944 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c > [0]PETSC ERROR: #3 MatSOR() line 4005 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\INTERF~1\matrix.c > [0]PETSC ERROR: #4 PCPreSolve_Eisenstat() line 79 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\impls\eisens\eisen.c > [0]PETSC ERROR: #5 PCPreSolve() line 1549 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\INTERF~1\precon.c > [0]PETSC ERROR: #6 KSPSolve_Private() line 686 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c > [0]PETSC ERROR: #7 KSPSolve() line 889 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c > [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\impls\ls\ls.c > [0]PETSC ERROR: #9 SNESSolve() line 4567 in C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\INTERF~1\snes.c > > I am 100% positive that the diagonal does not contain a zero entry, so my suspicions are either that this operation is not supported on the GPU at all (MatInvertDiagonal_SeqAIJ seems to access host-side memory) or that I am missing some setting to make this work on the GPU. Is this correct? > > Thanks! > > Cheers, > H?ctor -------------- next part -------------- An HTML attachment was scrubbed... URL: From Eugenio.Aulisa at ttu.edu Fri Oct 16 06:50:43 2020 From: Eugenio.Aulisa at ttu.edu (Aulisa, Eugenio) Date: Fri, 16 Oct 2020 11:50:43 +0000 Subject: [petsc-users] MatDuplicate Message-ID: Hi, I have a MATMPIAIJ matrix A with an overestimated preallocated size. After closing it I want a duplicate of A with sharp memory allocation for each row both diagonal and off-diagonal. I know how to do it by hand, but I am wondering if a function already exists. For example, if I use MatDuplicate with MAT_COPY_VALUES, will it do a sharp memory allocation, or use the same loose memory allocation of A? Any other function would do the job? Thanks, Eugenio Eugenio Aulisa Department of Mathematics and Statistics, Texas Tech University Lubbock TX, 79409-1042 room: 226 http://www.math.ttu.edu/~eaulisa/ phone: (806) 834-6684 fax: (806) 742-1112 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Oct 16 07:27:17 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 16 Oct 2020 08:27:17 -0400 Subject: [petsc-users] MatDuplicate In-Reply-To: References: Message-ID: On Fri, Oct 16, 2020 at 7:50 AM Aulisa, Eugenio wrote: > Hi, > > I have a MATMPIAIJ matrix A with an overestimated preallocated size. > > After closing it I want a duplicate of A with sharp memory > allocation for each row both diagonal and off-diagonal. > > I know how to do it by hand, but I am wondering if a function already > exists. 
> > For example, if I use MatDuplicate with MAT_COPY_VALUES > , > > will it do a sharp memory allocation, or use the same loose memory > allocation of A? > It is a sharp allocation. Thanks, Matt > Any other function would do the job? > > Thanks, > Eugenio > > > > > > Eugenio Aulisa > > Department of Mathematics and Statistics, > Texas Tech University > Lubbock TX, 79409-1042 > room: 226 > http://www.math.ttu.edu/~eaulisa/ > phone: (806) 834-6684 > fax: (806) 742-1112 > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alexey.V.Kozlov.2 at nd.edu Fri Oct 16 22:47:58 2020 From: Alexey.V.Kozlov.2 at nd.edu (Alexey Kozlov) Date: Fri, 16 Oct 2020 23:47:58 -0400 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: References: <87o8m2tod8.fsf@jedbrown.org> Message-ID: Thank you for your advice! My sparse matrix seems to be very stiff so I have decided to concentrate on the direct solvers. I have very good results with MUMPS. Due to a lack of time I haven?t got a good result with SuperLU_DIST and haven?t compiled PETSc with Pastix yet but I have a feeling that MUMPS is the best. I have run a sequential test case with built-in PETSc LU (-pc_type lu -ksp_type preonly) and MUMPs (-pc_type lu -ksp_type preonly -pc_factor_mat_solver_type mumps) with default settings and found that MUMPs was about 50 times faster than the built-in LU and used about 3 times less RAM. Do you have any idea why it could be? My test case has about 100,000 complex equations with about 3,000,000 non-zeros. PETSc was compiled with the following options: ./configure --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --enable-g --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi --with-scalar-type=complex --with-clanguage=c --with-openmp --with-debugging=0 COPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' FOPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' CXXOPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' --download-superlu_dist --download-mumps --download-scalapack --download-metis --download-cmake --download-parmetis --download-ptscotch. Running MUPMS in parallel using MPI also gave me a significant gain in performance (about 10 times on a single cluster node). Could you, please, advise me whether I can adjust some options for the direct solvers to improve performance? Should I try MUMPS in OpenMP mode? On Sat, Sep 19, 2020 at 7:40 AM Mark Adams wrote: > As Jed said high frequency is hard. AMG, as-is, can be adapted ( > https://link.springer.com/article/10.1007/s00466-006-0047-8) with > parameters. > AMG for convection: use richardson/sor and not chebyshev smoothers and in > smoothed aggregation (gamg) don't smooth (-pc_gamg_agg_nsmooths 0). > Mark > > On Sat, Sep 19, 2020 at 2:11 AM Alexey Kozlov > wrote: > >> Thanks a lot! I'll check them out. >> >> On Sat, Sep 19, 2020 at 1:41 AM Barry Smith wrote: >> >>> >>> These are small enough that likely sparse direct solvers are the best >>> use of your time and for general efficiency. >>> >>> PETSc supports 3 parallel direct solvers, SuperLU_DIST, MUMPs and >>> Pastix. I recommend configuring PETSc for all three of them and then >>> comparing them for problems of interest to you. 
>>> >>> --download-superlu_dist --download-mumps --download-pastix >>> --download-scalapack (used by MUMPS) --download-metis --download-parmetis >>> --download-ptscotch >>> >>> Barry >>> >>> >>> On Sep 18, 2020, at 11:28 PM, Alexey Kozlov >>> wrote: >>> >>> Thanks for the tips! My matrix is complex and unsymmetric. My typical >>> test case has of the order of one million equations. I use a 2nd-order >>> finite-difference scheme with 19-point stencil, so my typical test case >>> uses several GB of RAM. >>> >>> On Fri, Sep 18, 2020 at 11:52 PM Jed Brown wrote: >>> >>>> Unfortunately, those are hard problems in which the "good" methods are >>>> technical and hard to make black-box. There are "sweeping" methods that >>>> solve on 2D "slabs" with PML boundary conditions, H-matrix based methods, >>>> and fancy multigrid methods. Attempting to solve with STRUMPACK is >>>> probably the easiest thing to try (--download-strumpack). >>>> >>>> >>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html >>>> >>>> Is the matrix complex symmetric? >>>> >>>> Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a 3D >>>> problem like this if you have enough memory. I'm assuming the memory or >>>> time is unacceptable and you want an iterative method with much lower setup >>>> costs. >>>> >>>> Alexey Kozlov writes: >>>> >>>> > Dear all, >>>> > >>>> > I am solving a convected wave equation in a frequency domain. This >>>> equation >>>> > is a 3D Helmholtz equation with added first-order derivatives and >>>> mixed >>>> > derivatives, and with complex coefficients. The discretized PDE >>>> results in >>>> > a sparse linear system (about 10^6 equations) which is solved in >>>> PETSc. I >>>> > am having difficulty with the code convergence at high frequency, >>>> skewed >>>> > grid, and high Mach number. I suspect it may be due to the >>>> preconditioner I >>>> > use. I am currently using the ILU preconditioner with the number of >>>> fill >>>> > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of the >>>> art >>>> > has evolved and there are better preconditioners for Helmholtz-like >>>> > problems. Could you, please, advise me on a better preconditioner? >>>> > >>>> > Thanks, >>>> > Alexey >>>> > >>>> > -- >>>> > Alexey V. Kozlov >>>> > >>>> > Research Scientist >>>> > Department of Aerospace and Mechanical Engineering >>>> > University of Notre Dame >>>> > >>>> > 117 Hessert Center >>>> > Notre Dame, IN 46556-5684 >>>> > Phone: (574) 631-4335 >>>> > Fax: (574) 631-8355 >>>> > Email: akozlov at nd.edu >>>> >>> >>> >>> -- >>> Alexey V. Kozlov >>> >>> Research Scientist >>> Department of Aerospace and Mechanical Engineering >>> University of Notre Dame >>> >>> 117 Hessert Center >>> Notre Dame, IN 46556-5684 >>> Phone: (574) 631-4335 >>> Fax: (574) 631-8355 >>> Email: akozlov at nd.edu >>> >>> >>> >> >> -- >> Alexey V. Kozlov >> >> Research Scientist >> Department of Aerospace and Mechanical Engineering >> University of Notre Dame >> >> 117 Hessert Center >> Notre Dame, IN 46556-5684 >> Phone: (574) 631-4335 >> Fax: (574) 631-8355 >> Email: akozlov at nd.edu >> > -- Alexey V. Kozlov Research Scientist Department of Aerospace and Mechanical Engineering University of Notre Dame 117 Hessert Center Notre Dame, IN 46556-5684 Phone: (574) 631-4335 Fax: (574) 631-8355 Email: akozlov at nd.edu -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri Oct 16 23:33:07 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 17 Oct 2020 00:33:07 -0400 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: References: <87o8m2tod8.fsf@jedbrown.org> Message-ID: On Fri, Oct 16, 2020 at 11:48 PM Alexey Kozlov wrote: > Thank you for your advice! My sparse matrix seems to be very stiff so I > have decided to concentrate on the direct solvers. I have very good results > with MUMPS. Due to a lack of time I haven?t got a good result with > SuperLU_DIST and haven?t compiled PETSc with Pastix yet but I have a > feeling that MUMPS is the best. I have run a sequential test case with > built-in PETSc LU (-pc_type lu -ksp_type preonly) and MUMPs (-pc_type lu > -ksp_type preonly -pc_factor_mat_solver_type mumps) with default settings > and found that MUMPs was about 50 times faster than the built-in LU and > used about 3 times less RAM. Do you have any idea why it could be? > The numbers do not sound realistic, but of course we do not have your particular problem. In particular, the memory figure seems impossible. > My test case has about 100,000 complex equations with about 3,000,000 > non-zeros. PETSc was compiled with the following options: ./configure > --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --enable-g > --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi > --with-scalar-type=complex --with-clanguage=c --with-openmp > --with-debugging=0 COPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2' FOPTFLAGS='-mkl=parallel -O2 -mavx > -axCORE-AVX2 -no-prec-div -fp-model fast=2' CXXOPTFLAGS='-mkl=parallel -O2 > -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' --download-superlu_dist > --download-mumps --download-scalapack --download-metis --download-cmake > --download-parmetis --download-ptscotch. > > Running MUPMS in parallel using MPI also gave me a significant gain in > performance (about 10 times on a single cluster node). > Again, this does not appear to make sense. The performance should be limited by memory bandwidth, and a single cluster node will not usually have 10x the bandwidth of a CPU, although it might be possible with a very old CPU. It would help to understand the performance if you would send the output of -log_view. Thanks, Matt > Could you, please, advise me whether I can adjust some options for the > direct solvers to improve performance? Should I try MUMPS in OpenMP mode? > > On Sat, Sep 19, 2020 at 7:40 AM Mark Adams wrote: > >> As Jed said high frequency is hard. AMG, as-is, can be adapted ( >> https://link.springer.com/article/10.1007/s00466-006-0047-8) with >> parameters. >> AMG for convection: use richardson/sor and not chebyshev smoothers and in >> smoothed aggregation (gamg) don't smooth (-pc_gamg_agg_nsmooths 0). >> Mark >> >> On Sat, Sep 19, 2020 at 2:11 AM Alexey Kozlov >> wrote: >> >>> Thanks a lot! I'll check them out. >>> >>> On Sat, Sep 19, 2020 at 1:41 AM Barry Smith wrote: >>> >>>> >>>> These are small enough that likely sparse direct solvers are the best >>>> use of your time and for general efficiency. >>>> >>>> PETSc supports 3 parallel direct solvers, SuperLU_DIST, MUMPs and >>>> Pastix. I recommend configuring PETSc for all three of them and then >>>> comparing them for problems of interest to you. 
>>>> >>>> --download-superlu_dist --download-mumps --download-pastix >>>> --download-scalapack (used by MUMPS) --download-metis --download-parmetis >>>> --download-ptscotch >>>> >>>> Barry >>>> >>>> >>>> On Sep 18, 2020, at 11:28 PM, Alexey Kozlov >>>> wrote: >>>> >>>> Thanks for the tips! My matrix is complex and unsymmetric. My typical >>>> test case has of the order of one million equations. I use a 2nd-order >>>> finite-difference scheme with 19-point stencil, so my typical test case >>>> uses several GB of RAM. >>>> >>>> On Fri, Sep 18, 2020 at 11:52 PM Jed Brown wrote: >>>> >>>>> Unfortunately, those are hard problems in which the "good" methods are >>>>> technical and hard to make black-box. There are "sweeping" methods that >>>>> solve on 2D "slabs" with PML boundary conditions, H-matrix based methods, >>>>> and fancy multigrid methods. Attempting to solve with STRUMPACK is >>>>> probably the easiest thing to try (--download-strumpack). >>>>> >>>>> >>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html >>>>> >>>>> Is the matrix complex symmetric? >>>>> >>>>> Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a >>>>> 3D problem like this if you have enough memory. I'm assuming the memory or >>>>> time is unacceptable and you want an iterative method with much lower setup >>>>> costs. >>>>> >>>>> Alexey Kozlov writes: >>>>> >>>>> > Dear all, >>>>> > >>>>> > I am solving a convected wave equation in a frequency domain. This >>>>> equation >>>>> > is a 3D Helmholtz equation with added first-order derivatives and >>>>> mixed >>>>> > derivatives, and with complex coefficients. The discretized PDE >>>>> results in >>>>> > a sparse linear system (about 10^6 equations) which is solved in >>>>> PETSc. I >>>>> > am having difficulty with the code convergence at high frequency, >>>>> skewed >>>>> > grid, and high Mach number. I suspect it may be due to the >>>>> preconditioner I >>>>> > use. I am currently using the ILU preconditioner with the number of >>>>> fill >>>>> > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of the >>>>> art >>>>> > has evolved and there are better preconditioners for Helmholtz-like >>>>> > problems. Could you, please, advise me on a better preconditioner? >>>>> > >>>>> > Thanks, >>>>> > Alexey >>>>> > >>>>> > -- >>>>> > Alexey V. Kozlov >>>>> > >>>>> > Research Scientist >>>>> > Department of Aerospace and Mechanical Engineering >>>>> > University of Notre Dame >>>>> > >>>>> > 117 Hessert Center >>>>> > Notre Dame, IN 46556-5684 >>>>> > Phone: (574) 631-4335 >>>>> > Fax: (574) 631-8355 >>>>> > Email: akozlov at nd.edu >>>>> >>>> >>>> >>>> -- >>>> Alexey V. Kozlov >>>> >>>> Research Scientist >>>> Department of Aerospace and Mechanical Engineering >>>> University of Notre Dame >>>> >>>> 117 Hessert Center >>>> Notre Dame, IN 46556-5684 >>>> Phone: (574) 631-4335 >>>> Fax: (574) 631-8355 >>>> Email: akozlov at nd.edu >>>> >>>> >>>> >>> >>> -- >>> Alexey V. Kozlov >>> >>> Research Scientist >>> Department of Aerospace and Mechanical Engineering >>> University of Notre Dame >>> >>> 117 Hessert Center >>> Notre Dame, IN 46556-5684 >>> Phone: (574) 631-4335 >>> Fax: (574) 631-8355 >>> Email: akozlov at nd.edu >>> >> > > -- > Alexey V. 
Kozlov > > Research Scientist > Department of Aerospace and Mechanical Engineering > University of Notre Dame > > 117 Hessert Center > Notre Dame, IN 46556-5684 > Phone: (574) 631-4335 > Fax: (574) 631-8355 > Email: akozlov at nd.edu > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Oct 16 23:42:22 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 16 Oct 2020 23:42:22 -0500 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: References: <87o8m2tod8.fsf@jedbrown.org> Message-ID: <19C23646-4F73-461A-9D9E-A25FAA99E96A@petsc.dev> > On Oct 16, 2020, at 11:33 PM, Matthew Knepley wrote: > > On Fri, Oct 16, 2020 at 11:48 PM Alexey Kozlov > wrote: > Thank you for your advice! My sparse matrix seems to be very stiff so I have decided to concentrate on the direct solvers. I have very good results with MUMPS. Due to a lack of time I haven?t got a good result with SuperLU_DIST and haven?t compiled PETSc with Pastix yet but I have a feeling that MUMPS is the best. I have run a sequential test case with built-in PETSc LU (-pc_type lu -ksp_type preonly) and MUMPs (-pc_type lu -ksp_type preonly -pc_factor_mat_solver_type mumps) with default settings and found that MUMPs was about 50 times faster than the built-in LU and used about 3 times less RAM. Do you have any idea why it could be? > > The numbers do not sound realistic, but of course we do not have your particular problem. In particular, the memory figure seems impossible. They are probably using a different ordering. Remember each external direct solver manages its own orderings and doesn't share even their names. (Not nice community behavior). > > My test case has about 100,000 complex equations with about 3,000,000 non-zeros. PETSc was compiled with the following options: ./configure --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --enable-g --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi --with-scalar-type=complex --with-clanguage=c --with-openmp --with-debugging=0 COPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' FOPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' CXXOPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' --download-superlu_dist --download-mumps --download-scalapack --download-metis --download-cmake --download-parmetis --download-ptscotch. > > Running MUPMS in parallel using MPI also gave me a significant gain in performance (about 10 times on a single cluster node). > > Again, this does not appear to make sense. The performance should be limited by memory bandwidth, and a single cluster node will not usually have > 10x the bandwidth of a CPU, although it might be possible with a very old CPU. > > It would help to understand the performance if you would send the output of -log_view. > > Thanks, > > Matt > > Could you, please, advise me whether I can adjust some options for the direct solvers to improve performance? Should I try MUMPS in OpenMP mode? Look at the orderings and other options that MUMPs supports and try them out. Most can be accessed from the command line. You can run with -help to get a real brief summary of them but should read the MUMPs users manual for full details. 
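As an illustration of the kind of command-line experimentation this suggests (the particular values are only examples, not recommendations; the MUMPS users manual describes what each ICNTL entry controls):

    -pc_type lu -ksp_type preonly -pc_factor_mat_solver_type mumps
    -mat_mumps_icntl_7 5                          sequential analysis ordering (5 selects METIS)
    -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2   parallel analysis with ParMETIS instead
    -mat_mumps_icntl_4 2                          more verbose MUMPS output, to check which ordering was used

Combining such runs with -log_view makes it straightforward to compare analysis, factorization and solve times between orderings.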
> > On Sat, Sep 19, 2020 at 7:40 AM Mark Adams > wrote: > As Jed said high frequency is hard. AMG, as-is, can be adapted (https://link.springer.com/article/10.1007/s00466-006-0047-8 ) with parameters. > AMG for convection: use richardson/sor and not chebyshev smoothers and in smoothed aggregation (gamg) don't smooth (-pc_gamg_agg_nsmooths 0). > Mark > > On Sat, Sep 19, 2020 at 2:11 AM Alexey Kozlov > wrote: > Thanks a lot! I'll check them out. > > On Sat, Sep 19, 2020 at 1:41 AM Barry Smith > wrote: > > These are small enough that likely sparse direct solvers are the best use of your time and for general efficiency. > > PETSc supports 3 parallel direct solvers, SuperLU_DIST, MUMPs and Pastix. I recommend configuring PETSc for all three of them and then comparing them for problems of interest to you. > > --download-superlu_dist --download-mumps --download-pastix --download-scalapack (used by MUMPS) --download-metis --download-parmetis --download-ptscotch > > Barry > > >> On Sep 18, 2020, at 11:28 PM, Alexey Kozlov > wrote: >> >> Thanks for the tips! My matrix is complex and unsymmetric. My typical test case has of the order of one million equations. I use a 2nd-order finite-difference scheme with 19-point stencil, so my typical test case uses several GB of RAM. >> >> On Fri, Sep 18, 2020 at 11:52 PM Jed Brown > wrote: >> Unfortunately, those are hard problems in which the "good" methods are technical and hard to make black-box. There are "sweeping" methods that solve on 2D "slabs" with PML boundary conditions, H-matrix based methods, and fancy multigrid methods. Attempting to solve with STRUMPACK is probably the easiest thing to try (--download-strumpack). >> >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html >> >> Is the matrix complex symmetric? >> >> Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a 3D problem like this if you have enough memory. I'm assuming the memory or time is unacceptable and you want an iterative method with much lower setup costs. >> >> Alexey Kozlov > writes: >> >> > Dear all, >> > >> > I am solving a convected wave equation in a frequency domain. This equation >> > is a 3D Helmholtz equation with added first-order derivatives and mixed >> > derivatives, and with complex coefficients. The discretized PDE results in >> > a sparse linear system (about 10^6 equations) which is solved in PETSc. I >> > am having difficulty with the code convergence at high frequency, skewed >> > grid, and high Mach number. I suspect it may be due to the preconditioner I >> > use. I am currently using the ILU preconditioner with the number of fill >> > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of the art >> > has evolved and there are better preconditioners for Helmholtz-like >> > problems. Could you, please, advise me on a better preconditioner? >> > >> > Thanks, >> > Alexey >> > >> > -- >> > Alexey V. Kozlov >> > >> > Research Scientist >> > Department of Aerospace and Mechanical Engineering >> > University of Notre Dame >> > >> > 117 Hessert Center >> > Notre Dame, IN 46556-5684 >> > Phone: (574) 631-4335 >> > Fax: (574) 631-8355 >> > Email: akozlov at nd.edu >> >> >> -- >> Alexey V. Kozlov >> >> Research Scientist >> Department of Aerospace and Mechanical Engineering >> University of Notre Dame >> >> 117 Hessert Center >> Notre Dame, IN 46556-5684 >> Phone: (574) 631-4335 >> Fax: (574) 631-8355 >> Email: akozlov at nd.edu > > > > -- > Alexey V. 
Kozlov > > Research Scientist > Department of Aerospace and Mechanical Engineering > University of Notre Dame > > 117 Hessert Center > Notre Dame, IN 46556-5684 > Phone: (574) 631-4335 > Fax: (574) 631-8355 > Email: akozlov at nd.edu > > > -- > Alexey V. Kozlov > > Research Scientist > Department of Aerospace and Mechanical Engineering > University of Notre Dame > > 117 Hessert Center > Notre Dame, IN 46556-5684 > Phone: (574) 631-4335 > Fax: (574) 631-8355 > Email: akozlov at nd.edu > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Sat Oct 17 01:52:43 2020 From: pierre at joliv.et (Pierre Jolivet) Date: Sat, 17 Oct 2020 08:52:43 +0200 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: References: <87o8m2tod8.fsf@jedbrown.org> Message-ID: > On 17 Oct 2020, at 5:47 AM, Alexey Kozlov wrote: > > Thank you for your advice! My sparse matrix seems to be very stiff so I have decided to concentrate on the direct solvers. I have very good results with MUMPS. Due to a lack of time I haven?t got a good result with SuperLU_DIST and haven?t compiled PETSc with Pastix yet but I have a feeling that MUMPS is the best. I have run a sequential test case with built-in PETSc LU (-pc_type lu -ksp_type preonly) and MUMPs (-pc_type lu -ksp_type preonly -pc_factor_mat_solver_type mumps) with default settings and found that MUMPs was about 50 times faster than the built-in LU and used about 3 times less RAM. Do you have any idea why it could be? > > My test case has about 100,000 complex equations with about 3,000,000 non-zeros. PETSc was compiled with the following options: ./configure --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --enable-g --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi --with-scalar-type=complex --with-clanguage=c --with-openmp --with-debugging=0 COPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' FOPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' CXXOPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' --download-superlu_dist --download-mumps --download-scalapack --download-metis --download-cmake --download-parmetis --download-ptscotch. > > Running MUPMS in parallel using MPI also gave me a significant gain in performance (about 10 times on a single cluster node). > > Could you, please, advise me whether I can adjust some options for the direct solvers to improve performance? Your problem may be too small, but if you stick to full MUMPS, it may be worth playing around with the block low-rank (BLR) options. Here are some references: http://mumps.enseeiht.fr/doc/Thesis_TheoMary.pdf#page=191 http://mumps.enseeiht.fr/doc/ud_2017/Shantsev_Talk.pdf The relevant options in PETSc are -mat_mumps_icntl_35, -mat_mumps_icntl_36, and -mat_mumps_cntl_7 Thanks, Pierre > Should I try MUMPS in OpenMP mode? > > On Sat, Sep 19, 2020 at 7:40 AM Mark Adams > wrote: > As Jed said high frequency is hard. AMG, as-is, can be adapted (https://link.springer.com/article/10.1007/s00466-006-0047-8 ) with parameters. > AMG for convection: use richardson/sor and not chebyshev smoothers and in smoothed aggregation (gamg) don't smooth (-pc_gamg_agg_nsmooths 0). 
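  To make the BLR options mentioned above concrete (a sketch only; the values are illustrative and the exact semantics of ICNTL(35)/ICNTL(36)/CNTL(7) depend on the MUMPS version, see the references given above):

    -mat_mumps_icntl_35 1        activate block low-rank compression of the factors
    -mat_mumps_icntl_36 0        choice of BLR factorization variant
    -mat_mumps_cntl_7 1e-8       compression (dropping) tolerance

  added on top of -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps. With preonly there is no outer Krylov iteration to absorb the compression error, so either keep CNTL(7) small or use the BLR factorization as a preconditioner for, e.g., -ksp_type gmres with a looser tolerance.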
> Mark > > On Sat, Sep 19, 2020 at 2:11 AM Alexey Kozlov > wrote: > Thanks a lot! I'll check them out. > > On Sat, Sep 19, 2020 at 1:41 AM Barry Smith > wrote: > > These are small enough that likely sparse direct solvers are the best use of your time and for general efficiency. > > PETSc supports 3 parallel direct solvers, SuperLU_DIST, MUMPs and Pastix. I recommend configuring PETSc for all three of them and then comparing them for problems of interest to you. > > --download-superlu_dist --download-mumps --download-pastix --download-scalapack (used by MUMPS) --download-metis --download-parmetis --download-ptscotch > > Barry > > >> On Sep 18, 2020, at 11:28 PM, Alexey Kozlov > wrote: >> >> Thanks for the tips! My matrix is complex and unsymmetric. My typical test case has of the order of one million equations. I use a 2nd-order finite-difference scheme with 19-point stencil, so my typical test case uses several GB of RAM. >> >> On Fri, Sep 18, 2020 at 11:52 PM Jed Brown > wrote: >> Unfortunately, those are hard problems in which the "good" methods are technical and hard to make black-box. There are "sweeping" methods that solve on 2D "slabs" with PML boundary conditions, H-matrix based methods, and fancy multigrid methods. Attempting to solve with STRUMPACK is probably the easiest thing to try (--download-strumpack). >> >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html >> >> Is the matrix complex symmetric? >> >> Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a 3D problem like this if you have enough memory. I'm assuming the memory or time is unacceptable and you want an iterative method with much lower setup costs. >> >> Alexey Kozlov > writes: >> >> > Dear all, >> > >> > I am solving a convected wave equation in a frequency domain. This equation >> > is a 3D Helmholtz equation with added first-order derivatives and mixed >> > derivatives, and with complex coefficients. The discretized PDE results in >> > a sparse linear system (about 10^6 equations) which is solved in PETSc. I >> > am having difficulty with the code convergence at high frequency, skewed >> > grid, and high Mach number. I suspect it may be due to the preconditioner I >> > use. I am currently using the ILU preconditioner with the number of fill >> > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of the art >> > has evolved and there are better preconditioners for Helmholtz-like >> > problems. Could you, please, advise me on a better preconditioner? >> > >> > Thanks, >> > Alexey >> > >> > -- >> > Alexey V. Kozlov >> > >> > Research Scientist >> > Department of Aerospace and Mechanical Engineering >> > University of Notre Dame >> > >> > 117 Hessert Center >> > Notre Dame, IN 46556-5684 >> > Phone: (574) 631-4335 >> > Fax: (574) 631-8355 >> > Email: akozlov at nd.edu >> >> >> -- >> Alexey V. Kozlov >> >> Research Scientist >> Department of Aerospace and Mechanical Engineering >> University of Notre Dame >> >> 117 Hessert Center >> Notre Dame, IN 46556-5684 >> Phone: (574) 631-4335 >> Fax: (574) 631-8355 >> Email: akozlov at nd.edu > > > > -- > Alexey V. Kozlov > > Research Scientist > Department of Aerospace and Mechanical Engineering > University of Notre Dame > > 117 Hessert Center > Notre Dame, IN 46556-5684 > Phone: (574) 631-4335 > Fax: (574) 631-8355 > Email: akozlov at nd.edu > > > -- > Alexey V. 
Kozlov > > Research Scientist > Department of Aerospace and Mechanical Engineering > University of Notre Dame > > 117 Hessert Center > Notre Dame, IN 46556-5684 > Phone: (574) 631-4335 > Fax: (574) 631-8355 > Email: akozlov at nd.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alexey.V.Kozlov.2 at nd.edu Sat Oct 17 04:20:32 2020 From: Alexey.V.Kozlov.2 at nd.edu (Alexey Kozlov) Date: Sat, 17 Oct 2020 05:20:32 -0400 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: References: <87o8m2tod8.fsf@jedbrown.org> Message-ID: Matt, Thank you for your reply! My system has 8 NUMA nodes, so the memory bandwidth can increase up to 8 times when doing parallel computations. In other words, each node of the big computer cluster works as a small cluster consisting of 8 nodes. Of course, this works only if the contribution of communications between the NUMA nodes is small. The total amount of memory on a single cluster node is 128GB, so it is enough to fit my application. Below is the output of -log_view for three cases: (1) BUILT-IN PETSC LU SOLVER ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./caat on a arch-linux-c-opt named d24cepyc110.crc.nd.edu with 1 processor, by akozlov Sat Oct 17 03:58:23 2020 Using 0 OpenMP threads Using Petsc Release Version 3.13.6, unknown Max Max/Min Avg Total Time (sec): 5.551e+03 1.000 5.551e+03 Objects: 1.000e+01 1.000 1.000e+01 Flop: 1.255e+13 1.000 1.255e+13 1.255e+13 Flop/sec: 2.261e+09 1.000 2.261e+09 2.261e+09 MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 5.5509e+03 100.0% 1.2551e+13 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatSolve 1 1.0 7.3267e-01 1.0 4.58e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6246 MatLUFactorSym 1 1.0 1.0673e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatLUFactorNum 1 1.0 5.5350e+03 1.0 1.25e+13 1.0 0.0e+00 0.0e+00 0.0e+00100100 0 0 0 100100 0 0 0 2267 MatAssemblyBegin 1 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 1.0247e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRowIJ 1 1.0 1.4306e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.2596e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 4 1.0 9.3985e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyBegin 2 1.0 4.7684e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 2 1.0 4.7684e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 1 1.0 1.6689e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 7.3284e-01 1.0 4.58e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6245 PCSetUp 1 1.0 5.5458e+03 1.0 1.25e+13 1.0 0.0e+00 0.0e+00 0.0e+00100100 0 0 0 100100 0 0 0 2262 PCApply 1 1.0 7.3267e-01 1.0 4.58e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6246 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Matrix 2 2 11501999992 0. Vector 2 2 3761520 0. Krylov Solver 1 1 1408 0. Preconditioner 1 1 1184 0. Index Set 3 3 1412088 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 7.15256e-08 #PETSc Option Table entries: -ksp_type preonly -log_view -pc_type lu #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 Configure options: --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --with-g=1 --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi --with-scalar-type=complex --with-clanguage=c --with-openmp --with-debugging=0 COPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" FOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" CXXOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" --download-superlu_dist --download-mumps --download-scalapack --download-metis --download-cmake --download-parmetis --download-ptscotch ----------------------------------------- Libraries compiled on 2020-10-14 10:52:17 on epycfe.crc.nd.edu Machine characteristics: Linux-3.10.0-1160.2.1.el7.x86_64-x86_64-with-redhat-7.9-Maipo Using PETSc directory: /afs/crc.nd.edu/user/a/akozlov/Private/petsc Using PETSc arch: arch-linux-c-opt ----------------------------------------- Using C compiler: mpicc -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp Using Fortran compiler: mpif90 -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp ----------------------------------------- Using include paths: -I/afs/crc.nd.edu/user/a/akozlov/Private/petsc/include -I/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include -I/opt/crc/v/valgrind/3.14/ompi/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/afs/ crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -lpetsc -Wl,-rpath,/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -Wl,-rpath,/opt/crc/i/intel/19.0/mkl -L/opt/crc/i/intel/19.0/mkl -Wl,-rpath,/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -L/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -Wl,-rpath,/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -L/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64 -L/opt/crc/i/intel/19.0/mkl/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib/intel64 -L/opt/crc/i/intel/19.0/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib64 -L/opt/crc/i/intel/19.0/lib64 -Wl,-rpath,/afs/ crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin -L/afs/ crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -L/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lpthread -lptesmumps -lptscotchparmetis -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lX11 -lparmetis -lmetis -lstdc++ -ldl -lmpifort -lmpi -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc 
-lpthread -lgcc_s -lirc_s -lrt -lquadmath -lstdc++ -ldl ----------------------------------------- (2) EXTERNAL PACKAGE MUMPS, 1 MPI PROCESS ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./caat on a arch-linux-c-opt named d24cepyc068.crc.nd.edu with 1 processor, by akozlov Sat Oct 17 01:55:20 2020 Using 0 OpenMP threads Using Petsc Release Version 3.13.6, unknown Max Max/Min Avg Total Time (sec): 1.075e+02 1.000 1.075e+02 Objects: 9.000e+00 1.000 9.000e+00 Flop: 1.959e+12 1.000 1.959e+12 1.959e+12 Flop/sec: 1.823e+10 1.000 1.823e+10 1.823e+10 MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 1.0747e+02 100.0% 1.9594e+12 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatSolve 1 1.0 3.1965e-01 1.0 1.96e+12 1.0 0.0e+00 0.0e+00 0.0e+00 0100 0 0 0 0100 0 0 0 6126201 MatLUFactorSym 1 1.0 2.3141e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 MatLUFactorNum 1 1.0 1.0001e+02 1.0 1.16e+09 1.0 0.0e+00 0.0e+00 0.0e+00 93 0 0 0 0 93 0 0 0 0 12 MatAssemblyBegin 1 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 1.0067e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRowIJ 1 1.0 1.8650e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.3029e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecCopy 1 1.0 1.0943e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 4 1.0 9.2626e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyBegin 2 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 2 1.0 4.7684e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 1 1.0 1.6689e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 3.1981e-01 1.0 1.96e+12 1.0 0.0e+00 0.0e+00 0.0e+00 0100 0 0 0 0100 0 0 0 6123146 PCSetUp 1 1.0 1.0251e+02 1.0 1.16e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 0 0 0 0 95 0 0 0 0 11 PCApply 1 1.0 3.1965e-01 1.0 1.96e+12 1.0 0.0e+00 0.0e+00 0.0e+00 0100 0 0 0 0100 0 0 0 6126096 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Matrix 2 2 59441612 0. Vector 2 2 3761520 0. Krylov Solver 1 1 1408 0. Preconditioner 1 1 1184 0. Index Set 2 2 941392 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 4.76837e-08 #PETSc Option Table entries: -ksp_type preonly -log_view -pc_factor_mat_solver_type mumps -pc_type lu #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 Configure options: --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --with-g=1 --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi --with-scalar-type=complex --with-clanguage=c --with-openmp --with-debugging=0 COPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" FOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" CXXOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" --download-superlu_dist --download-mumps --download-scalapack --download-metis --download-cmake --download-parmetis --download-ptscotch ----------------------------------------- Libraries compiled on 2020-10-14 10:52:17 on epycfe.crc.nd.edu Machine characteristics: Linux-3.10.0-1160.2.1.el7.x86_64-x86_64-with-redhat-7.9-Maipo Using PETSc directory: /afs/crc.nd.edu/user/a/akozlov/Private/petsc Using PETSc arch: arch-linux-c-opt ----------------------------------------- Using C compiler: mpicc -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp Using Fortran compiler: mpif90 -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp ----------------------------------------- Using include paths: -I/afs/crc.nd.edu/user/a/akozlov/Private/petsc/include -I/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include -I/opt/crc/v/valgrind/3.14/ompi/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/afs/ crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -lpetsc -Wl,-rpath,/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -Wl,-rpath,/opt/crc/i/intel/19.0/mkl -L/opt/crc/i/intel/19.0/mkl -Wl,-rpath,/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -L/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -Wl,-rpath,/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -L/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64 -L/opt/crc/i/intel/19.0/mkl/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib/intel64 -L/opt/crc/i/intel/19.0/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib64 -L/opt/crc/i/intel/19.0/lib64 -Wl,-rpath,/afs/ crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin -L/afs/ crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -L/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lpthread -lptesmumps -lptscotchparmetis -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lX11 -lparmetis -lmetis -lstdc++ -ldl -lmpifort -lmpi -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lifport 
-lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lrt -lquadmath -lstdc++ -ldl ----------------------------------------- (3) EXTERNAL PACKAGE MUMPS , 48 MPI PROCESSES ON A SINGLE CLUSTER NODE WITH 8 NUMA NODES ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./caat on a arch-linux-c-opt named d24cepyc069.crc.nd.edu with 48 processors, by akozlov Sat Oct 17 04:40:25 2020 Using 0 OpenMP threads Using Petsc Release Version 3.13.6, unknown Max Max/Min Avg Total Time (sec): 1.415e+01 1.000 1.415e+01 Objects: 3.000e+01 1.000 3.000e+01 Flop: 4.855e+10 1.637 4.084e+10 1.960e+12 Flop/sec: 3.431e+09 1.637 2.886e+09 1.385e+11 MPI Messages: 1.180e+02 2.682 8.169e+01 3.921e+03 MPI Message Lengths: 1.559e+05 5.589 1.238e+03 4.855e+06 MPI Reductions: 4.000e+01 1.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 1.4150e+01 100.0% 1.9602e+12 100.0% 3.921e+03 100.0% 1.238e+03 100.0% 3.100e+01 77.5% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSided 5 1.0 1.0707e-02 3.3 0.00e+00 0.0 7.8e+02 4.0e+00 5.0e+00 0 0 20 0 12 0 0 20 0 16 0 BuildTwoSidedF 3 1.0 8.6837e-03 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 8 0 0 0 0 10 0 MatSolve 1 1.0 6.6314e-02 1.0 4.85e+10 1.6 3.5e+03 1.2e+03 6.0e+00 0100 90 87 15 0100 90 87 19 29529617 MatLUFactorSym 1 1.0 2.4322e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 17 0 0 0 10 17 0 0 0 13 0 MatLUFactorNum 1 1.0 5.8816e+00 1.0 5.08e+07 1.8 0.0e+00 0.0e+00 0.0e+00 42 0 0 0 0 42 0 0 0 0 332 MatAssemblyBegin 1 1.0 7.3917e-0357.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 2 0 0 0 0 3 0 MatAssemblyEnd 1 1.0 2.5823e-02 1.0 0.00e+00 0.0 3.8e+02 1.6e+03 5.0e+00 0 0 10 13 12 0 0 10 13 16 0 MatGetRowIJ 1 1.0 3.5763e-06 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 9.2506e-05 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 4 1.0 5.3000e-0460.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyBegin 2 1.0 2.2390e-0319.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 5 0 0 0 0 6 0 VecAssemblyEnd 2 1.0 9.7752e-06 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 2 1.0 1.6036e-0312.8 0.00e+00 0.0 5.9e+02 4.8e+03 1.0e+00 0 0 15 58 2 0 0 15 58 3 0 VecScatterEnd 2 1.0 2.0087e-0338.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetGraph 2 1.0 1.5259e-05 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 3 1.0 3.3023e-03 2.9 0.00e+00 0.0 1.6e+03 7.0e+02 2.0e+00 0 0 40 23 5 0 0 40 23 6 0 SFBcastOpBegin 2 1.0 1.5953e-0313.7 0.00e+00 0.0 5.9e+02 4.8e+03 1.0e+00 0 0 15 58 2 0 0 15 58 3 0 SFBcastOpEnd 2 1.0 2.0008e-0345.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFPack 2 1.0 1.4646e-03361.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 2 1.0 4.1723e-0529.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 1 1.0 3.0994e-06 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 6.6350e-02 1.0 4.85e+10 1.6 3.5e+03 1.2e+03 6.0e+00 0100 90 87 15 0100 90 87 19 29513594 PCSetUp 1 1.0 8.4679e+00 1.0 5.08e+07 1.8 0.0e+00 0.0e+00 1.0e+01 60 0 0 0 25 60 0 0 0 32 230 PCApply 1 1.0 6.6319e-02 1.0 4.85e+10 1.6 3.5e+03 1.2e+03 6.0e+00 0100 90 87 15 0100 90 87 19 29527282 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Matrix 4 4 1224428 0. Vec Scatter 3 3 2400 0. Vector 8 8 1923424 0. Index Set 9 9 32392 0. Star Forest Graph 3 3 3376 0. Krylov Solver 1 1 1408 0. Preconditioner 1 1 1160 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 7.15256e-08 Average time for MPI_Barrier(): 3.48091e-06 Average time for zero size MPI_Send(): 2.49843e-06 #PETSc Option Table entries: -ksp_type preonly -log_view -pc_factor_mat_solver_type mumps -pc_type lu #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4 Configure options: --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --with-g=1 --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi --with-scalar-type=complex --with-clanguage=c --with-openmp --with-debugging=0 COPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" FOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" CXXOPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" --download-superlu_dist --download-mumps --download-scalapack --download-metis --download-cmake --download-parmetis --download-ptscotch ----------------------------------------- Libraries compiled on 2020-10-14 10:52:17 on epycfe.crc.nd.edu Machine characteristics: Linux-3.10.0-1160.2.1.el7.x86_64-x86_64-with-redhat-7.9-Maipo Using PETSc directory: /afs/crc.nd.edu/user/a/akozlov/Private/petsc Using PETSc arch: arch-linux-c-opt ----------------------------------------- Using C compiler: mpicc -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp Using Fortran compiler: mpif90 -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2 -fopenmp ----------------------------------------- Using include paths: -I/afs/crc.nd.edu/user/a/akozlov/Private/petsc/include -I/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include -I/opt/crc/v/valgrind/3.14/ompi/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/afs/ crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -lpetsc -Wl,-rpath,/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -Wl,-rpath,/opt/crc/i/intel/19.0/mkl -L/opt/crc/i/intel/19.0/mkl -Wl,-rpath,/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -L/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib -Wl,-rpath,/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -L/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64 -L/opt/crc/i/intel/19.0/mkl/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib/intel64 -L/opt/crc/i/intel/19.0/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib64 -L/opt/crc/i/intel/19.0/lib64 -Wl,-rpath,/afs/ crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin -L/afs/ crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -L/opt/crc/i/intel/19.0/mkl/lib/intel64_lin -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lpthread -lptesmumps -lptscotchparmetis -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lX11 -lparmetis -lmetis -lstdc++ 
-ldl -lmpifort -lmpi -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lrt -lquadmath -lstdc++ -ldl ----------------------------------------- On Sat, Oct 17, 2020 at 12:33 AM Matthew Knepley wrote: > On Fri, Oct 16, 2020 at 11:48 PM Alexey Kozlov > wrote: > >> Thank you for your advice! My sparse matrix seems to be very stiff so I >> have decided to concentrate on the direct solvers. I have very good results >> with MUMPS. Due to a lack of time I haven?t got a good result with >> SuperLU_DIST and haven?t compiled PETSc with Pastix yet but I have a >> feeling that MUMPS is the best. I have run a sequential test case with >> built-in PETSc LU (-pc_type lu -ksp_type preonly) and MUMPs (-pc_type lu >> -ksp_type preonly -pc_factor_mat_solver_type mumps) with default settings >> and found that MUMPs was about 50 times faster than the built-in LU and >> used about 3 times less RAM. Do you have any idea why it could be? >> > The numbers do not sound realistic, but of course we do not have your > particular problem. In particular, the memory figure seems impossible. > >> My test case has about 100,000 complex equations with about 3,000,000 >> non-zeros. PETSc was compiled with the following options: ./configure >> --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --enable-g >> --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi >> --with-scalar-type=complex --with-clanguage=c --with-openmp >> --with-debugging=0 COPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 >> -no-prec-div -fp-model fast=2' FOPTFLAGS='-mkl=parallel -O2 -mavx >> -axCORE-AVX2 -no-prec-div -fp-model fast=2' CXXOPTFLAGS='-mkl=parallel -O2 >> -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' --download-superlu_dist >> --download-mumps --download-scalapack --download-metis --download-cmake >> --download-parmetis --download-ptscotch. >> >> Running MUPMS in parallel using MPI also gave me a significant gain in >> performance (about 10 times on a single cluster node). >> > Again, this does not appear to make sense. The performance should be > limited by memory bandwidth, and a single cluster node will not usually have > 10x the bandwidth of a CPU, although it might be possible with a very old > CPU. > > It would help to understand the performance if you would send the output > of -log_view. > > Thanks, > > Matt > >> Could you, please, advise me whether I can adjust some options for the >> direct solvers to improve performance? Should I try MUMPS in OpenMP mode? >> >> On Sat, Sep 19, 2020 at 7:40 AM Mark Adams wrote: >> >>> As Jed said high frequency is hard. AMG, as-is, can be adapted ( >>> https://link.springer.com/article/10.1007/s00466-006-0047-8) with >>> parameters. >>> AMG for convection: use richardson/sor and not chebyshev smoothers and >>> in smoothed aggregation (gamg) don't smooth (-pc_gamg_agg_nsmooths 0). >>> Mark >>> >>> On Sat, Sep 19, 2020 at 2:11 AM Alexey Kozlov >>> wrote: >>> >>>> Thanks a lot! I'll check them out. >>>> >>>> On Sat, Sep 19, 2020 at 1:41 AM Barry Smith wrote: >>>> >>>>> >>>>> These are small enough that likely sparse direct solvers are the >>>>> best use of your time and for general efficiency. >>>>> >>>>> PETSc supports 3 parallel direct solvers, SuperLU_DIST, MUMPs and >>>>> Pastix. I recommend configuring PETSc for all three of them and then >>>>> comparing them for problems of interest to you. 
>>>>> >>>>> --download-superlu_dist --download-mumps --download-pastix >>>>> --download-scalapack (used by MUMPS) --download-metis --download-parmetis >>>>> --download-ptscotch >>>>> >>>>> Barry >>>>> >>>>> >>>>> On Sep 18, 2020, at 11:28 PM, Alexey Kozlov >>>>> wrote: >>>>> >>>>> Thanks for the tips! My matrix is complex and unsymmetric. My typical >>>>> test case has of the order of one million equations. I use a 2nd-order >>>>> finite-difference scheme with 19-point stencil, so my typical test case >>>>> uses several GB of RAM. >>>>> >>>>> On Fri, Sep 18, 2020 at 11:52 PM Jed Brown wrote: >>>>> >>>>>> Unfortunately, those are hard problems in which the "good" methods >>>>>> are technical and hard to make black-box. There are "sweeping" methods >>>>>> that solve on 2D "slabs" with PML boundary conditions, H-matrix based >>>>>> methods, and fancy multigrid methods. Attempting to solve with STRUMPACK >>>>>> is probably the easiest thing to try (--download-strumpack). >>>>>> >>>>>> >>>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html >>>>>> >>>>>> Is the matrix complex symmetric? >>>>>> >>>>>> Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a >>>>>> 3D problem like this if you have enough memory. I'm assuming the memory or >>>>>> time is unacceptable and you want an iterative method with much lower setup >>>>>> costs. >>>>>> >>>>>> Alexey Kozlov writes: >>>>>> >>>>>> > Dear all, >>>>>> > >>>>>> > I am solving a convected wave equation in a frequency domain. This >>>>>> equation >>>>>> > is a 3D Helmholtz equation with added first-order derivatives and >>>>>> mixed >>>>>> > derivatives, and with complex coefficients. The discretized PDE >>>>>> results in >>>>>> > a sparse linear system (about 10^6 equations) which is solved in >>>>>> PETSc. I >>>>>> > am having difficulty with the code convergence at high frequency, >>>>>> skewed >>>>>> > grid, and high Mach number. I suspect it may be due to the >>>>>> preconditioner I >>>>>> > use. I am currently using the ILU preconditioner with the number of >>>>>> fill >>>>>> > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of >>>>>> the art >>>>>> > has evolved and there are better preconditioners for Helmholtz-like >>>>>> > problems. Could you, please, advise me on a better preconditioner? >>>>>> > >>>>>> > Thanks, >>>>>> > Alexey >>>>>> > >>>>>> > -- >>>>>> > Alexey V. Kozlov >>>>>> > >>>>>> > Research Scientist >>>>>> > Department of Aerospace and Mechanical Engineering >>>>>> > University of Notre Dame >>>>>> > >>>>>> > 117 Hessert Center >>>>>> > Notre Dame, IN 46556-5684 >>>>>> > Phone: (574) 631-4335 >>>>>> > Fax: (574) 631-8355 >>>>>> > Email: akozlov at nd.edu >>>>>> >>>>> >>>>> >>>>> -- >>>>> Alexey V. Kozlov >>>>> >>>>> Research Scientist >>>>> Department of Aerospace and Mechanical Engineering >>>>> University of Notre Dame >>>>> >>>>> 117 Hessert Center >>>>> Notre Dame, IN 46556-5684 >>>>> Phone: (574) 631-4335 >>>>> Fax: (574) 631-8355 >>>>> Email: akozlov at nd.edu >>>>> >>>>> >>>>> >>>> >>>> -- >>>> Alexey V. Kozlov >>>> >>>> Research Scientist >>>> Department of Aerospace and Mechanical Engineering >>>> University of Notre Dame >>>> >>>> 117 Hessert Center >>>> Notre Dame, IN 46556-5684 >>>> Phone: (574) 631-4335 >>>> Fax: (574) 631-8355 >>>> Email: akozlov at nd.edu >>>> >>> >> >> -- >> Alexey V. 
Kozlov >> >> Research Scientist >> Department of Aerospace and Mechanical Engineering >> University of Notre Dame >> >> 117 Hessert Center >> Notre Dame, IN 46556-5684 >> Phone: (574) 631-4335 >> Fax: (574) 631-8355 >> Email: akozlov at nd.edu >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Alexey V. Kozlov Research Scientist Department of Aerospace and Mechanical Engineering University of Notre Dame 117 Hessert Center Notre Dame, IN 46556-5684 Phone: (574) 631-4335 Fax: (574) 631-8355 Email: akozlov at nd.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Oct 17 07:41:51 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 17 Oct 2020 08:41:51 -0400 Subject: [petsc-users] Preconditioner for Helmholtz-like problem In-Reply-To: References: <87o8m2tod8.fsf@jedbrown.org> Message-ID: On Sat, Oct 17, 2020 at 5:21 AM Alexey Kozlov wrote: > Matt, > > Thank you for your reply! > My system has 8 NUMA nodes, so the memory bandwidth can increase up to 8 > times when doing parallel computations. In other words, each node of the > big computer cluster works as a small cluster consisting of 8 nodes. Of > course, this works only if the contribution of communications between the > NUMA nodes is small. The total amount of memory on a single cluster node is > 128GB, so it is enough to fit my application. > Barry is right, of course. We can see that the PETSc LU, using the natural ordering, is doing 10,000x the flops compared to MUMPS. Using the same ordering, MUMPS might still benefit from blocking, but the gap would be much much smaller. I misunderstood your description of the parallelism. Yes, using 8 nodes you could see 8x from one node. I think Pierre is correct that something related to the size is happening since the numeric factorization in the parallel case for MUMPS is running at 30x the flop rate of the serial case. It's possible that they are using a different ordering in parallel that does more flops, but is more amenable to vectorization. It is hard to know without reporting all the MUMPS options.
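One low-effort way to get that information (a sketch; these are the options as wired up in PETSc's MUMPS interface): rerun the two MUMPS cases with

   -ksp_view -mat_mumps_icntl_4 2

-ksp_view makes PETSc print the MUMPS run parameters for the factored matrix -- every ICNTL/CNTL value actually used, plus the INFOG/RINFOG statistics MUMPS reports (INFOG(7), per the manual, should be the ordering that was actually used, and the RINFOG entries include MUMPS's own flop estimates) -- and ICNTL(4)=2 raises MUMPS's own diagnostic printing so the analysis-phase statistics show up in the output.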
Thanks, Matt > Below is the output of -log_view for three cases: > (1) BUILT-IN PETSC LU SOLVER > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./caat on a arch-linux-c-opt named d24cepyc110.crc.nd.edu with 1 > processor, by akozlov Sat Oct 17 03:58:23 2020 > Using 0 OpenMP threads > Using Petsc Release Version 3.13.6, unknown > > Max Max/Min Avg Total > Time (sec): 5.551e+03 1.000 5.551e+03 > Objects: 1.000e+01 1.000 1.000e+01 > Flop: 1.255e+13 1.000 1.255e+13 1.255e+13 > Flop/sec: 2.261e+09 1.000 2.261e+09 2.261e+09 > MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flop > and VecAXPY() for complex vectors of length N > --> 8N flop > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count > %Total Avg %Total Count %Total > 0: Main Stage: 5.5509e+03 100.0% 1.2551e+13 100.0% 0.000e+00 > 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > AvgLen: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flop in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over > all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatSolve 1 1.0 7.3267e-01 1.0 4.58e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 6246 > MatLUFactorSym 1 1.0 1.0673e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatLUFactorNum 1 1.0 5.5350e+03 1.0 1.25e+13 1.0 0.0e+00 0.0e+00 > 0.0e+00100100 0 0 0 100100 0 0 0 2267 > MatAssemblyBegin 1 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 1.0247e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetRowIJ 1 1.0 1.4306e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.2596e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 4 1.0 9.3985e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyBegin 2 1.0 4.7684e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 4.7684e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetUp 1 1.0 1.6689e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 7.3284e-01 1.0 4.58e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 6245 > PCSetUp 1 1.0 5.5458e+03 1.0 1.25e+13 1.0 0.0e+00 0.0e+00 > 0.0e+00100100 0 0 0 100100 0 0 0 2262 > PCApply 1 1.0 7.3267e-01 1.0 4.58e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 6246 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Matrix 2 2 11501999992 0. > Vector 2 2 3761520 0. > Krylov Solver 1 1 1408 0. > Preconditioner 1 1 1184 0. > Index Set 3 3 1412088 0. > Viewer 1 0 0 0. 
> > ======================================================================================================================== > Average time to get PetscTime(): 7.15256e-08 > #PETSc Option Table entries: > -ksp_type preonly > -log_view > -pc_type lu > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 16 sizeof(PetscInt) 4 > Configure options: --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl > --with-g=1 --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi > --with-scalar-type=complex --with-clanguage=c --with-openmp > --with-debugging=0 COPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2" FOPTFLAGS="-mkl=parallel -O2 -mavx > -axCORE-AVX2 -no-prec-div -fp-model fast=2" CXXOPTFLAGS="-mkl=parallel -O2 > -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" --download-superlu_dist > --download-mumps --download-scalapack --download-metis --download-cmake > --download-parmetis --download-ptscotch > ----------------------------------------- > Libraries compiled on 2020-10-14 10:52:17 on epycfe.crc.nd.edu > Machine characteristics: > Linux-3.10.0-1160.2.1.el7.x86_64-x86_64-with-redhat-7.9-Maipo > Using PETSc directory: /afs/crc.nd.edu/user/a/akozlov/Private/petsc > Using PETSc arch: arch-linux-c-opt > ----------------------------------------- > > Using C compiler: mpicc -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2 -fopenmp > Using Fortran compiler: mpif90 -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2 -fopenmp > ----------------------------------------- > > Using include paths: -I/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/include -I/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include > -I/opt/crc/v/valgrind/3.14/ompi/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: -Wl,-rpath,/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -lpetsc > -Wl,-rpath,/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl -L/opt/crc/i/intel/19.0/mkl > -Wl,-rpath,/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib > -L/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib > -Wl,-rpath,/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 > -L/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64 > -L/opt/crc/i/intel/19.0/mkl/lib/intel64 > -Wl,-rpath,/opt/crc/i/intel/19.0/lib/intel64 > -L/opt/crc/i/intel/19.0/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib64 > -L/opt/crc/i/intel/19.0/lib64 -Wl,-rpath,/afs/ > crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin > -L/afs/ > crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64_lin > -L/opt/crc/i/intel/19.0/mkl/lib/intel64_lin > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lcmumps -ldmumps -lsmumps > -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lmkl_intel_lp64 > -lmkl_core -lmkl_intel_thread -lpthread -lptesmumps -lptscotchparmetis > -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lX11 -lparmetis > 
-lmetis -lstdc++ -ldl -lmpifort -lmpi -lmkl_intel_lp64 -lmkl_intel_thread > -lmkl_core -liomp5 -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc > -lpthread -lgcc_s -lirc_s -lrt -lquadmath -lstdc++ -ldl > ----------------------------------------- > > > (2) EXTERNAL PACKAGE MUMPS, 1 MPI PROCESS > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./caat on a arch-linux-c-opt named d24cepyc068.crc.nd.edu with 1 > processor, by akozlov Sat Oct 17 01:55:20 2020 > Using 0 OpenMP threads > Using Petsc Release Version 3.13.6, unknown > > Max Max/Min Avg Total > Time (sec): 1.075e+02 1.000 1.075e+02 > Objects: 9.000e+00 1.000 9.000e+00 > Flop: 1.959e+12 1.000 1.959e+12 1.959e+12 > Flop/sec: 1.823e+10 1.000 1.823e+10 1.823e+10 > MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flop > and VecAXPY() for complex vectors of length N > --> 8N flop > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count > %Total Avg %Total Count %Total > 0: Main Stage: 1.0747e+02 100.0% 1.9594e+12 100.0% 0.000e+00 > 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > AvgLen: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flop in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over > all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatSolve 1 1.0 3.1965e-01 1.0 1.96e+12 1.0 0.0e+00 0.0e+00 > 0.0e+00 0100 0 0 0 0100 0 0 0 6126201 > MatLUFactorSym 1 1.0 2.3141e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > MatLUFactorNum 1 1.0 1.0001e+02 1.0 1.16e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 93 0 0 0 0 93 0 0 0 0 12 > MatAssemblyBegin 1 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 1.0067e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetRowIJ 1 1.0 1.8650e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.3029e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecCopy 1 1.0 1.0943e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 4 1.0 9.2626e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyBegin 2 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 4.7684e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetUp 1 1.0 1.6689e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 3.1981e-01 1.0 1.96e+12 1.0 0.0e+00 0.0e+00 > 0.0e+00 0100 0 0 0 0100 0 0 0 6123146 > PCSetUp 1 1.0 1.0251e+02 1.0 1.16e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 95 0 0 0 0 95 0 0 0 0 11 > PCApply 1 1.0 3.1965e-01 1.0 1.96e+12 1.0 0.0e+00 0.0e+00 > 0.0e+00 0100 0 0 0 0100 0 0 0 6126096 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Matrix 2 2 59441612 0. > Vector 2 2 3761520 0. > Krylov Solver 1 1 1408 0. > Preconditioner 1 1 1184 0. > Index Set 2 2 941392 0. > Viewer 1 0 0 0. 
> > ======================================================================================================================== > Average time to get PetscTime(): 4.76837e-08 > #PETSc Option Table entries: > -ksp_type preonly > -log_view > -pc_factor_mat_solver_type mumps > -pc_type lu > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 16 sizeof(PetscInt) 4 > Configure options: --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl > --with-g=1 --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi > --with-scalar-type=complex --with-clanguage=c --with-openmp > --with-debugging=0 COPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2" FOPTFLAGS="-mkl=parallel -O2 -mavx > -axCORE-AVX2 -no-prec-div -fp-model fast=2" CXXOPTFLAGS="-mkl=parallel -O2 > -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" --download-superlu_dist > --download-mumps --download-scalapack --download-metis --download-cmake > --download-parmetis --download-ptscotch > ----------------------------------------- > Libraries compiled on 2020-10-14 10:52:17 on epycfe.crc.nd.edu > Machine characteristics: > Linux-3.10.0-1160.2.1.el7.x86_64-x86_64-with-redhat-7.9-Maipo > Using PETSc directory: /afs/crc.nd.edu/user/a/akozlov/Private/petsc > Using PETSc arch: arch-linux-c-opt > ----------------------------------------- > > Using C compiler: mpicc -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2 -fopenmp > Using Fortran compiler: mpif90 -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2 -fopenmp > ----------------------------------------- > > Using include paths: -I/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/include -I/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include > -I/opt/crc/v/valgrind/3.14/ompi/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: -Wl,-rpath,/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -lpetsc > -Wl,-rpath,/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl -L/opt/crc/i/intel/19.0/mkl > -Wl,-rpath,/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib > -L/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib > -Wl,-rpath,/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 > -L/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64 > -L/opt/crc/i/intel/19.0/mkl/lib/intel64 > -Wl,-rpath,/opt/crc/i/intel/19.0/lib/intel64 > -L/opt/crc/i/intel/19.0/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib64 > -L/opt/crc/i/intel/19.0/lib64 -Wl,-rpath,/afs/ > crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin > -L/afs/ > crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64_lin > -L/opt/crc/i/intel/19.0/mkl/lib/intel64_lin > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lcmumps -ldmumps -lsmumps > -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lmkl_intel_lp64 > -lmkl_core -lmkl_intel_thread -lpthread -lptesmumps -lptscotchparmetis > -lptscotch -lptscotcherr -lesmumps -lscotch 
-lscotcherr -lX11 -lparmetis > -lmetis -lstdc++ -ldl -lmpifort -lmpi -lmkl_intel_lp64 -lmkl_intel_thread > -lmkl_core -liomp5 -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc > -lpthread -lgcc_s -lirc_s -lrt -lquadmath -lstdc++ -ldl > ----------------------------------------- > > > (3) EXTERNAL PACKAGE MUMPS , 48 MPI PROCESSES ON A SINGLE CLUSTER NODE > WITH 8 NUMA NODES > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./caat on a arch-linux-c-opt named d24cepyc069.crc.nd.edu with 48 > processors, by akozlov Sat Oct 17 04:40:25 2020 > Using 0 OpenMP threads > Using Petsc Release Version 3.13.6, unknown > > Max Max/Min Avg Total > Time (sec): 1.415e+01 1.000 1.415e+01 > Objects: 3.000e+01 1.000 3.000e+01 > Flop: 4.855e+10 1.637 4.084e+10 1.960e+12 > Flop/sec: 3.431e+09 1.637 2.886e+09 1.385e+11 > MPI Messages: 1.180e+02 2.682 8.169e+01 3.921e+03 > MPI Message Lengths: 1.559e+05 5.589 1.238e+03 4.855e+06 > MPI Reductions: 4.000e+01 1.000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flop > and VecAXPY() for complex vectors of length N > --> 8N flop > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count > %Total Avg %Total Count %Total > 0: Main Stage: 1.4150e+01 100.0% 1.9602e+12 100.0% 3.921e+03 > 100.0% 1.238e+03 100.0% 3.100e+01 77.5% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > AvgLen: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flop in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over > all processors) > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > BuildTwoSided 5 1.0 1.0707e-02 3.3 0.00e+00 0.0 7.8e+02 4.0e+00 > 5.0e+00 0 0 20 0 12 0 0 20 0 16 0 > BuildTwoSidedF 3 1.0 8.6837e-03 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 3.0e+00 0 0 0 0 8 0 0 0 0 10 0 > MatSolve 1 1.0 6.6314e-02 1.0 4.85e+10 1.6 3.5e+03 1.2e+03 > 6.0e+00 0100 90 87 15 0100 90 87 19 29529617 > MatLUFactorSym 1 1.0 2.4322e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 4.0e+00 17 0 0 0 10 17 0 0 0 13 0 > MatLUFactorNum 1 1.0 5.8816e+00 1.0 5.08e+07 1.8 0.0e+00 0.0e+00 > 0.0e+00 42 0 0 0 0 42 0 0 0 0 332 > MatAssemblyBegin 1 1.0 7.3917e-0357.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+00 0 0 0 0 2 0 0 0 0 3 0 > MatAssemblyEnd 1 1.0 2.5823e-02 1.0 0.00e+00 0.0 3.8e+02 1.6e+03 > 5.0e+00 0 0 10 13 12 0 0 10 13 16 0 > MatGetRowIJ 1 1.0 3.5763e-06 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 9.2506e-05 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 4 1.0 5.3000e-0460.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyBegin 2 1.0 2.2390e-0319.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 5 0 0 0 0 6 0 > VecAssemblyEnd 2 1.0 9.7752e-06 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 2 1.0 1.6036e-0312.8 0.00e+00 0.0 5.9e+02 4.8e+03 > 1.0e+00 0 0 15 58 2 0 0 15 58 3 0 > VecScatterEnd 2 1.0 2.0087e-0338.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > SFSetGraph 2 1.0 1.5259e-05 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > SFSetUp 3 1.0 3.3023e-03 2.9 0.00e+00 0.0 1.6e+03 7.0e+02 > 2.0e+00 0 0 40 23 5 0 0 40 23 6 0 > SFBcastOpBegin 2 1.0 1.5953e-0313.7 0.00e+00 0.0 5.9e+02 4.8e+03 > 1.0e+00 0 0 15 58 2 0 0 15 58 3 0 > SFBcastOpEnd 2 1.0 2.0008e-0345.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > SFPack 2 1.0 1.4646e-03361.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > SFUnpack 2 1.0 4.1723e-0529.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSetUp 1 1.0 3.0994e-06 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 6.6350e-02 1.0 4.85e+10 1.6 3.5e+03 1.2e+03 > 6.0e+00 0100 90 87 15 0100 90 87 19 29513594 > PCSetUp 1 1.0 8.4679e+00 1.0 5.08e+07 1.8 0.0e+00 0.0e+00 > 1.0e+01 60 0 0 0 25 60 0 0 0 32 230 > PCApply 1 1.0 6.6319e-02 1.0 4.85e+10 1.6 3.5e+03 1.2e+03 > 6.0e+00 0100 90 87 15 0100 90 87 19 29527282 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Matrix 4 4 1224428 0. > Vec Scatter 3 3 2400 0. > Vector 8 8 1923424 0. > Index Set 9 9 32392 0. > Star Forest Graph 3 3 3376 0. 
> Krylov Solver 1 1 1408 0. > Preconditioner 1 1 1160 0. > Viewer 1 0 0 0. > > ======================================================================================================================== > Average time to get PetscTime(): 7.15256e-08 > Average time for MPI_Barrier(): 3.48091e-06 > Average time for zero size MPI_Send(): 2.49843e-06 > #PETSc Option Table entries: > -ksp_type preonly > -log_view > -pc_factor_mat_solver_type mumps > -pc_type lu > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 16 sizeof(PetscInt) 4 > Configure options: --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl > --with-g=1 --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi > --with-scalar-type=complex --with-clanguage=c --with-openmp > --with-debugging=0 COPTFLAGS="-mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2" FOPTFLAGS="-mkl=parallel -O2 -mavx > -axCORE-AVX2 -no-prec-div -fp-model fast=2" CXXOPTFLAGS="-mkl=parallel -O2 > -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2" --download-superlu_dist > --download-mumps --download-scalapack --download-metis --download-cmake > --download-parmetis --download-ptscotch > ----------------------------------------- > Libraries compiled on 2020-10-14 10:52:17 on epycfe.crc.nd.edu > Machine characteristics: > Linux-3.10.0-1160.2.1.el7.x86_64-x86_64-with-redhat-7.9-Maipo > Using PETSc directory: /afs/crc.nd.edu/user/a/akozlov/Private/petsc > Using PETSc arch: arch-linux-c-opt > ----------------------------------------- > > Using C compiler: mpicc -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2 -fopenmp > Using Fortran compiler: mpif90 -fPIC -mkl=parallel -O2 -mavx -axCORE-AVX2 > -no-prec-div -fp-model fast=2 -fopenmp > ----------------------------------------- > > Using include paths: -I/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/include -I/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/include > -I/opt/crc/v/valgrind/3.14/ompi/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: -Wl,-rpath,/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -lpetsc > -Wl,-rpath,/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib -L/afs/ > crc.nd.edu/user/a/akozlov/Private/petsc/arch-linux-c-opt/lib > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl -L/opt/crc/i/intel/19.0/mkl > -Wl,-rpath,/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib > -L/opt/crc/m/mvapich2/2.3.1/intel/19.0/lib > -Wl,-rpath,/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 > -L/opt/crc/i/intel/19.0/tbb/lib/intel64_lin/gcc4.7 > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64 > -L/opt/crc/i/intel/19.0/mkl/lib/intel64 > -Wl,-rpath,/opt/crc/i/intel/19.0/lib/intel64 > -L/opt/crc/i/intel/19.0/lib/intel64 -Wl,-rpath,/opt/crc/i/intel/19.0/lib64 > -L/opt/crc/i/intel/19.0/lib64 -Wl,-rpath,/afs/ > crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin > -L/afs/ > crc.nd.edu/x86_64_linux/i/intel/19.0/compilers_and_libraries_2019.2.187/linux/compiler/lib/intel64_lin > -Wl,-rpath,/opt/crc/i/intel/19.0/mkl/lib/intel64_lin > -L/opt/crc/i/intel/19.0/mkl/lib/intel64_lin > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 > -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lcmumps -ldmumps -lsmumps > -lzmumps 
-lmumps_common -lpord -lscalapack -lsuperlu_dist -lmkl_intel_lp64 > -lmkl_core -lmkl_intel_thread -lpthread -lptesmumps -lptscotchparmetis > -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lX11 -lparmetis > -lmetis -lstdc++ -ldl -lmpifort -lmpi -lmkl_intel_lp64 -lmkl_intel_thread > -lmkl_core -liomp5 -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc > -lpthread -lgcc_s -lirc_s -lrt -lquadmath -lstdc++ -ldl > ----------------------------------------- > > > > On Sat, Oct 17, 2020 at 12:33 AM Matthew Knepley > wrote: > >> On Fri, Oct 16, 2020 at 11:48 PM Alexey Kozlov >> wrote: >> >>> Thank you for your advice! My sparse matrix seems to be very stiff so I >>> have decided to concentrate on the direct solvers. I have very good results >>> with MUMPS. Due to a lack of time I haven?t got a good result with >>> SuperLU_DIST and haven?t compiled PETSc with Pastix yet but I have a >>> feeling that MUMPS is the best. I have run a sequential test case with >>> built-in PETSc LU (-pc_type lu -ksp_type preonly) and MUMPs (-pc_type lu >>> -ksp_type preonly -pc_factor_mat_solver_type mumps) with default settings >>> and found that MUMPs was about 50 times faster than the built-in LU and >>> used about 3 times less RAM. Do you have any idea why it could be? >>> >> The numbers do not sound realistic, but of course we do not have your >> particular problem. In particular, the memory figure seems impossible. >> >>> My test case has about 100,000 complex equations with about 3,000,000 >>> non-zeros. PETSc was compiled with the following options: ./configure >>> --with-blaslapack-dir=/opt/crc/i/intel/19.0/mkl --enable-g >>> --with-valgrind-dir=/opt/crc/v/valgrind/3.14/ompi >>> --with-scalar-type=complex --with-clanguage=c --with-openmp >>> --with-debugging=0 COPTFLAGS='-mkl=parallel -O2 -mavx -axCORE-AVX2 >>> -no-prec-div -fp-model fast=2' FOPTFLAGS='-mkl=parallel -O2 -mavx >>> -axCORE-AVX2 -no-prec-div -fp-model fast=2' CXXOPTFLAGS='-mkl=parallel -O2 >>> -mavx -axCORE-AVX2 -no-prec-div -fp-model fast=2' --download-superlu_dist >>> --download-mumps --download-scalapack --download-metis --download-cmake >>> --download-parmetis --download-ptscotch. >>> >>> Running MUPMS in parallel using MPI also gave me a significant gain in >>> performance (about 10 times on a single cluster node). >>> >> Again, this does not appear to make sense. The performance should be >> limited by memory bandwidth, and a single cluster node will not usually have >> 10x the bandwidth of a CPU, although it might be possible with a very old >> CPU. >> >> It would help to understand the performance if you would send the output >> of -log_view. >> >> Thanks, >> >> Matt >> >>> Could you, please, advise me whether I can adjust some options for the >>> direct solvers to improve performance? Should I try MUMPS in OpenMP mode? >>> >>> On Sat, Sep 19, 2020 at 7:40 AM Mark Adams wrote: >>> >>>> As Jed said high frequency is hard. AMG, as-is, can be adapted ( >>>> https://link.springer.com/article/10.1007/s00466-006-0047-8) with >>>> parameters. >>>> AMG for convection: use richardson/sor and not chebyshev smoothers and >>>> in smoothed aggregation (gamg) don't smooth (-pc_gamg_agg_nsmooths 0). >>>> Mark >>>> >>>> On Sat, Sep 19, 2020 at 2:11 AM Alexey Kozlov >>>> wrote: >>>> >>>>> Thanks a lot! I'll check them out. >>>>> >>>>> On Sat, Sep 19, 2020 at 1:41 AM Barry Smith wrote: >>>>> >>>>>> >>>>>> These are small enough that likely sparse direct solvers are the >>>>>> best use of your time and for general efficiency. 
>>>>>> >>>>>> PETSc supports 3 parallel direct solvers, SuperLU_DIST, MUMPs and >>>>>> Pastix. I recommend configuring PETSc for all three of them and then >>>>>> comparing them for problems of interest to you. >>>>>> >>>>>> --download-superlu_dist --download-mumps --download-pastix >>>>>> --download-scalapack (used by MUMPS) --download-metis --download-parmetis >>>>>> --download-ptscotch >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> On Sep 18, 2020, at 11:28 PM, Alexey Kozlov >>>>>> wrote: >>>>>> >>>>>> Thanks for the tips! My matrix is complex and unsymmetric. My typical >>>>>> test case has of the order of one million equations. I use a 2nd-order >>>>>> finite-difference scheme with 19-point stencil, so my typical test case >>>>>> uses several GB of RAM. >>>>>> >>>>>> On Fri, Sep 18, 2020 at 11:52 PM Jed Brown wrote: >>>>>> >>>>>>> Unfortunately, those are hard problems in which the "good" methods >>>>>>> are technical and hard to make black-box. There are "sweeping" methods >>>>>>> that solve on 2D "slabs" with PML boundary conditions, H-matrix based >>>>>>> methods, and fancy multigrid methods. Attempting to solve with STRUMPACK >>>>>>> is probably the easiest thing to try (--download-strumpack). >>>>>>> >>>>>>> >>>>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATSOLVERSSTRUMPACK.html >>>>>>> >>>>>>> Is the matrix complex symmetric? >>>>>>> >>>>>>> Note that you can use a direct solver (MUMPS, STRUMPACK, etc.) for a >>>>>>> 3D problem like this if you have enough memory. I'm assuming the memory or >>>>>>> time is unacceptable and you want an iterative method with much lower setup >>>>>>> costs. >>>>>>> >>>>>>> Alexey Kozlov writes: >>>>>>> >>>>>>> > Dear all, >>>>>>> > >>>>>>> > I am solving a convected wave equation in a frequency domain. This >>>>>>> equation >>>>>>> > is a 3D Helmholtz equation with added first-order derivatives and >>>>>>> mixed >>>>>>> > derivatives, and with complex coefficients. The discretized PDE >>>>>>> results in >>>>>>> > a sparse linear system (about 10^6 equations) which is solved in >>>>>>> PETSc. I >>>>>>> > am having difficulty with the code convergence at high frequency, >>>>>>> skewed >>>>>>> > grid, and high Mach number. I suspect it may be due to the >>>>>>> preconditioner I >>>>>>> > use. I am currently using the ILU preconditioner with the number >>>>>>> of fill >>>>>>> > levels 2 or 3, and BCGS or GMRES solvers. I suspect the state of >>>>>>> the art >>>>>>> > has evolved and there are better preconditioners for Helmholtz-like >>>>>>> > problems. Could you, please, advise me on a better preconditioner? >>>>>>> > >>>>>>> > Thanks, >>>>>>> > Alexey >>>>>>> > >>>>>>> > -- >>>>>>> > Alexey V. Kozlov >>>>>>> > >>>>>>> > Research Scientist >>>>>>> > Department of Aerospace and Mechanical Engineering >>>>>>> > University of Notre Dame >>>>>>> > >>>>>>> > 117 Hessert Center >>>>>>> > Notre Dame, IN 46556-5684 >>>>>>> > Phone: (574) 631-4335 >>>>>>> > Fax: (574) 631-8355 >>>>>>> > Email: akozlov at nd.edu >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Alexey V. Kozlov >>>>>> >>>>>> Research Scientist >>>>>> Department of Aerospace and Mechanical Engineering >>>>>> University of Notre Dame >>>>>> >>>>>> 117 Hessert Center >>>>>> Notre Dame, IN 46556-5684 >>>>>> Phone: (574) 631-4335 >>>>>> Fax: (574) 631-8355 >>>>>> Email: akozlov at nd.edu >>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> Alexey V. 
Kozlov >>>>> >>>>> Research Scientist >>>>> Department of Aerospace and Mechanical Engineering >>>>> University of Notre Dame >>>>> >>>>> 117 Hessert Center >>>>> Notre Dame, IN 46556-5684 >>>>> Phone: (574) 631-4335 >>>>> Fax: (574) 631-8355 >>>>> Email: akozlov at nd.edu >>>>> >>>> >>> >>> -- >>> Alexey V. Kozlov >>> >>> Research Scientist >>> Department of Aerospace and Mechanical Engineering >>> University of Notre Dame >>> >>> 117 Hessert Center >>> Notre Dame, IN 46556-5684 >>> Phone: (574) 631-4335 >>> Fax: (574) 631-8355 >>> Email: akozlov at nd.edu >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > > -- > Alexey V. Kozlov > > Research Scientist > Department of Aerospace and Mechanical Engineering > University of Notre Dame > > 117 Hessert Center > Notre Dame, IN 46556-5684 > Phone: (574) 631-4335 > Fax: (574) 631-8355 > Email: akozlov at nd.edu > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From namala2 at illinois.edu Sun Oct 18 14:55:34 2020 From: namala2 at illinois.edu (Namala, Solomon) Date: Sun, 18 Oct 2020 19:55:34 +0000 Subject: [petsc-users] Guidelines for Nested fieldsplit for domains with a hybrid mesh Message-ID: Hello, I am working to solve Stokes problem on a domain that is discretized using two different types of mesh. A part of the mesh uses fem formulation and the rest uses nodal integral method (NIM) formulation (the details of which I will skip). However, the key takeaway is that NIM formulation of stokes uses pressure Poisson formulation instead of the continuity equation while FEM formulation uses the continuity equation. They are coupled at the interface. Right now, I am building a single matrix for the entire domain and solving it using fieldsplit option in a nested fashion. The matrix structure and the unknown vector are shown below. My questions are: * Are there any basic guidelines to solve these kind of problems. * As I have mentioned I am currently using nested fieldsplit. The first split is using indices and the other split is done using detect saddle point option. is there a way to avoid using that option and doing it by combining set of indices or fields. The matrix structure is [Au_fem Bp_fem 0 0] [Cu_fem. 0 0 0] [0 0 Du_nim Ep_nim] [0 0 0 Fp_nim] the unknown vector is given by [ufem pfem unim pnim] Let me know if any additional information is needed. Thanks, Solomon. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Sun Oct 18 16:32:57 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 18 Oct 2020 16:32:57 -0500 Subject: [petsc-users] Guidelines for Nested fieldsplit for domains with a hybrid mesh In-Reply-To: References: Message-ID: <8555AB7C-7589-4299-B476-418757D68A12@petsc.dev> From src/ksp/pc/impls/fieldsplit/fieldsplit.c this is how the saddle point is detected and set into the PC if (jac->detect) { IS zerodiags,rest; PetscInt nmin,nmax; ierr = MatGetOwnershipRange(pc->mat,&nmin,&nmax);CHKERRQ(ierr); if (jac->diag_use_amat) { ierr = MatFindZeroDiagonals(pc->mat,&zerodiags);CHKERRQ(ierr); } else { ierr = MatFindZeroDiagonals(pc->pmat,&zerodiags);CHKERRQ(ierr); } ierr = ISComplement(zerodiags,nmin,nmax,&rest);CHKERRQ(ierr); ierr = PCFieldSplitSetIS(pc,"0",rest);CHKERRQ(ierr); ierr = PCFieldSplitSetIS(pc,"1",zerodiags);CHKERRQ(ierr); ierr = ISDestroy(&zerodiags);CHKERRQ(ierr); ierr = ISDestroy(&rest);CHKERRQ(ierr); In addition these two options are set PetscErrorCode PCFieldSplitSetDetectSaddlePoint(PC pc,PetscBool flg) { PC_FieldSplit *jac = (PC_FieldSplit*)pc->data; PetscErrorCode ierr; PetscFunctionBegin; jac->detect = flg; if (jac->detect) { ierr = PCFieldSplitSetType(pc,PC_COMPOSITE_SCHUR);CHKERRQ(ierr); ierr = PCFieldSplitSetSchurPre(pc,PC_FIELDSPLIT_SCHUR_PRE_SELF,NULL);CHKERRQ(ierr); } PetscFunctionReturn(0); } You can use these routines to directly manage the IS yourself in any manner you choose. Good luck Barry > On Oct 18, 2020, at 2:55 PM, Namala, Solomon wrote: > > Hello, > > I am working to solve Stokes problem on a domain that is discretized using two different types of mesh. A part of the mesh uses fem formulation and the rest uses nodal integral method (NIM) formulation (the details of which I will skip). However, the key takeaway is that NIM formulation of stokes uses pressure Poisson formulation instead of the continuity equation while FEM formulation uses the continuity equation. They are coupled at the interface. Right now, I am building a single matrix for the entire domain and solving it using fieldsplit option in a nested fashion. The matrix structure and the unknown vector are shown below. > > My questions are: > > Are there any basic guidelines to solve these kind of problems. > As I have mentioned I am currently using nested fieldsplit. The first split is using indices and the other split is done using detect saddle point option. is there a way to avoid using that option and doing it by combining set of indices or fields. > The matrix structure is > [Au_fem Bp_fem 0 0] > [Cu_fem. 0 0 0] > [0 0 Du_nim Ep_nim] > [0 0 0 Fp_nim] > > the unknown vector is given by > > [ufem pfem unim pnim] > > Let me know if any additional information is needed. > > Thanks, > Solomon. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hecbarcab at gmail.com Tue Oct 20 03:40:09 2020 From: hecbarcab at gmail.com (Héctor Barreiro Cabrera) Date: Tue, 20 Oct 2020 10:40:09 +0200 Subject: [petsc-users] Eisenstat-Walker method with GPU assembled matrices In-Reply-To: References: Message-ID: El jue., 15 oct. 2020 a las 23:32, Barry Smith () escribió: > > We still have the assumption the AIJ matrix always has a copy on the > GPU. How did you fill up the matrix on the GPU while not having its copy > on the CPU? > > My strategy here was to initialize the structure on the CPU with dummy values to have the corresponding device arrays allocated.
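In case it is useful to see it spelled out, here is a minimal sketch of that CPU-side bootstrap. The size, the tridiagonal pattern and the placeholder value 1.0 are purely illustrative (my real pattern comes from the problem topology); it is only meant to show the call sequence, not my actual code:

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscInt       i, n = 8;                 /* illustrative size */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  ierr = MatCreate(PETSC_COMM_SELF, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, n, n, n, n);CHKERRQ(ierr);
  ierr = MatSetType(A, MATSEQAIJCUSPARSE);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(A, 3, NULL);CHKERRQ(ierr);

  /* Insert placeholder values at every location of the (constant) nonzero
     pattern so that the structure, and hence the device arrays, get created */
  for (i = 0; i < n; i++) {
    PetscInt    cols[3], ncols = 0;
    PetscScalar vals[3] = {1.0, 1.0, 1.0};
    if (i > 0)     cols[ncols++] = i - 1;
    cols[ncols++] = i;
    if (i < n - 1) cols[ncols++] = i + 1;
    ierr = MatSetValues(A, 1, &i, ncols, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* From here on the values are rewritten directly on the device, inside a
     CUDA kernel, through MatCUSPARSEGetDeviceMatWrite() */

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}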
Ideally I would have initialized the structure on a kernel as well, since my intention is to keep all data on the GPU (and not hit host memory other than for debugging). But since the topology of my problem remains constant over time, this approach proved to be sufficient. I did not find any problem with my use case so far. One thing I couldn't figure out, though, is how to force PETSc to transfer the data back to host. MatView always displays the dummy values I used for initialization. Is there a function to do this? Thanks for the replies, by the way! I'm quite surprised how responsive the PETSc community is! :) Cheers, H?ctor > Barry > > When we remove this assumption we have to add a bunch more code for CPU > only things to make sure they properly get the data from the GPU. > > > On Oct 15, 2020, at 4:16 AM, H?ctor Barreiro Cabrera > wrote: > > Hello fellow PETSc users, > > Following up my previous email > , > I managed to feed the entry data to a SeqAICUSPARSE matrix through a CUDA > kernel using the new MatCUSPARSEGetDeviceMatWrite function (thanks Barry > Smith and Mark Adams!). However, I am now facing problems when trying to > use this matrix within a SNES solver with the Eisenstat-Walker method > enabled. > > According to PETSc's error log, the preconditioner is failing to invert > the matrix diagonal. Specifically it says that: > [0]PETSC ERROR: Arguments are incompatible > [0]PETSC ERROR: Zero diagonal on row 0 > [0]PETSC ERROR: Configure options PETSC_ARCH=win64_vs2019_release > --with-cc="win32fe cl" --with-cxx="win32fe cl" --with-clanguage=C++ > --with-fc=0 --with-mpi=0 --with-cuda=1 --with-cudac="win32fe nvcc" > --with-cuda-dir=~/cuda --download-f2cblaslapack=1 --with-precision=single > --with-64-bit-indices=0 --with-single-library=1 --with-endian=little > --with-debugging=0 --with-x=0 --with-windows-graphics=0 > --with-shared-libraries=1 --CUDAOPTFLAGS=-O2 > > The stack trace leads to the diagonal inversion routine: > [0]PETSC ERROR: #1 MatInvertDiagonal_SeqAIJ() line 1913 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c > [0]PETSC ERROR: #2 MatSOR_SeqAIJ() line 1944 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c > [0]PETSC ERROR: #3 MatSOR() line 4005 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\INTERF~1\matrix.c > [0]PETSC ERROR: #4 PCPreSolve_Eisenstat() line 79 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\impls\eisens\eisen.c > [0]PETSC ERROR: #5 PCPreSolve() line 1549 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\INTERF~1\precon.c > [0]PETSC ERROR: #6 KSPSolve_Private() line 686 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c > [0]PETSC ERROR: #7 KSPSolve() line 889 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c > [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\impls\ls\ls.c > [0]PETSC ERROR: #9 SNESSolve() line 4567 in > C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\INTERF~1\snes.c > > I am 100% positive that the diagonal does not contain a zero entry, so my > suspicions are either that this operation is not supported on the GPU at > all (MatInvertDiagonal_SeqAIJ seems to access host-side memory) or that I > am missing some setting to make this work on the GPU. Is this correct? > > Thanks! > > Cheers, > H?ctor > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Oct 20 06:36:52 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 20 Oct 2020 07:36:52 -0400 Subject: [petsc-users] Eisenstat-Walker method with GPU assembled matrices In-Reply-To: References: Message-ID: On Tue, Oct 20, 2020 at 4:40 AM H?ctor Barreiro Cabrera wrote: > El jue., 15 oct. 2020 a las 23:32, Barry Smith () > escribi?: > >> >> We still have the assumption the AIJ matrix always has a copy on the >> GPU. How did you fill up the matrix on the GPU while not having its copy >> on the CPU? >> >> My strategy here was to initialize the structure on the CPU with dummy > values to have the corresponding device arrays allocated. Ideally I would > have initialized the structure on a kernel as well, since my intention is > to keep all data on the GPU (and not hit host memory other than for > debugging). But since the topology of my problem remains constant over > time, this approach proved to be sufficient. I did not find any problem > with my use case so far. > > One thing I couldn't figure out, though, is how to force PETSc to transfer > the data back to host. MatView always displays the dummy values I used for > initialization. Is there a function to do this? > Hmm, this should happen automatically, so we have missed something. How do you change the values on the device? Thanks, Matt > Thanks for the replies, by the way! I'm quite surprised how responsive the > PETSc community is! :) > > Cheers, > H?ctor > > >> Barry >> >> When we remove this assumption we have to add a bunch more code for CPU >> only things to make sure they properly get the data from the GPU. >> >> >> On Oct 15, 2020, at 4:16 AM, H?ctor Barreiro Cabrera >> wrote: >> >> Hello fellow PETSc users, >> >> Following up my previous email >> , >> I managed to feed the entry data to a SeqAICUSPARSE matrix through a CUDA >> kernel using the new MatCUSPARSEGetDeviceMatWrite function (thanks Barry >> Smith and Mark Adams!). However, I am now facing problems when trying to >> use this matrix within a SNES solver with the Eisenstat-Walker method >> enabled. >> >> According to PETSc's error log, the preconditioner is failing to invert >> the matrix diagonal. 
Specifically it says that: >> [0]PETSC ERROR: Arguments are incompatible >> [0]PETSC ERROR: Zero diagonal on row 0 >> [0]PETSC ERROR: Configure options PETSC_ARCH=win64_vs2019_release >> --with-cc="win32fe cl" --with-cxx="win32fe cl" --with-clanguage=C++ >> --with-fc=0 --with-mpi=0 --with-cuda=1 --with-cudac="win32fe nvcc" >> --with-cuda-dir=~/cuda --download-f2cblaslapack=1 --with-precision=single >> --with-64-bit-indices=0 --with-single-library=1 --with-endian=little >> --with-debugging=0 --with-x=0 --with-windows-graphics=0 >> --with-shared-libraries=1 --CUDAOPTFLAGS=-O2 >> >> The stack trace leads to the diagonal inversion routine: >> [0]PETSC ERROR: #1 MatInvertDiagonal_SeqAIJ() line 1913 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c >> [0]PETSC ERROR: #2 MatSOR_SeqAIJ() line 1944 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c >> [0]PETSC ERROR: #3 MatSOR() line 4005 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\INTERF~1\matrix.c >> [0]PETSC ERROR: #4 PCPreSolve_Eisenstat() line 79 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\impls\eisens\eisen.c >> [0]PETSC ERROR: #5 PCPreSolve() line 1549 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\INTERF~1\precon.c >> [0]PETSC ERROR: #6 KSPSolve_Private() line 686 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c >> [0]PETSC ERROR: #7 KSPSolve() line 889 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c >> [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\impls\ls\ls.c >> [0]PETSC ERROR: #9 SNESSolve() line 4567 in >> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\INTERF~1\snes.c >> >> I am 100% positive that the diagonal does not contain a zero entry, so my >> suspicions are either that this operation is not supported on the GPU at >> all (MatInvertDiagonal_SeqAIJ seems to access host-side memory) or that I >> am missing some setting to make this work on the GPU. Is this correct? >> >> Thanks! >> >> Cheers, >> H?ctor >> >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Tue Oct 20 06:52:10 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 20 Oct 2020 14:52:10 +0300 Subject: [petsc-users] Eisenstat-Walker method with GPU assembled matrices In-Reply-To: References: Message-ID: We currently do not have a transfer to host setup for cusparse. I have a preliminary version here https://gitlab.com/petsc/petsc/-/tree/stefanozampini/feature-mataij-create-fromcoo Should be ready in a couple of days for review. Il giorno mar 20 ott 2020 alle ore 14:37 Matthew Knepley ha scritto: > On Tue, Oct 20, 2020 at 4:40 AM H?ctor Barreiro Cabrera < > hecbarcab at gmail.com> wrote: > >> El jue., 15 oct. 2020 a las 23:32, Barry Smith () >> escribi?: >> >>> >>> We still have the assumption the AIJ matrix always has a copy on the >>> GPU. How did you fill up the matrix on the GPU while not having its copy >>> on the CPU? >>> >>> My strategy here was to initialize the structure on the CPU with dummy >> values to have the corresponding device arrays allocated. 
Ideally I would >> have initialized the structure on a kernel as well, since my intention is >> to keep all data on the GPU (and not hit host memory other than for >> debugging). But since the topology of my problem remains constant over >> time, this approach proved to be sufficient. I did not find any problem >> with my use case so far. >> >> One thing I couldn't figure out, though, is how to force PETSc to >> transfer the data back to host. MatView always displays the dummy values I >> used for initialization. Is there a function to do this? >> > > Hmm, this should happen automatically, so we have missed something. How do > you change the values on the device? > > Thanks, > > Matt > > >> Thanks for the replies, by the way! I'm quite surprised how responsive >> the PETSc community is! :) >> >> Cheers, >> H?ctor >> >> >>> Barry >>> >>> When we remove this assumption we have to add a bunch more code for >>> CPU only things to make sure they properly get the data from the GPU. >>> >>> >>> On Oct 15, 2020, at 4:16 AM, H?ctor Barreiro Cabrera < >>> hecbarcab at gmail.com> wrote: >>> >>> Hello fellow PETSc users, >>> >>> Following up my previous email >>> , >>> I managed to feed the entry data to a SeqAICUSPARSE matrix through a CUDA >>> kernel using the new MatCUSPARSEGetDeviceMatWrite function (thanks Barry >>> Smith and Mark Adams!). However, I am now facing problems when trying to >>> use this matrix within a SNES solver with the Eisenstat-Walker method >>> enabled. >>> >>> According to PETSc's error log, the preconditioner is failing to invert >>> the matrix diagonal. Specifically it says that: >>> [0]PETSC ERROR: Arguments are incompatible >>> [0]PETSC ERROR: Zero diagonal on row 0 >>> [0]PETSC ERROR: Configure options PETSC_ARCH=win64_vs2019_release >>> --with-cc="win32fe cl" --with-cxx="win32fe cl" --with-clanguage=C++ >>> --with-fc=0 --with-mpi=0 --with-cuda=1 --with-cudac="win32fe nvcc" >>> --with-cuda-dir=~/cuda --download-f2cblaslapack=1 --with-precision=single >>> --with-64-bit-indices=0 --with-single-library=1 --with-endian=little >>> --with-debugging=0 --with-x=0 --with-windows-graphics=0 >>> --with-shared-libraries=1 --CUDAOPTFLAGS=-O2 >>> >>> The stack trace leads to the diagonal inversion routine: >>> [0]PETSC ERROR: #1 MatInvertDiagonal_SeqAIJ() line 1913 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c >>> [0]PETSC ERROR: #2 MatSOR_SeqAIJ() line 1944 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\impls\aij\seq\aij.c >>> [0]PETSC ERROR: #3 MatSOR() line 4005 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\mat\INTERF~1\matrix.c >>> [0]PETSC ERROR: #4 PCPreSolve_Eisenstat() line 79 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\impls\eisens\eisen.c >>> [0]PETSC ERROR: #5 PCPreSolve() line 1549 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\pc\INTERF~1\precon.c >>> [0]PETSC ERROR: #6 KSPSolve_Private() line 686 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c >>> [0]PETSC ERROR: #7 KSPSolve() line 889 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\ksp\ksp\INTERF~1\itfunc.c >>> [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\impls\ls\ls.c >>> [0]PETSC ERROR: #9 SNESSolve() line 4567 in >>> C:\cygwin64\home\HBARRE~1\PETSC-~1\src\snes\INTERF~1\snes.c >>> >>> I am 100% positive that the diagonal does not contain a zero entry, so >>> my suspicions are either that this operation is not supported on the GPU at >>> all (MatInvertDiagonal_SeqAIJ seems to access host-side memory) or 
that I >>> am missing some setting to make this work on the GPU. Is this correct? >>> >>> Thanks! >>> >>> Cheers, >>> H?ctor >>> >>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.huysegoms at fz-juelich.de Thu Oct 22 03:24:34 2020 From: m.huysegoms at fz-juelich.de (Marcel Huysegoms) Date: Thu, 22 Oct 2020 10:24:34 +0200 Subject: [petsc-users] MatOrdering for rectangular matrix Message-ID: <3c649ace-248f-38b9-bde9-4f0fa10bf71e@fz-juelich.de> Hi all, I'm currently implementing a Gauss-Newton approach for minimizing a non-linear cost function using PETSc4py. The (rectangular) linear systems I am trying to solve have dimensions of about (5N, N), where N is in the range of several hundred millions. Due to its size and because it's an over-determined system, I use LSQR in conjunction with a preconditioner (which operates on A^T x A, e.g. BJacobi). Depending on the ordering of the unknowns the algorithm only converges for special cases. When I use a direct LR solver (as preconditioner) it consistently converges, but consumes too much memory. I have read in the manual that the LR solver internally also applies a matrix reordering beforehand. My question would be: How can I improve the ordering of the unknowns for a rectangular matrix (in order to converge also with iterative preconditioners)? If I use MatGetOrdering(), it only works for square matrices. Is there a way to achieve this from within PETSc4py? ParMETIS seems to be a promising framework for that task. Is it possible to apply its reordering algorithm to a rectangular PETSc-matrix? I would be thankful for every bit of advice that might help. Best regards, Marcel ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Volker Rieke Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ From knepley at gmail.com Thu Oct 22 04:55:15 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 22 Oct 2020 05:55:15 -0400 Subject: [petsc-users] MatOrdering for rectangular matrix In-Reply-To: <3c649ace-248f-38b9-bde9-4f0fa10bf71e@fz-juelich.de> References: <3c649ace-248f-38b9-bde9-4f0fa10bf71e@fz-juelich.de> Message-ID: On Thu, Oct 22, 2020 at 4:24 AM Marcel Huysegoms wrote: > Hi all, > > I'm currently implementing a Gauss-Newton approach for minimizing a > non-linear cost function using PETSc4py. > The (rectangular) linear systems I am trying to solve have dimensions of > about (5N, N), where N is in the range of several hundred millions. > > Due to its size and because it's an over-determined system, I use LSQR > in conjunction with a preconditioner (which operates on A^T x A, e.g. > BJacobi). 
> Depending on the ordering of the unknowns the algorithm only converges > for special cases. When I use a direct LR solver (as preconditioner) it > consistently converges, but consumes too much memory. I have read in the > manual that the LR solver internally also applies a matrix reordering > beforehand. > > My question would be: > How can I improve the ordering of the unknowns for a rectangular matrix > (in order to converge also with iterative preconditioners)? If I use > MatGetOrdering(), it only works for square matrices. Is there a way to > achieve this from within PETSc4py? > ParMETIS seems to be a promising framework for that task. Is it possible > to apply its reordering algorithm to a rectangular PETSc-matrix? > > I would be thankful for every bit of advice that might help. > We do not have any rectangular reordering algorithms. I think your first step is to find something in the literature that you think will work. Thanks, Matt > Best regards, > Marcel > > > > ------------------------------------------------------------------------------------------------ > > ------------------------------------------------------------------------------------------------ > Forschungszentrum Juelich GmbH > 52425 Juelich > Sitz der Gesellschaft: Juelich > Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 > Vorsitzender des Aufsichtsrats: MinDir Volker Rieke > Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), > Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt > > ------------------------------------------------------------------------------------------------ > > ------------------------------------------------------------------------------------------------ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Oct 22 09:34:22 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 22 Oct 2020 09:34:22 -0500 Subject: [petsc-users] MatOrdering for rectangular matrix In-Reply-To: References: <3c649ace-248f-38b9-bde9-4f0fa10bf71e@fz-juelich.de> Message-ID: <9B822030-7E72-4C6A-9669-6AA82AFB0B95@petsc.dev> Marcel, Would you like to do the following? Compute Q A P where Q is a row permutation, P a column permutation and then apply LSQR on QAP? From the manual page: In exact arithmetic the LSQR method (with no preconditioning) is identical to the KSPCG algorithm applied to the normal equations. [Q A P]' [Q A P] = P' A' A P = P'(A'A) P the Q drops out because permutation matrices' transposes are their inverse Note that P is a small square matrix. So my conclusion is that any column permutation of A is also a symmetric permutation of A'A so you can just try using regular reorderings of A'A if you want to "concentrate" the "important" parts of A'A into your "block diagonal" preconditioner (and throw away the other parts) I don't know what it will do to the convergence. I've never had much luck generically trying to symmetrically reorder matrices to improve preconditioners but for certain situation maybe it might help. For example if the matrix is [0 1; 1 0] and you permute it you get the [1 0; 0 1] which looks better. 
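If you want to experiment with that, a rough, untested sketch of the plumbing is below (the helper name is just for illustration). Note that most of the MatGetOrdering() implementations only run on sequential matrices, and I have not checked that MatPermute() covers every parallel or rectangular case, so treat it as a small serial experiment rather than a recipe.

#include <petscmat.h>

/* Order the columns of the rectangular A by a symmetric ordering of A'A,
   keeping the rows in their natural order */
PetscErrorCode PermuteColumnsByNormalEquations(Mat A, Mat *Aperm, IS *colperm)
{
  Mat            AtA;
  IS             rperm, cperm, isrow;
  PetscInt       rstart, rend;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatTransposeMatMult(A, A, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &AtA);CHKERRQ(ierr);
  ierr = MatGetOrdering(AtA, MATORDERINGND, &rperm, &cperm);CHKERRQ(ierr);   /* or RCM, QMD, ... */
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  ierr = ISCreateStride(PetscObjectComm((PetscObject)A), rend - rstart, rstart, 1, &isrow);CHKERRQ(ierr); /* identity on the rows */
  ierr = MatPermute(A, isrow, cperm, Aperm);CHKERRQ(ierr);                   /* columns follow the A'A ordering */
  *colperm = cperm;                                                          /* keep it to undo the ordering later */
  ierr = ISDestroy(&isrow);CHKERRQ(ierr);
  ierr = ISDestroy(&rperm);CHKERRQ(ierr);
  ierr = MatDestroy(&AtA);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

You would then run LSQR on Aperm; since the unknowns are renumbered by colperm, you have to map the solution back to the original ordering at the end.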
There is this https://epubs.siam.org/doi/10.1137/S1064827599361308 but it is for non-symmetric permutations and in your case if you use a non symmetric permeation you can no longer use LSQR. Barry > On Oct 22, 2020, at 4:55 AM, Matthew Knepley wrote: > > On Thu, Oct 22, 2020 at 4:24 AM Marcel Huysegoms > wrote: > Hi all, > > I'm currently implementing a Gauss-Newton approach for minimizing a > non-linear cost function using PETSc4py. > The (rectangular) linear systems I am trying to solve have dimensions of > about (5N, N), where N is in the range of several hundred millions. > > Due to its size and because it's an over-determined system, I use LSQR > in conjunction with a preconditioner (which operates on A^T x A, e.g. > BJacobi). > Depending on the ordering of the unknowns the algorithm only converges > for special cases. When I use a direct LR solver (as preconditioner) it > consistently converges, but consumes too much memory. I have read in the > manual that the LR solver internally also applies a matrix reordering > beforehand. > > My question would be: > How can I improve the ordering of the unknowns for a rectangular matrix > (in order to converge also with iterative preconditioners)? If I use > MatGetOrdering(), it only works for square matrices. Is there a way to > achieve this from within PETSc4py? > ParMETIS seems to be a promising framework for that task. Is it possible > to apply its reordering algorithm to a rectangular PETSc-matrix? > > I would be thankful for every bit of advice that might help. > > We do not have any rectangular reordering algorithms. I think your first step is to > find something in the literature that you think will work. > > Thanks, > > Matt > > Best regards, > Marcel > > > ------------------------------------------------------------------------------------------------ > ------------------------------------------------------------------------------------------------ > Forschungszentrum Juelich GmbH > 52425 Juelich > Sitz der Gesellschaft: Juelich > Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 > Vorsitzender des Aufsichtsrats: MinDir Volker Rieke > Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), > Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt > ------------------------------------------------------------------------------------------------ > ------------------------------------------------------------------------------------------------ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.huysegoms at fz-juelich.de Thu Oct 22 09:38:58 2020 From: m.huysegoms at fz-juelich.de (Marcel Huysegoms) Date: Thu, 22 Oct 2020 16:38:58 +0200 Subject: [petsc-users] MatOrdering for rectangular matrix In-Reply-To: References: <3c649ace-248f-38b9-bde9-4f0fa10bf71e@fz-juelich.de> Message-ID: <7eda6f8e-6dc7-5953-80bc-603c4c661169@fz-juelich.de> Hi Matt, thanks for your response! I haven't studied the recent literature on reordering algorithms, but came across a talk by Tim Davis, the developer of SuiteSparse, from 2013: https://www.youtube.com/watch?v=7ph4ZQ9oEIc&t=2109s At minute 33:40 he shows the impact of different reordering libraries applied to a large least square system. 
In doing so, he demonstrates how he achieves a significant speedup when using the matrix reordering algorithm of METIS/ParMETIS (which is a multilevel nested dissection). So it seems that METIS is able to compute an effective column reordering of rectangular matrices for fill-reducing factorizations. The respective slide of the talk is also available as a screenshot under: https://www.mathworks.com/matlabcentral/answers/uploaded_files/173888/image.png (extracted from a forum post on a similar topic: https://de.mathworks.com/matlabcentral/answers/275622-large-sparse-rectangular-over-determined-equation-system-to-reorder-or-to-not-reorder) Considering that PETSc is offering a wrapper to the partitioning functionalities of ParMETIS, I am wondering, if it might be reasonable in the near future to also provide an option to use the reordering functionality of METIS (METIS_NodeND/ParMETIS_V3_NodeND) from within PETSc? That would be incredible and may be useful to many applications. I've just seen that MatGetOrdering() even provides an option for external libraries (MATORDERINGEXTERNAL). Is it maybe already possible to use the function in conjuction with ParMETIS? Best regards, Marcel Am 22.10.20 um 11:55 schrieb Matthew Knepley: > On Thu, Oct 22, 2020 at 4:24 AM Marcel Huysegoms > > wrote: > > Hi all, > > I'm currently implementing a Gauss-Newton approach for minimizing a > non-linear cost function using PETSc4py. > The (rectangular) linear systems I am trying to solve have > dimensions of > about (5N, N), where N is in the range of several hundred millions. > > Due to its size and because it's an over-determined system, I use LSQR > in conjunction with a preconditioner (which operates on A^T x A, e.g. > BJacobi). > Depending on the ordering of the unknowns the algorithm only converges > for special cases. When I use a direct LR solver (as > preconditioner) it > consistently converges, but consumes too much memory. I have read > in the > manual that the LR solver internally also applies a matrix reordering > beforehand. > > My question would be: > How can I improve the ordering of the unknowns for a rectangular > matrix > (in order to converge also with iterative preconditioners)? If I use > MatGetOrdering(), it only works for square matrices. Is there a way to > achieve this from within PETSc4py? > ParMETIS seems to be a promising framework for that task. Is it > possible > to apply its reordering algorithm to a rectangular PETSc-matrix? > > I would be thankful for every bit of advice that might help. > > > We do not have any rectangular reordering algorithms. I think your > first step is to > find something in the literature that you think will work. > > ? Thanks, > > ? ? ?Matt > > Best regards, > Marcel > > > ------------------------------------------------------------------------------------------------ > ------------------------------------------------------------------------------------------------ > Forschungszentrum Juelich GmbH > 52425 Juelich > Sitz der Gesellschaft: Juelich > Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 > Vorsitzender des Aufsichtsrats: MinDir Volker Rieke > Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), > Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. 
Harald Bolt > ------------------------------------------------------------------------------------------------ > ------------------------------------------------------------------------------------------------ > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Oct 22 11:45:35 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 22 Oct 2020 11:45:35 -0500 Subject: [petsc-users] MatOrdering for rectangular matrix In-Reply-To: <7eda6f8e-6dc7-5953-80bc-603c4c661169@fz-juelich.de> References: <3c649ace-248f-38b9-bde9-4f0fa10bf71e@fz-juelich.de> <7eda6f8e-6dc7-5953-80bc-603c4c661169@fz-juelich.de> Message-ID: Marcel, He also has SuiteSparseQR AMD, so with your interpretation this means AMD can also reorder a rectangular matrix? I think you need to dig into SuiteSparseQR or his papers to find out what he is actually reordering; I suspect it is not actually the rectangular matrix. Barry > On Oct 22, 2020, at 9:38 AM, Marcel Huysegoms wrote: > > Hi Matt, > > thanks for your response! > I haven't studied the recent literature on reordering algorithms, but came across a talk by Tim Davis, the developer of SuiteSparse, from 2013: > > https://www.youtube.com/watch?v=7ph4ZQ9oEIc&t=2109s > > At minute 33:40 he shows the impact of different reordering libraries applied to a large least square system. > In doing so, he demonstrates how he achieves a significant speedup when using the matrix reordering algorithm of METIS/ParMETIS (which is a multilevel nested dissection). So it seems that METIS is able to compute an effective column reordering of rectangular matrices for fill-reducing factorizations. The respective slide of the talk is also available as a screenshot under: > > https://www.mathworks.com/matlabcentral/answers/uploaded_files/173888/image.png > > (extracted from a forum post on a similar topic: https://de.mathworks.com/matlabcentral/answers/275622-large-sparse-rectangular-over-determined-equation-system-to-reorder-or-to-not-reorder ) > > Considering that PETSc is offering a wrapper to the partitioning functionalities of ParMETIS, I am wondering, if it might be reasonable in the near future to also provide an option to use the reordering functionality of METIS (METIS_NodeND/ParMETIS_V3_NodeND) from within PETSc? That would be incredible and may be useful to many applications. I've just seen that MatGetOrdering() even provides an option for external libraries (MATORDERINGEXTERNAL). Is it maybe already possible to use the function in conjuction with ParMETIS? > > Best regards, > Marcel > > > Am 22.10.20 um 11:55 schrieb Matthew Knepley: >> On Thu, Oct 22, 2020 at 4:24 AM Marcel Huysegoms > wrote: >> Hi all, >> >> I'm currently implementing a Gauss-Newton approach for minimizing a >> non-linear cost function using PETSc4py. >> The (rectangular) linear systems I am trying to solve have dimensions of >> about (5N, N), where N is in the range of several hundred millions. >> >> Due to its size and because it's an over-determined system, I use LSQR >> in conjunction with a preconditioner (which operates on A^T x A, e.g. >> BJacobi). >> Depending on the ordering of the unknowns the algorithm only converges >> for special cases. 
When I use a direct LR solver (as preconditioner) it >> consistently converges, but consumes too much memory. I have read in the >> manual that the LR solver internally also applies a matrix reordering >> beforehand. >> >> My question would be: >> How can I improve the ordering of the unknowns for a rectangular matrix >> (in order to converge also with iterative preconditioners)? If I use >> MatGetOrdering(), it only works for square matrices. Is there a way to >> achieve this from within PETSc4py? >> ParMETIS seems to be a promising framework for that task. Is it possible >> to apply its reordering algorithm to a rectangular PETSc-matrix? >> >> I would be thankful for every bit of advice that might help. >> >> We do not have any rectangular reordering algorithms. I think your first step is to >> find something in the literature that you think will work. >> >> Thanks, >> >> Matt >> >> Best regards, >> Marcel >> >> >> ------------------------------------------------------------------------------------------------ >> ------------------------------------------------------------------------------------------------ >> Forschungszentrum Juelich GmbH >> 52425 Juelich >> Sitz der Gesellschaft: Juelich >> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 >> Vorsitzender des Aufsichtsrats: MinDir Volker Rieke >> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), >> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt >> ------------------------------------------------------------------------------------------------ >> ------------------------------------------------------------------------------------------------ >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Thu Oct 22 14:12:51 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Thu, 22 Oct 2020 12:12:51 -0700 Subject: [petsc-users] FPE when trying to find the condition number Message-ID: Hello, I am trying to find the condition number of the A matrix for a linear system I am solving. I have used the following commands. *./a.out -ksp_monitor_singular_value -ksp_type gmres -ksp_gmres_restart 1000 -pc_type none*However, the execution comes to a halt after a few iterations with the following error. [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. 
[0]PETSC ERROR: [0] LAPACKgesvd line 40 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmreig.c [0]PETSC ERROR: [0] KSPComputeExtremeSingularValues_GMRES line 22 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmreig.c [0]PETSC ERROR: [0] KSPComputeExtremeSingularValues line 59 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: [0] KSPMonitorSingularValue line 130 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/iterativ.c [0]PETSC ERROR: [0] KSPMonitor line 1765 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: [0] KSPGMRESCycle line 122 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c [0]PETSC ERROR: [0] KSPSolve_GMRES line 225 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c [0]PETSC ERROR: [0] KSPSolve line 678 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Signal received [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 [0]PETSC ERROR: ./a.out on a linux-gnu-c-debug named cg17-9.agave.rc.asu.edu by pbaikadi Thu Oct 22 12:07:11 2020 [0]PETSC ERROR: Configure options [0]PETSC ERROR: #1 User provided function() line 0 in unknown file -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 59. NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. -------------------------------------------------------------------------- Is the error because the A matrix is singular (causing the max/min to be undefined)? Please let me know. Thank you, Sincerely, Pranay. ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From Antoine.Cote3 at USherbrooke.ca Thu Oct 22 14:23:12 2020 From: Antoine.Cote3 at USherbrooke.ca (=?iso-8859-1?Q?Antoine_C=F4t=E9?=) Date: Thu, 22 Oct 2020 19:23:12 +0000 Subject: [petsc-users] Enhancing MatScale computing time Message-ID: Hi, I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse matrix Mat K. The Mat is modified repeatedly by the program, using the commands (in that order) : MatZeroEntries(K) In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) MatDiagonalScale(K, vec1, vec1) MatDiagonalSet(K, vec2, ADD_VALUES) Computing time seems high and I would like to improve it. Running tests with "-log_view" tells me that MatScale() is the bottle neck (50% of total computing time) . From manual pages, I've tried a few tweaks : * DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees of freedom per node, ... 
BAIJ can significantly enhance performance", Chapter 14.2.4 * Used MatMissingDiagonal() to confirm there is no missing diagonal entries : "If the matrix Y is missing some diagonal entries this routine can be very slow", MatDiagonalSet() manual * Tried MatSetOption() * MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly efficiency * MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" * MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" * MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during matrix assembly" According to "-log_view", assembly is fast (0% of total time), and the use of a DMDA makes me believe preallocation isn't the cause of performance issue. I would like to know how could I improve MatScale(). What are the best practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? Instead of MatDiagonalScale(), should I use another command to obtain the same result faster? Thank you very much! Antoine C?t? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Oct 22 14:23:45 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 22 Oct 2020 15:23:45 -0400 Subject: [petsc-users] FPE when trying to find the condition number In-Reply-To: References: Message-ID: On Thu, Oct 22, 2020 at 3:13 PM baikadi pranay wrote: > Hello, > > I am trying to find the condition number of the A matrix for a linear > system I am solving. I have used the following commands. > > *./a.out -ksp_monitor_singular_value -ksp_type gmres -ksp_gmres_restart > 1000 -pc_type none*However, the execution comes to a halt after a few > iterations with the following error. > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 8 FPE: Floating Point > Exception,probably divide by zero > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. 
> [0]PETSC ERROR: [0] LAPACKgesvd line 40 > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmreig.c > [0]PETSC ERROR: [0] KSPComputeExtremeSingularValues_GMRES line 22 > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmreig.c > [0]PETSC ERROR: [0] KSPComputeExtremeSingularValues line 59 > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: [0] KSPMonitorSingularValue line 130 > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/iterativ.c > [0]PETSC ERROR: [0] KSPMonitor line 1765 > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: [0] KSPGMRESCycle line 122 > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: [0] KSPSolve_GMRES line 225 > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: [0] KSPSolve line 678 > /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Signal received > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 > [0]PETSC ERROR: ./a.out on a linux-gnu-c-debug named > cg17-9.agave.rc.asu.edu by pbaikadi Thu Oct 22 12:07:11 2020 > [0]PETSC ERROR: Configure options > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD > with errorcode 59. > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > Is the error because the A matrix is singular (causing the max/min to be > undefined)? Please let me know. > No. It is more likely that there is an invalid value, like an Inf or NaN. Thanks, Matt > Thank you, > Sincerely, > Pranay. > ? > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Oct 22 14:35:59 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 22 Oct 2020 15:35:59 -0400 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: References: Message-ID: On Thu, Oct 22, 2020 at 3:23 PM Antoine C?t? wrote: > Hi, > > I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse > matrix Mat K. The Mat is modified repeatedly by the program, using the > commands (in that order) : > > MatZeroEntries(K) > In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) > MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) > MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) > MatDiagonalScale(K, vec1, vec1) > MatDiagonalSet(K, vec2, ADD_VALUES) > > Computing time seems high and I would like to improve it. Running tests > with "-log_view" tells me that MatScale() is the bottle neck (50% of total > computing time) . From manual pages, I've tried a few tweaks : > > - DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees > of freedom per node, ... 
BAIJ can significantly enhance performance", > Chapter 14.2.4 > - Used MatMissingDiagonal() to confirm there is no missing diagonal > entries : "If the matrix Y is missing some diagonal entries this routine > can be very slow", MatDiagonalSet() manual > - Tried MatSetOption() > - MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly > efficiency > - MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly > processes have one less global reduction" > - MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, > assembly processes have one less global reduction" > - MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during > matrix assembly" > > According to "-log_view", assembly is fast (0% of total time), and the > use of a DMDA makes me believe preallocation isn't the cause of performance > issue. > > I would like to know how could I improve MatScale(). What are the best > practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? > Instead of MatDiagonalScale(), should I use another command to obtain the > same result faster? > Something is definitely strange. Can you please send the output of -log_view -info :mat Thanks, Matt > Thank you very much! > > Antoine C?t? > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Antoine.Cote3 at USherbrooke.ca Thu Oct 22 15:02:00 2020 From: Antoine.Cote3 at USherbrooke.ca (=?iso-8859-1?Q?Antoine_C=F4t=E9?=) Date: Thu, 22 Oct 2020 20:02:00 +0000 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: References: , Message-ID: Hi, See attached files for both outputs. Tell me if you need any clarification. It was run with a DMDA of 33x17x17 nodes (creating 32x16x16=8192 elements). With 3 dof per nodes, problem has a total of 28611 dof. Note : Stage "Stiff_Adj" is the part of the code modifying Mat K. PetscLogStagePush/Pop was used. Regards, Antoine ________________________________ De : Matthew Knepley Envoy? : 22 octobre 2020 15:35 ? : Antoine C?t? Cc : petsc-users at mcs.anl.gov Objet : Re: [petsc-users] Enhancing MatScale computing time On Thu, Oct 22, 2020 at 3:23 PM Antoine C?t? > wrote: Hi, I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse matrix Mat K. The Mat is modified repeatedly by the program, using the commands (in that order) : MatZeroEntries(K) In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) MatDiagonalScale(K, vec1, vec1) MatDiagonalSet(K, vec2, ADD_VALUES) Computing time seems high and I would like to improve it. Running tests with "-log_view" tells me that MatScale() is the bottle neck (50% of total computing time) . From manual pages, I've tried a few tweaks : * DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees of freedom per node, ... 
BAIJ can significantly enhance performance", Chapter 14.2.4 * Used MatMissingDiagonal() to confirm there is no missing diagonal entries : "If the matrix Y is missing some diagonal entries this routine can be very slow", MatDiagonalSet() manual * Tried MatSetOption() * MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly efficiency * MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" * MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" * MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during matrix assembly" According to "-log_view", assembly is fast (0% of total time), and the use of a DMDA makes me believe preallocation isn't the cause of performance issue. I would like to know how could I improve MatScale(). What are the best practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? Instead of MatDiagonalScale(), should I use another command to obtain the same result faster? Something is definitely strange. Can you please send the output of -log_view -info :mat Thanks, Matt Thank you very much! Antoine C?t? -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: LogView.out Type: application/octet-stream Size: 14200 bytes Desc: LogView.out URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mat.0 Type: application/octet-stream Size: 234549 bytes Desc: mat.0 URL: From bsmith at petsc.dev Thu Oct 22 15:09:33 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 22 Oct 2020 15:09:33 -0500 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: References: Message-ID: <8A3BDD0C-2697-4453-8A71-2A900A958862@petsc.dev> MatMult 9553 1.0 3.2824e+01 1.0 3.54e+10 1.0 0.0e+00 0.0e+00 0.0e+00 23 48 0 0 0 61 91 0 0 0 1079 MatScale 6 1.0 5.3896e-02 1.0 2.52e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 467 Though the flop rate of MatScale is not so high (467) it is taking very little (0 percent of the run time while MatMult takes 23 percent of the time). So the main cost related to the matrices is MatMult because it has a lot of operations 9553, you might think about your algorithms you are using and if there improvements. It looks like you are using some kind of multigrid and solve 6 problems with 1357 total iterations which is 200 iterations per solve. This is absolutely HUGE for multigrain, you need to tune the multigrid for you problem to bring that down to at most a couple dozen iterations per solve. Barry > On Oct 22, 2020, at 3:02 PM, Antoine C?t? wrote: > > Hi, > > See attached files for both outputs. Tell me if you need any clarification. It was run with a DMDA of 33x17x17 nodes (creating 32x16x16=8192 elements). With 3 dof per nodes, problem has a total of 28611 dof. > > Note : Stage "Stiff_Adj" is the part of the code modifying Mat K. PetscLogStagePush/Pop was used. > > Regards, > > Antoine > De : Matthew Knepley > > Envoy? : 22 octobre 2020 15:35 > ? : Antoine C?t? > > Cc : petsc-users at mcs.anl.gov > > Objet : Re: [petsc-users] Enhancing MatScale computing time > > On Thu, Oct 22, 2020 at 3:23 PM Antoine C?t? 
> wrote: > Hi, > > I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse matrix Mat K. The Mat is modified repeatedly by the program, using the commands (in that order) : > > MatZeroEntries(K) > In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) > MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) > MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) > MatDiagonalScale(K, vec1, vec1) > MatDiagonalSet(K, vec2, ADD_VALUES) > > Computing time seems high and I would like to improve it. Running tests with "-log_view" tells me that MatScale() is the bottle neck (50% of total computing time) . From manual pages, I've tried a few tweaks : > DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees of freedom per node, ... BAIJ can significantly enhance performance", Chapter 14.2.4 > Used MatMissingDiagonal() to confirm there is no missing diagonal entries : "If the matrix Y is missing some diagonal entries this routine can be very slow", MatDiagonalSet() manual > Tried MatSetOption() > MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly efficiency > MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" > MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" > MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during matrix assembly" > According to "-log_view", assembly is fast (0% of total time), and the use of a DMDA makes me believe preallocation isn't the cause of performance issue. > > I would like to know how could I improve MatScale(). What are the best practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? Instead of MatDiagonalScale(), should I use another command to obtain the same result faster? > > Something is definitely strange. Can you please send the output of > > -log_view -info :mat > > Thanks, > > Matt > > Thank you very much! > > Antoine C?t? > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Oct 22 15:13:54 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 22 Oct 2020 15:13:54 -0500 Subject: [petsc-users] FPE when trying to find the condition number In-Reply-To: References: Message-ID: <697184D9-7A67-4621-8120-CA78A0A5C18F@petsc.dev> The reference implementation of LAPACK tries a divide by zero in its setup to see if it can divide by zero and that is happening for you. Hence the PETSc code has ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr); #if !defined(PETSC_USE_COMPLEX) PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("N","N",&bn,&bn,R,&bN,realpart,&sdummy,&idummy,&sdummy,&idummy,work,&lwork,&lierr)); #else PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("N","N",&bn,&bn,R,&bN,realpart,&sdummy,&idummy,&sdummy,&idummy,work,&lwork,realpart+N,&lierr)); #endif if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD Lapack routine %d",(int)lierr); ierr = PetscFPTrapPop();CHKERRQ(ierr); which is suppose to turn off the trapping. The code that turns off the trapping is OS dependent, perhaps it does not work for you. There is a bit better code in the current release than 3.11 I recommend you first upgrade. What system are you running on? 
Barry > On Oct 22, 2020, at 2:12 PM, baikadi pranay wrote: > > Hello, > > I am trying to find the condition number of the A matrix for a linear system I am solving. I have used the following commands. > ./a.out -ksp_monitor_singular_value -ksp_type gmres -ksp_gmres_restart 1000 -pc_type none > However, the execution comes to a halt after a few iterations with the following error. > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: [0] LAPACKgesvd line 40 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmreig.c > [0]PETSC ERROR: [0] KSPComputeExtremeSingularValues_GMRES line 22 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmreig.c > [0]PETSC ERROR: [0] KSPComputeExtremeSingularValues line 59 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: [0] KSPMonitorSingularValue line 130 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/iterativ.c > [0]PETSC ERROR: [0] KSPMonitor line 1765 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: [0] KSPGMRESCycle line 122 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: [0] KSPSolve_GMRES line 225 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: [0] KSPSolve line 678 /packages/7x/petsc/3.11.1/petsc-3.11.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Signal received > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 > [0]PETSC ERROR: ./a.out on a linux-gnu-c-debug named cg17-9.agave.rc.asu.edu by pbaikadi Thu Oct 22 12:07:11 2020 > [0]PETSC ERROR: Configure options > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD > with errorcode 59. > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > Is the error because the A matrix is singular (causing the max/min to be undefined)? Please let me know. > > Thank you, > Sincerely, > Pranay. > ? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Antoine.Cote3 at USherbrooke.ca Thu Oct 22 15:17:18 2020 From: Antoine.Cote3 at USherbrooke.ca (=?iso-8859-1?Q?Antoine_C=F4t=E9?=) Date: Thu, 22 Oct 2020 20:17:18 +0000 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: <8A3BDD0C-2697-4453-8A71-2A900A958862@petsc.dev> References: , <8A3BDD0C-2697-4453-8A71-2A900A958862@petsc.dev> Message-ID: Hi Sir, MatScale in "Main Stage" is indeed called 6 times for 0% run time. In stage "Stiff_Adj" though, we get : MatScale 8192 1.0 7.1185e+01 1.0 3.43e+10 1.0 0.0e+00 0.0e+00 0.0e+00 50 46 0 0 0 80 98 0 0 0 482 MatMult is indeed expensive (23% run time) and should be improved, but MatScale in "Stiff_Adj" is still taking 50% run time Thanks, Antoine ________________________________ De : Barry Smith Envoy? : 22 octobre 2020 16:09 ? : Antoine C?t? Cc : petsc-users at mcs.anl.gov Objet : Re: [petsc-users] Enhancing MatScale computing time MatMult 9553 1.0 3.2824e+01 1.0 3.54e+10 1.0 0.0e+00 0.0e+00 0.0e+00 23 48 0 0 0 61 91 0 0 0 1079 MatScale 6 1.0 5.3896e-02 1.0 2.52e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 467 Though the flop rate of MatScale is not so high (467) it is taking very little (0 percent of the run time while MatMult takes 23 percent of the time). So the main cost related to the matrices is MatMult because it has a lot of operations 9553, you might think about your algorithms you are using and if there improvements. It looks like you are using some kind of multigrid and solve 6 problems with 1357 total iterations which is 200 iterations per solve. This is absolutely HUGE for multigrain, you need to tune the multigrid for you problem to bring that down to at most a couple dozen iterations per solve. Barry On Oct 22, 2020, at 3:02 PM, Antoine C?t? > wrote: Hi, See attached files for both outputs. Tell me if you need any clarification. It was run with a DMDA of 33x17x17 nodes (creating 32x16x16=8192 elements). With 3 dof per nodes, problem has a total of 28611 dof. Note : Stage "Stiff_Adj" is the part of the code modifying Mat K. PetscLogStagePush/Pop was used. Regards, Antoine ________________________________ De : Matthew Knepley > Envoy? : 22 octobre 2020 15:35 ? : Antoine C?t? > Cc : petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] Enhancing MatScale computing time On Thu, Oct 22, 2020 at 3:23 PM Antoine C?t? > wrote: Hi, I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse matrix Mat K. The Mat is modified repeatedly by the program, using the commands (in that order) : MatZeroEntries(K) In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) MatDiagonalScale(K, vec1, vec1) MatDiagonalSet(K, vec2, ADD_VALUES) Computing time seems high and I would like to improve it. Running tests with "-log_view" tells me that MatScale() is the bottle neck (50% of total computing time) . From manual pages, I've tried a few tweaks : * DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees of freedom per node, ... 
BAIJ can significantly enhance performance", Chapter 14.2.4 * Used MatMissingDiagonal() to confirm there is no missing diagonal entries : "If the matrix Y is missing some diagonal entries this routine can be very slow", MatDiagonalSet() manual * Tried MatSetOption() * MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly efficiency * MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" * MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" * MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during matrix assembly" According to "-log_view", assembly is fast (0% of total time), and the use of a DMDA makes me believe preallocation isn't the cause of performance issue. I would like to know how could I improve MatScale(). What are the best practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? Instead of MatDiagonalScale(), should I use another command to obtain the same result faster? Something is definitely strange. Can you please send the output of -log_view -info :mat Thanks, Matt Thank you very much! Antoine C?t? -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Oct 22 15:28:28 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 22 Oct 2020 16:28:28 -0400 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: References: <8A3BDD0C-2697-4453-8A71-2A900A958862@petsc.dev> Message-ID: On Thu, Oct 22, 2020 at 4:17 PM Antoine C?t? wrote: > Hi Sir, > > MatScale in "Main Stage" is indeed called 6 times for 0% run time. In > stage "Stiff_Adj" though, we get : > > MatScale 8192 1.0 7.1185e+01 1.0 3.43e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 50 46 0 0 0 80 98 0 0 0 482 > > MatMult is indeed expensive (23% run time) and should be improved, but > MatScale in "Stiff_Adj" is still taking 50% run time > I was a little surprised that MatScale gets only 450 MFlops. However, it looks like you are running the debugging version of PETSc. Could you configure a version without debugging: $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/reconfigure-$PETSC_ARCH.py --with-debugging=0 --PETSC_ARCH=arch-master-opt and rerun the timings? Thanks, Matt > Thanks, > > Antoine > ------------------------------ > *De :* Barry Smith > *Envoy? :* 22 octobre 2020 16:09 > *? :* Antoine C?t? > *Cc :* petsc-users at mcs.anl.gov > *Objet :* Re: [petsc-users] Enhancing MatScale computing time > > > MatMult 9553 1.0 3.2824e+01 1.0 3.54e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 23 48 0 0 0 61 91 0 0 0 1079 > MatScale 6 1.0 5.3896e-02 1.0 2.52e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 467 > > Though the flop rate of MatScale is not so high (467) it is taking very > little (0 percent of the run time while MatMult takes 23 percent of the > time). > > So the main cost related to the matrices is MatMult because it has a lot > of operations 9553, you might think about your algorithms you are using and > if there > improvements. > > It looks like you are using some kind of multigrid and solve 6 problems > with 1357 total iterations which is 200 iterations per solve. 
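(That per-solve figure is just the log totals: roughly 1357 Krylov iterations over 6 KSPSolve calls, about 226 iterations per solve.) If the operator is 3D elasticity on the DMDA (three displacement dof per node) and the multigrid is algebraic, e.g. GAMG, one standard first tuning step is to attach the rigid-body near-nullspace to the matrix before solving. The fragment below is only a sketch under those assumptions; it reuses the da and K names from earlier in the thread, assumes an ierr in scope, and sets placeholder uniform coordinates only in case none have been set:

    Vec            coords;
    MatNullSpace   nearnullsp;

    ierr = DMDASetUniformCoordinates(da,0.0,1.0,0.0,1.0,0.0,1.0);CHKERRQ(ierr);  /* skip if da already carries physical coordinates */
    ierr = DMGetCoordinates(da,&coords);CHKERRQ(ierr);                           /* borrowed reference, do not destroy */
    ierr = MatNullSpaceCreateRigidBody(coords,&nearnullsp);CHKERRQ(ierr);        /* 6 rigid-body modes in 3D */
    ierr = MatSetNearNullSpace(K,nearnullsp);CHKERRQ(ierr);
    ierr = MatNullSpaceDestroy(&nearnullsp);CHKERRQ(ierr);                       /* the matrix keeps its own reference */

Whether this helps depends on what the three dof per node actually represent, but it is the usual first thing to try when an algebraic multigrid needs hundreds of iterations on a vector-valued 3D problem.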
This is > absolutely HUGE for multigrain, you need to tune the multigrid for you > problem to bring that down to at most a couple dozen iterations per solve. > > Barry > > On Oct 22, 2020, at 3:02 PM, Antoine C?t? > wrote: > > Hi, > > See attached files for both outputs. Tell me if you need any > clarification. It was run with a DMDA of 33x17x17 nodes (creating > 32x16x16=8192 elements). With 3 dof per nodes, problem has a total of 28611 > dof. > > Note : Stage "Stiff_Adj" is the part of the code modifying Mat > K. PetscLogStagePush/Pop was used. > > Regards, > > Antoine > ------------------------------ > *De :* Matthew Knepley > *Envoy? :* 22 octobre 2020 15:35 > *? :* Antoine C?t? > *Cc :* petsc-users at mcs.anl.gov > *Objet :* Re: [petsc-users] Enhancing MatScale computing time > > On Thu, Oct 22, 2020 at 3:23 PM Antoine C?t? > wrote: > > Hi, > > I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse > matrix Mat K. The Mat is modified repeatedly by the program, using the > commands (in that order) : > > MatZeroEntries(K) > In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) > MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) > MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) > MatDiagonalScale(K, vec1, vec1) > MatDiagonalSet(K, vec2, ADD_VALUES) > > Computing time seems high and I would like to improve it. Running tests > with "-log_view" tells me that MatScale() is the bottle neck (50% of total > computing time) . From manual pages, I've tried a few tweaks : > > - DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees > of freedom per node, ... BAIJ can significantly enhance performance", > Chapter 14.2.4 > - Used MatMissingDiagonal() to confirm there is no missing diagonal > entries : "If the matrix Y is missing some diagonal entries this routine > can be very slow", MatDiagonalSet() manual > - Tried MatSetOption() > - MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly > efficiency > - MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly > processes have one less global reduction" > - MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, > assembly processes have one less global reduction" > - MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during > matrix assembly" > > According to "-log_view", assembly is fast (0% of total time), and the > use of a DMDA makes me believe preallocation isn't the cause of performance > issue. > > I would like to know how could I improve MatScale(). What are the best > practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? > Instead of MatDiagonalScale(), should I use another command to obtain the > same result faster? > > > Something is definitely strange. Can you please send the output of > > -log_view -info :mat > > Thanks, > > Matt > > > Thank you very much! > > Antoine C?t? > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Thu Oct 22 16:03:11 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 22 Oct 2020 16:03:11 -0500 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: References: Message-ID: <857A52B0-B6C5-4E4F-B85C-3F69C044E477@petsc.dev> Yes, you are correct I missed that part of the run As you can see below MatScale calls only BLAS dscal() there is really no way to make that go faster. How big is the matrix. What are you doing with the matrix after you scale it? The only way to improve the time is to find some way to scale it less often. It is curious that VecScale has a much higher flop rate when it has the same code see below. Unless the matrices are tiny I would expect similar flop rates. Barry PetscErrorCode MatScale_SeqAIJ(Mat inA,PetscScalar alpha) { Mat_SeqAIJ *a = (Mat_SeqAIJ*)inA->data; PetscScalar oalpha = alpha; PetscErrorCode ierr; PetscBLASInt one = 1,bnz; PetscFunctionBegin; ierr = PetscBLASIntCast(a->nz,&bnz);CHKERRQ(ierr); PetscStackCallBLAS("BLASscal",BLASscal_(&bnz,&oalpha,a->a,&one)); ierr = PetscLogFlops(a->nz);CHKERRQ(ierr); ierr = MatSeqAIJInvalidateDiagonal(inA);CHKERRQ(ierr); #if defined(PETSC_HAVE_DEVICE) if (inA->offloadmask != PETSC_OFFLOAD_UNALLOCATED) inA->offloadmask = PETSC_OFFLOAD_CPU; #endif PetscFunctionReturn(0); } PetscErrorCode VecScale_Seq(Vec xin, PetscScalar alpha) { PetscErrorCode ierr; PetscBLASInt one = 1,bn; PetscFunctionBegin; ierr = PetscBLASIntCast(xin->map->n,&bn);CHKERRQ(ierr); if (alpha == (PetscScalar)0.0) { ierr = VecSet_Seq(xin,alpha);CHKERRQ(ierr); } else if (alpha != (PetscScalar)1.0) { PetscScalar a = alpha,*xarray; ierr = VecGetArray(xin,&xarray);CHKERRQ(ierr); PetscStackCallBLAS("BLASscal",BLASscal_(&bn,&a,xarray,&one)); ierr = VecRestoreArray(xin,&xarray);CHKERRQ(ierr); } ierr = PetscLogFlops(xin->map->n);CHKERRQ(ierr); PetscFunctionReturn(0); } > On Oct 22, 2020, at 3:02 PM, Antoine C?t? wrote: > > Hi, > > See attached files for both outputs. Tell me if you need any clarification. It was run with a DMDA of 33x17x17 nodes (creating 32x16x16=8192 elements). With 3 dof per nodes, problem has a total of 28611 dof. > > Note : Stage "Stiff_Adj" is the part of the code modifying Mat K. PetscLogStagePush/Pop was used. > > Regards, > > Antoine > De : Matthew Knepley > > Envoy? : 22 octobre 2020 15:35 > ? : Antoine C?t? > > Cc : petsc-users at mcs.anl.gov > > Objet : Re: [petsc-users] Enhancing MatScale computing time > > On Thu, Oct 22, 2020 at 3:23 PM Antoine C?t? > wrote: > Hi, > > I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse matrix Mat K. The Mat is modified repeatedly by the program, using the commands (in that order) : > > MatZeroEntries(K) > In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) > MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) > MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) > MatDiagonalScale(K, vec1, vec1) > MatDiagonalSet(K, vec2, ADD_VALUES) > > Computing time seems high and I would like to improve it. Running tests with "-log_view" tells me that MatScale() is the bottle neck (50% of total computing time) . From manual pages, I've tried a few tweaks : > DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees of freedom per node, ... 
BAIJ can significantly enhance performance", Chapter 14.2.4 > Used MatMissingDiagonal() to confirm there is no missing diagonal entries : "If the matrix Y is missing some diagonal entries this routine can be very slow", MatDiagonalSet() manual > Tried MatSetOption() > MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly efficiency > MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" > MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" > MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during matrix assembly" > According to "-log_view", assembly is fast (0% of total time), and the use of a DMDA makes me believe preallocation isn't the cause of performance issue. > > I would like to know how could I improve MatScale(). What are the best practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? Instead of MatDiagonalScale(), should I use another command to obtain the same result faster? > > Something is definitely strange. Can you please send the output of > > -log_view -info :mat > > Thanks, > > Matt > > Thank you very much! > > Antoine C?t? > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Antoine.Cote3 at USherbrooke.ca Thu Oct 22 16:08:22 2020 From: Antoine.Cote3 at USherbrooke.ca (=?iso-8859-1?Q?Antoine_C=F4t=E9?=) Date: Thu, 22 Oct 2020 21:08:22 +0000 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: References: <8A3BDD0C-2697-4453-8A71-2A900A958862@petsc.dev> , Message-ID: The new outputs are attached. The same problem was run with arch-master-opt The overall time was cut in half, but %T remain roughly the same... Thanks, Antoine ________________________________ De : Matthew Knepley Envoy? : 22 octobre 2020 16:28 ? : Antoine C?t? Cc : Barry Smith ; petsc-users at mcs.anl.gov Objet : Re: [petsc-users] Enhancing MatScale computing time On Thu, Oct 22, 2020 at 4:17 PM Antoine C?t? > wrote: Hi Sir, MatScale in "Main Stage" is indeed called 6 times for 0% run time. In stage "Stiff_Adj" though, we get : MatScale 8192 1.0 7.1185e+01 1.0 3.43e+10 1.0 0.0e+00 0.0e+00 0.0e+00 50 46 0 0 0 80 98 0 0 0 482 MatMult is indeed expensive (23% run time) and should be improved, but MatScale in "Stiff_Adj" is still taking 50% run time I was a little surprised that MatScale gets only 450 MFlops. However, it looks like you are running the debugging version of PETSc. Could you configure a version without debugging: $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/reconfigure-$PETSC_ARCH.py --with-debugging=0 --PETSC_ARCH=arch-master-opt and rerun the timings? Thanks, Matt Thanks, Antoine ________________________________ De : Barry Smith > Envoy? : 22 octobre 2020 16:09 ? : Antoine C?t? Cc : petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] Enhancing MatScale computing time MatMult 9553 1.0 3.2824e+01 1.0 3.54e+10 1.0 0.0e+00 0.0e+00 0.0e+00 23 48 0 0 0 61 91 0 0 0 1079 MatScale 6 1.0 5.3896e-02 1.0 2.52e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 467 Though the flop rate of MatScale is not so high (467) it is taking very little (0 percent of the run time while MatMult takes 23 percent of the time). 
So the main cost related to the matrices is MatMult because it has a lot of operations 9553, you might think about your algorithms you are using and if there improvements. It looks like you are using some kind of multigrid and solve 6 problems with 1357 total iterations which is 200 iterations per solve. This is absolutely HUGE for multigrain, you need to tune the multigrid for you problem to bring that down to at most a couple dozen iterations per solve. Barry On Oct 22, 2020, at 3:02 PM, Antoine C?t? > wrote: Hi, See attached files for both outputs. Tell me if you need any clarification. It was run with a DMDA of 33x17x17 nodes (creating 32x16x16=8192 elements). With 3 dof per nodes, problem has a total of 28611 dof. Note : Stage "Stiff_Adj" is the part of the code modifying Mat K. PetscLogStagePush/Pop was used. Regards, Antoine ________________________________ De : Matthew Knepley > Envoy? : 22 octobre 2020 15:35 ? : Antoine C?t? > Cc : petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] Enhancing MatScale computing time On Thu, Oct 22, 2020 at 3:23 PM Antoine C?t? > wrote: Hi, I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse matrix Mat K. The Mat is modified repeatedly by the program, using the commands (in that order) : MatZeroEntries(K) In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) MatDiagonalScale(K, vec1, vec1) MatDiagonalSet(K, vec2, ADD_VALUES) Computing time seems high and I would like to improve it. Running tests with "-log_view" tells me that MatScale() is the bottle neck (50% of total computing time) . From manual pages, I've tried a few tweaks : * DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees of freedom per node, ... BAIJ can significantly enhance performance", Chapter 14.2.4 * Used MatMissingDiagonal() to confirm there is no missing diagonal entries : "If the matrix Y is missing some diagonal entries this routine can be very slow", MatDiagonalSet() manual * Tried MatSetOption() * MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly efficiency * MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" * MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" * MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during matrix assembly" According to "-log_view", assembly is fast (0% of total time), and the use of a DMDA makes me believe preallocation isn't the cause of performance issue. I would like to know how could I improve MatScale(). What are the best practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? Instead of MatDiagonalScale(), should I use another command to obtain the same result faster? Something is definitely strange. Can you please send the output of -log_view -info :mat Thanks, Matt Thank you very much! Antoine C?t? -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: LogView.out Type: application/octet-stream Size: 12210 bytes Desc: LogView.out URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mat.0 Type: application/octet-stream Size: 234549 bytes Desc: mat.0 URL: From bsmith at petsc.dev Thu Oct 22 16:42:29 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 22 Oct 2020 16:42:29 -0500 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: References: <8A3BDD0C-2697-4453-8A71-2A900A958862@petsc.dev> Message-ID: <173AA74E-1BCB-4D19-AC25-F0B5677F044C@petsc.dev> Please apply the attached patch (you just need to run make libs in the petsc director after you apply it) and see if the VecScale flop rate changes. Barry > On Oct 22, 2020, at 4:08 PM, Antoine C?t? wrote: > > The new outputs are attached. The same problem was run with arch-master-opt > > The overall time was cut in half, but %T remain roughly the same... > > Thanks, > > Antoine > > > De : Matthew Knepley > > Envoy? : 22 octobre 2020 16:28 > ? : Antoine C?t? > > Cc : Barry Smith >; petsc-users at mcs.anl.gov > > Objet : Re: [petsc-users] Enhancing MatScale computing time > > On Thu, Oct 22, 2020 at 4:17 PM Antoine C?t? > wrote: > Hi Sir, > > MatScale in "Main Stage" is indeed called 6 times for 0% run time. In stage "Stiff_Adj" though, we get : > > MatScale 8192 1.0 7.1185e+01 1.0 3.43e+10 1.0 0.0e+00 0.0e+00 0.0e+00 50 46 0 0 0 80 98 0 0 0 482 > > MatMult is indeed expensive (23% run time) and should be improved, but MatScale in "Stiff_Adj" is still taking 50% run time > > I was a little surprised that MatScale gets only 450 MFlops. However, it looks like you are running the debugging version of PETSc. Could you configure > a version without debugging: > > $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/reconfigure-$PETSC_ARCH.py --with-debugging=0 --PETSC_ARCH=arch-master-opt > > and rerun the timings? > > Thanks, > > Matt > > Thanks, > > Antoine > De : Barry Smith > > Envoy? : 22 octobre 2020 16:09 > ? : Antoine C?t? > > Cc : petsc-users at mcs.anl.gov > > Objet : Re: [petsc-users] Enhancing MatScale computing time > > > MatMult 9553 1.0 3.2824e+01 1.0 3.54e+10 1.0 0.0e+00 0.0e+00 0.0e+00 23 48 0 0 0 61 91 0 0 0 1079 > MatScale 6 1.0 5.3896e-02 1.0 2.52e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 467 > > Though the flop rate of MatScale is not so high (467) it is taking very little (0 percent of the run time while MatMult takes 23 percent of the time). > > So the main cost related to the matrices is MatMult because it has a lot of operations 9553, you might think about your algorithms you are using and if there > improvements. > > It looks like you are using some kind of multigrid and solve 6 problems with 1357 total iterations which is 200 iterations per solve. This is absolutely HUGE for multigrain, you need to tune the multigrid for you problem to bring that down to at most a couple dozen iterations per solve. > > Barry > >> On Oct 22, 2020, at 3:02 PM, Antoine C?t? > wrote: >> >> Hi, >> >> See attached files for both outputs. Tell me if you need any clarification. It was run with a DMDA of 33x17x17 nodes (creating 32x16x16=8192 elements). With 3 dof per nodes, problem has a total of 28611 dof. >> >> Note : Stage "Stiff_Adj" is the part of the code modifying Mat K. PetscLogStagePush/Pop was used. >> >> Regards, >> >> Antoine >> De : Matthew Knepley > >> Envoy? : 22 octobre 2020 15:35 >> ? : Antoine C?t? 
> >> Cc : petsc-users at mcs.anl.gov > >> Objet : Re: [petsc-users] Enhancing MatScale computing time >> >> On Thu, Oct 22, 2020 at 3:23 PM Antoine C?t? > wrote: >> Hi, >> >> I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse matrix Mat K. The Mat is modified repeatedly by the program, using the commands (in that order) : >> >> MatZeroEntries(K) >> In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) >> MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) >> MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) >> MatDiagonalScale(K, vec1, vec1) >> MatDiagonalSet(K, vec2, ADD_VALUES) >> >> Computing time seems high and I would like to improve it. Running tests with "-log_view" tells me that MatScale() is the bottle neck (50% of total computing time) . From manual pages, I've tried a few tweaks : >> DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees of freedom per node, ... BAIJ can significantly enhance performance", Chapter 14.2.4 >> Used MatMissingDiagonal() to confirm there is no missing diagonal entries : "If the matrix Y is missing some diagonal entries this routine can be very slow", MatDiagonalSet() manual >> Tried MatSetOption() >> MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly efficiency >> MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" >> MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, assembly processes have one less global reduction" >> MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during matrix assembly" >> According to "-log_view", assembly is fast (0% of total time), and the use of a DMDA makes me believe preallocation isn't the cause of performance issue. >> >> I would like to know how could I improve MatScale(). What are the best practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? Instead of MatDiagonalScale(), should I use another command to obtain the same result faster? >> >> Something is definitely strange. Can you please send the output of >> >> -log_view -info :mat >> >> Thanks, >> >> Matt >> >> Thank you very much! >> >> Antoine C?t? >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: vecscale.patch Type: application/octet-stream Size: 577 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Fri Oct 23 02:09:47 2020 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 23 Oct 2020 09:09:47 +0200 Subject: [petsc-users] Enhancing MatScale computing time In-Reply-To: References: Message-ID: On Thu 22. Oct 2020 at 21:23, Antoine C?t? wrote: > Hi, > > I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse > matrix Mat K. 
The Mat is modified repeatedly by the program, using the > commands (in that order) : > > MatZeroEntries(K) > In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES) > MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY) > MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY) > MatDiagonalScale(K, vec1, vec1) > MatDiagonalSet(K, vec2, ADD_VALUES) > Why not just assemble the entire operator you seek locally in vals? You would then avoid the calls to MatDiagonalScale and MatDiagonalSet by instead calling VecGetArrayRead on vec1 and vec2 and using the local parts of these vectors you need with vals. You probably need to scatter vec1, vec2 first before VecGetArrayRead. Thanks, Dave > Computing time seems high and I would like to improve it. Running tests > with "-log_view" tells me that MatScale() is the bottle neck (50% of total > computing time) . From manual pages, I've tried a few tweaks : > > - DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees > of freedom per node, ... BAIJ can significantly enhance performance", > Chapter 14.2.4 > - Used MatMissingDiagonal() to confirm there is no missing diagonal > entries : "If the matrix Y is missing some diagonal entries this routine > can be very slow", MatDiagonalSet() manual > - Tried MatSetOption() > - MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly > efficiency > - MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly > processes have one less global reduction" > - MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, > assembly processes have one less global reduction" > - MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during > matrix assembly" > > According to "-log_view", assembly is fast (0% of total time), and the > use of a DMDA makes me believe preallocation isn't the cause of performance > issue. > > I would like to know how could I improve MatScale(). What are the best > practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? > Instead of MatDiagonalScale(), should I use another command to obtain the > same result faster? > > Thank you very much! > > Antoine C?t? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.huysegoms at fz-juelich.de Fri Oct 23 05:05:13 2020 From: m.huysegoms at fz-juelich.de (Marcel Huysegoms) Date: Fri, 23 Oct 2020 12:05:13 +0200 Subject: [petsc-users] MatOrdering for rectangular matrix In-Reply-To: <9B822030-7E72-4C6A-9669-6AA82AFB0B95@petsc.dev> References: <3c649ace-248f-38b9-bde9-4f0fa10bf71e@fz-juelich.de> <9B822030-7E72-4C6A-9669-6AA82AFB0B95@petsc.dev> Message-ID: Hi Barry, many thanks for your explanation and suggestion!! I have a much better understanding of the problem now. For some reason, I wasn't aware that permuting A by P leads to a /symmetric/ reordering of A'A. I searched for the paper by Tim Davis that describes their reordering approach ("SuiteSparseQR: multifrontal mulithreaded rank-revealing sparse QR factorization"), and as you expected, they perform the column ordering of A by using a permutation matrix P which is obtained by an ordering of A'A. However, they are using the reordered matrix AP to perform a QR decomposition, not to use it for a preconditioner as I intend to do. All in all, I will definitely try your suggested approach that SuiteSparseQR more or less also utilizes. 
However, I have (more or less) _one remaining question_: When calculating a column reordering matrix P based on A'A and applying this matrix to A (so having AP), then its normal equation will be P'(A'A)P as you pointed out. But P has originally been computed in a way, so that (A'A)P will be diagonally dominant, not P'(A'A)P. So won't the additional effect of P' (i.e. the row reordering) compromise the diagonal structure again? I am using the KSP in the following way: ksp = PETSc.KSP().create(PETSc.COMM_WORLD) ksp.setType("lsqr") pc = ksp.getPC() pc.setType("bjacobi") ksp.setOperators(A, A'A) ksp.solve(b, x) The paper you referenced seems very intersting to me. So I wonder, if I had a good /non-symmetric/ ordering of A'A, i.e. Q(A'A)P, and would pass this matrix to setOperators() as the second argument for the preconditioner (while using AP as first argument), what is happening internally? Does BJACOBI compute a preconditioner matrix M^(-1) for Q(A'A)P and passes this M^(-1) to LSQR for applying it to AP [yielding M^(-1)AP] before performing its iterative CG-method on this preconditioned system? In that case, could I perform the computation of M^(-1) outside of ksp.solve(), so that I could apply it myself to AP _and_ b (!!), so passing M^(-1)AP and M^(-1)b to ksp.setOperators() and ksp.solve()? Maybe my question is due to one missing piece of mathematical understanding. Does the matrix for computing the preconditioning (second argument to setOperators()) have to be exactly the normal equation (A'A) of the first argument in order to mathematically make sense? I could not find any reference why this is done/works? Thank you very much in advance for taking time for this topic! I really appreciate it. Marcel Am 22.10.20 um 16:34 schrieb Barry Smith: > ? Marcel, > > ? ?Would you like to do the following? Compute > > ? ? Q A ?P where Q is a row permutation, P a column permutation and > then apply LSQR on QAP? > > ? ? From the manual page: > > In exact arithmetic the LSQR method (with no preconditioning) is > identical to the KSPCG algorithm applied to the normal equations. > > ? ?[Q A ?P]' [Q A ?P] = P' A' A P = P'(A'A) P ?the Q drops out because > ?permutation matrices' transposes are their inverse > > ?Note that P is a small square matrix. > > ? So my conclusion is that any column permutation of A is also a > symmetric permutation of A'A so you can just try using regular > reorderings of A'A if > you want to "concentrate" the "important" parts of A'A into your > "block diagonal" preconditioner (and throw away the other parts) > > ? I don't know what it will do to the convergence. I've never had much > luck generically trying to symmetrically reorder matrices to improve > preconditioners but > for certain situation maybe it might help. For example if the matrix > is ?[0 1; 1 0] and you permute it you get the [1 0; 0 1] which looks > better. > > ? There is this https://epubs.siam.org/doi/10.1137/S1064827599361308 > but it is for non-symmetric permutations and in your case if you use a > non symmetric permeation you can no longer use LSQR. > > ? Barry > > > > >> On Oct 22, 2020, at 4:55 AM, Matthew Knepley > > wrote: >> >> On Thu, Oct 22, 2020 at 4:24 AM Marcel Huysegoms >> > wrote: >> >> Hi all, >> >> I'm currently implementing a Gauss-Newton approach for minimizing a >> non-linear cost function using PETSc4py. >> The (rectangular) linear systems I am trying to solve have >> dimensions of >> about (5N, N), where N is in the range of several hundred millions. 
>> >> Due to its size and because it's an over-determined system, I use >> LSQR >> in conjunction with a preconditioner (which operates on A^T x A, e.g. >> BJacobi). >> Depending on the ordering of the unknowns the algorithm only >> converges >> for special cases. When I use a direct LR solver (as >> preconditioner) it >> consistently converges, but consumes too much memory. I have read >> in the >> manual that the LR solver internally also applies a matrix reordering >> beforehand. >> >> My question would be: >> How can I improve the ordering of the unknowns for a rectangular >> matrix >> (in order to converge also with iterative preconditioners)? If I use >> MatGetOrdering(), it only works for square matrices. Is there a >> way to >> achieve this from within PETSc4py? >> ParMETIS seems to be a promising framework for that task. Is it >> possible >> to apply its reordering algorithm to a rectangular PETSc-matrix? >> >> I would be thankful for every bit of advice that might help. >> >> >> We do not have any rectangular reordering algorithms. I think your >> first step is to >> find something in the literature that you think will work. >> >> ? Thanks, >> >> ? ? ?Matt >> >> Best regards, >> Marcel >> >> >> ------------------------------------------------------------------------------------------------ >> ------------------------------------------------------------------------------------------------ >> Forschungszentrum Juelich GmbH >> 52425 Juelich >> Sitz der Gesellschaft: Juelich >> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 >> Vorsitzender des Aufsichtsrats: MinDir Volker Rieke >> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), >> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt >> ------------------------------------------------------------------------------------------------ >> ------------------------------------------------------------------------------------------------ >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Eugenio.Aulisa at ttu.edu Fri Oct 23 09:11:50 2020 From: Eugenio.Aulisa at ttu.edu (Aulisa, Eugenio) Date: Fri, 23 Oct 2020 14:11:50 +0000 Subject: [petsc-users] reset the sparsity pattern of a matrix without destroying and recreating it In-Reply-To: References: , Message-ID: Hi I have a time dependent problem, where at each iteration the sparsity pattern of some rows of an mpi matrix keeps changing. I have an estimate at each iteration what the maximum size of these rows should be, so I can conservatively pre-allocate the matrix memory. I then assembly using the option MatSetOption(mat, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE); It runs for few iterations but then it saturates the pre-allocated memory. What happens is that at each iteration new columns are added to the changing rows, but old entries that are now zero (and not needed anymore) are not removed, and the size of the changing rows increases till it reaches the maximum allowed value. Is there any way, when at each iteration I zero the matrix to forget the previous sparsity pattern and start from fresh, without destroying and recreating the matrix? Also, if possible, is it possible to select only the rows where this should happens? i.e. 
keeping the same sparsity pattern for a set of rows and forget it for the others. Thanks, Eugenio -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Oct 23 11:51:19 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 23 Oct 2020 11:51:19 -0500 Subject: [petsc-users] reset the sparsity pattern of a matrix without destroying and recreating it In-Reply-To: References: Message-ID: <64849F0D-82BB-4D2B-B671-CB2DAA285571@petsc.dev> You should be able to call MatResetPreallocation() MatSeqAIJSetPreallocation(new data) or any other preallocations routines again and it will clean out the old material. Let us know if this does not work, Barry > On Oct 23, 2020, at 9:11 AM, Aulisa, Eugenio wrote: > > > Hi > > I have a time dependent problem, where at each iteration > the sparsity pattern of some rows of an mpi matrix keeps changing. > > I have an estimate at each iteration what the maximum size of these rows should be, > so I can conservatively pre-allocate the matrix memory. > > I then assembly using the option > MatSetOption(mat, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE); > > It runs for few iterations but then it saturates the pre-allocated memory. > > What happens is that at each iteration new columns are added to the changing rows, > but old entries that are now zero (and not needed anymore) are not removed, > and the size of the changing rows increases till it reaches the maximum allowed value. > > Is there any way, when at each iteration I zero the matrix > to forget the previous sparsity pattern and start from fresh, > without destroying and recreating the matrix? > > Also, if possible, is it possible to select only the rows where this should happens? > i.e. keeping the same sparsity pattern for a set of rows and forget it for the others. > > Thanks, > Eugenio -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Oct 23 12:02:31 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 23 Oct 2020 12:02:31 -0500 Subject: [petsc-users] MatOrdering for rectangular matrix In-Reply-To: References: <3c649ace-248f-38b9-bde9-4f0fa10bf71e@fz-juelich.de> <9B822030-7E72-4C6A-9669-6AA82AFB0B95@petsc.dev> Message-ID: > On Oct 23, 2020, at 5:05 AM, Marcel Huysegoms wrote: > > Hi Barry, > > many thanks for your explanation and suggestion!! I have a much better understanding of the problem now. > > For some reason, I wasn't aware that permuting A by P leads to a symmetric reordering of A'A. > I searched for the paper by Tim Davis that describes their reordering approach ("SuiteSparseQR: multifrontal mulithreaded rank-revealing sparse QR factorization"), and as you expected, they perform the column ordering of A by using a permutation matrix P which is obtained by an ordering of A'A. However, they are using the reordered matrix AP to perform a QR decomposition, not to use it for a preconditioner as I intend to do. > > All in all, I will definitely try your suggested approach that SuiteSparseQR more or less also utilizes. > > However, I have (more or less) one remaining question: > > When calculating a column reordering matrix P based on A'A and applying this matrix to A (so having AP), then its normal equation will be P'(A'A)P as you pointed out. But P has originally been computed in a way, so that (A'A)P will be diagonally dominant, not P'(A'A)P. So won't the additional effect of P' (i.e. the row reordering) compromise the diagonal structure again? I don't know anything about this. 
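A minimal sketch of the preallocation-reset workflow suggested in the reply above for Eugenio's time loop, written against the C API. It is an illustration only, not code from this thread: the MPIAIJ assumption, the row bound of 64 and the BuildRow() placeholder are invented, and error handling uses the usual ierr/CHKERRQ pattern.

#include <petscmat.h>

/* Reassemble A with this step's sparsity pattern, reusing the conservative
   preallocation that was set up once before the time loop.  BuildRow() and
   the bound of 64 are hypothetical placeholders. */
static PetscErrorCode ReassembleWithNewPattern(Mat A)
{
  PetscInt       row,rstart,rend,ncols;
  PetscInt       cols[64];
  PetscScalar    vals[64];
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* drop the previous step's nonzero pattern and make the preallocated space reusable */
  ierr = MatResetPreallocation(A);CHKERRQ(ierr);
  /* tolerate the occasional row that still needs an extra allocation */
  ierr = MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A,&rstart,&rend);CHKERRQ(ierr);
  for (row = rstart; row < rend; row++) {
    ncols = 0;   /* ncols = BuildRow(row,cols,vals); fill this step's row here */
    ierr  = MatSetValues(A,1,&row,ncols,cols,vals,INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Per the same reply, calling MatMPIAIJSetPreallocation() again with fresh row estimates should also work if the conservative bound itself has to change from step to step.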
My feeling was that since A'A is always symmetric one would use a symmetric reordering on it, not a one sided non-symmetric reordering. The RCM order has a reputation for bringing off diagonal arguments closer to the diagonal. Hence if you reorder with RCM and then use block Jacobi, in theory, there will be "better" blocks on the diagonal then in the original ordering. I would try that first. > > I am using the KSP in the following way: > ksp = PETSc.KSP().create(PETSc.COMM_WORLD) > ksp.setType("lsqr") > pc = ksp.getPC() > pc.setType("bjacobi") > ksp.setOperators(A, A'A) > ksp.solve(b, x) > The paper you referenced seems very intersting to me. So I wonder, if I had a good non-symmetric ordering of A'A, i.e. Q(A'A)P, and would pass this matrix to setOperators() as the second argument for the preconditioner (while using AP as first argument), what is happening internally? Does BJACOBI compute a preconditioner matrix M^(-1) for Q(A'A)P and passes this M^(-1) to LSQR for applying it to AP [yielding M^(-1)AP] before performing its iterative CG-method on this preconditioned system? In that case, could I perform the computation of M^(-1) outside of ksp.solve(), so that I could apply it myself to AP and b (!!), so passing M^(-1)AP and M^(-1)b to ksp.setOperators() and ksp.solve()? > > Maybe my question is due to one missing piece of mathematical understanding. Does the matrix for computing the preconditioning (second argument to setOperators()) have to be exactly the normal equation (A'A) of the first argument in order to mathematically make sense? I could not find any reference why this is done/works? No, you can pass any matrix you want as the "normal equation" matrix to LSQR because it only builds the preconditioner from it. The matrix-vector products that define the problem are passed as the other argument. Heuristically you want something B for A'A that is "close" to A'A in some measure. The simplest thing would be just remove some terms away from the diagonal in B. What terms to move etc is unknown to me. There are many games one can play but I don't know which ones would be good for your problem. Barry > > Thank you very much in advance for taking time for this topic! I really appreciate it. > > Marcel > > > Am 22.10.20 um 16:34 schrieb Barry Smith: >> Marcel, >> >> Would you like to do the following? Compute >> >> Q A P where Q is a row permutation, P a column permutation and then apply LSQR on QAP? >> >> >> From the manual page: >> >> In exact arithmetic the LSQR method (with no preconditioning) is identical to the KSPCG algorithm applied to the normal equations. >> >> [Q A P]' [Q A P] = P' A' A P = P'(A'A) P the Q drops out because permutation matrices' transposes are their inverse >> >> Note that P is a small square matrix. >> >> So my conclusion is that any column permutation of A is also a symmetric permutation of A'A so you can just try using regular reorderings of A'A if >> you want to "concentrate" the "important" parts of A'A into your "block diagonal" preconditioner (and throw away the other parts) >> >> I don't know what it will do to the convergence. I've never had much luck generically trying to symmetrically reorder matrices to improve preconditioners but >> for certain situation maybe it might help. For example if the matrix is [0 1; 1 0] and you permute it you get the [1 0; 0 1] which looks better. 
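A compact sketch of the reordering recipe discussed above (an RCM ordering of A'A applied symmetrically to A'A and as a column permutation of A, then LSQR with block Jacobi built from the permuted normal equations). It is written against the C API and kept sequential for clarity; the choice of RCM, the MatPermute() step and all variable names are assumptions rather than anything prescribed in this thread, and in a large parallel code the column permutation would more likely be applied by renumbering the unknowns while assembling A.

#include <petscksp.h>

/* Sequential sketch: form A'A, reorder it with RCM, permute the columns of A
   with the same index set, and solve with LSQR preconditioned by block Jacobi
   built from the permuted normal equations.  A, b and x are assumed to exist. */
static PetscErrorCode SolveReorderedLSQR(Mat A,Vec b,Vec x)
{
  Mat            AtA,AtAperm,Aperm;
  IS             perm,cperm,rowid;
  KSP            ksp;
  PC             pc;
  PetscInt       m;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatGetSize(A,&m,NULL);CHKERRQ(ierr);
  ierr = MatTransposeMatMult(A,A,MAT_INITIAL_MATRIX,PETSC_DEFAULT,&AtA);CHKERRQ(ierr);
  ierr = MatGetOrdering(AtA,MATORDERINGRCM,&perm,&cperm);CHKERRQ(ierr); /* symmetric case, use perm for rows and columns */
  ierr = MatPermute(AtA,perm,perm,&AtAperm);CHKERRQ(ierr);              /* P'(A'A)P */
  ierr = ISCreateStride(PETSC_COMM_SELF,m,0,1,&rowid);CHKERRQ(ierr);    /* identity row permutation for A */
  ierr = MatPermute(A,rowid,perm,&Aperm);CHKERRQ(ierr);                 /* AP: same permutation applied to the columns */

  ierr = KSPCreate(PETSC_COMM_SELF,&ksp);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPLSQR);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCBJACOBI);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,Aperm,AtAperm);CHKERRQ(ierr);  /* rectangular Amat, square Pmat, as in the thread */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  /* x is now in the permuted column ordering; map it back to the original
     unknown numbering (for example with VecPermute()) before using it. */

  ierr = ISDestroy(&rowid);CHKERRQ(ierr);
  ierr = ISDestroy(&perm);CHKERRQ(ierr);
  ierr = ISDestroy(&cperm);CHKERRQ(ierr);
  ierr = MatDestroy(&Aperm);CHKERRQ(ierr);
  ierr = MatDestroy(&AtAperm);CHKERRQ(ierr);
  ierr = MatDestroy(&AtA);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Since b lives in the row space and the row permutation here is the identity, only the solution has to be mapped back, not the right-hand side.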
>> >> There is this https://epubs.siam.org/doi/10.1137/S1064827599361308 but it is for non-symmetric permutations and in your case if you use a non symmetric permeation you can no longer use LSQR. >> >> Barry >> >> >> >> >>> On Oct 22, 2020, at 4:55 AM, Matthew Knepley > wrote: >>> >>> On Thu, Oct 22, 2020 at 4:24 AM Marcel Huysegoms > wrote: >>> Hi all, >>> >>> I'm currently implementing a Gauss-Newton approach for minimizing a >>> non-linear cost function using PETSc4py. >>> The (rectangular) linear systems I am trying to solve have dimensions of >>> about (5N, N), where N is in the range of several hundred millions. >>> >>> Due to its size and because it's an over-determined system, I use LSQR >>> in conjunction with a preconditioner (which operates on A^T x A, e.g. >>> BJacobi). >>> Depending on the ordering of the unknowns the algorithm only converges >>> for special cases. When I use a direct LR solver (as preconditioner) it >>> consistently converges, but consumes too much memory. I have read in the >>> manual that the LR solver internally also applies a matrix reordering >>> beforehand. >>> >>> My question would be: >>> How can I improve the ordering of the unknowns for a rectangular matrix >>> (in order to converge also with iterative preconditioners)? If I use >>> MatGetOrdering(), it only works for square matrices. Is there a way to >>> achieve this from within PETSc4py? >>> ParMETIS seems to be a promising framework for that task. Is it possible >>> to apply its reordering algorithm to a rectangular PETSc-matrix? >>> >>> I would be thankful for every bit of advice that might help. >>> >>> We do not have any rectangular reordering algorithms. I think your first step is to >>> find something in the literature that you think will work. >>> >>> Thanks, >>> >>> Matt >>> >>> Best regards, >>> Marcel >>> >>> >>> ------------------------------------------------------------------------------------------------ >>> ------------------------------------------------------------------------------------------------ >>> Forschungszentrum Juelich GmbH >>> 52425 Juelich >>> Sitz der Gesellschaft: Juelich >>> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 >>> Vorsitzender des Aufsichtsrats: MinDir Volker Rieke >>> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), >>> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt >>> ------------------------------------------------------------------------------------------------ >>> ------------------------------------------------------------------------------------------------ >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bui at calcreek.com Fri Oct 23 19:21:32 2020 From: bui at calcreek.com (Thuc Bui) Date: Fri, 23 Oct 2020 17:21:32 -0700 Subject: [petsc-users] Blas undefined references when build an app linking to a shared library that is linked to a static Petsc library Message-ID: <014f01d6a99b$a0b99430$e22cbc90$@calcreek.com> Dear Petsc Users, I hope someone out there has already encountered the same linking problem and already figured this out, or has some idea how to resolve this issue. I have google searched but haven't found any solution. 
I successfully ported Petsc 3.13.5 both as shared and static libraries in both Windows 10 (Visual Studio 2015, no Fortran) and Ubuntu 20.4 (gcc, g++. gfortran 9.3), and successfully run "make check" in all. I am also able to build my own shared libraries linked to either shared or static Petsc library in both platforms. I was also able to link and execute my application to either of these shared libraries in Windows. Unfortunately, I could only get my app to link and execute with the shared library linked to the shared Petsc library in Ubuntu, but not with the static Petsc library. On Ubuntu, below is how I build Petsc static library, which produces three libraries: libpetsc.a, libfblas.a and libflapack.a ./configure CFLAGS="-fPIC" CXXFLAGS="-fPIC" FFLAGS="-fPIC" -with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openmp --with-debugging=0 --download-fblaslapack --with-mpi=0 --with-shared-libraries=0 Below is the make output on how I build my shared library, cPoissons.so linking to the static Petsc. Please note that I have to use --allow-multiple-definition to override the redefinition errors in lapack and blas. I also use --whole-archive to make sure the shared library has all the required information from lapack and blas. gcc -fPIC -c -o gcc/matrixUtil.o matrixUtil.c -I/home/bbwannabe/Documents/Petsc/latest/include -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include gcc -fPIC -c -o gcc/PetscSolver.o PetscSolver.c -I/home/bbwannabe/Documents/Petsc/latest/include -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include gcc -fPIC -c -o gcc/LinearSystemSolver.o LinearSystemSolver.c -I/home/bbwannabe/Documents/Petsc/latest/include -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include gcc -fPIC -c -o gcc/cPoisson.o cPoisson.c -I/home/bbwannabe/Documents/Petsc/latest/include -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include gcc -fPIC -fopenmp -shared -o gcc/cPoissons.so gcc/matrixUtil.o gcc/PetscSolver.o gcc/LinearSystemSolver.o gcc/cPoisson.o -L/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/lib -Wl,--allow-multiple-definition -Wl,--whole-archive -lpetsc -lflapack -lfblas -Wl,--no-whole-archive However, when I build my app linking to the above shared library cPoissons.so. gfortran complains about undefined references, which seem to be from blas as shown below from the output of make. Has anyone seen this kind of linking problem before? Many thanks for your help. Thuc Bui Senior R&D Engineer Calabazas Creek Research, Inc. 
(650) 948-5361 (Office) gfortran -fPIC -o sPoisson3D Poisson3D.f -L/home/bbwannabe/Documents/Nemesis/cPoisson/gcc -l:cPoissons.so /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgbmv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgbmv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_chemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_csymv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgbmv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zsymv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_csymv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_ssymv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_ssymv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dsymv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zhemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgbmv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_chemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgbmv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgbmv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgbmv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zhemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zsymv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dsymv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgbmv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgemv_x_' collect2: error: ld returned 1 exit status make: *** [makefile:6: sPoisson3D] Error 1 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Oct 23 19:36:36 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 23 Oct 2020 19:36:36 -0500 Subject: Re: [petsc-users] Blas undefined references when build an app linking to a shared library that is linked to a static Petsc library In-Reply-To: <014f01d6a99b$a0b99430$e22cbc90$@calcreek.com> References: <014f01d6a99b$a0b99430$e22cbc90$@calcreek.com> Message-ID: <4074A631-42A8-4E5B-9616-7DD94045308A@petsc.dev> I found http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=2&t=1730 but I do not understand it. I am pretty sure PETSc does not provide blas_sgemv_x in any way, even through its packages like fblaslapack, so I would focus on figuring out where those symbols are coming from. Normal compilers won't need them. Do you use them? Good luck Barry > On Oct 23, 2020, at 7:21 PM, Thuc Bui wrote: > Dear Petsc Users, > > I hope someone out there has already encountered the same linking problem and already figured this out, or has some idea how to resolve this issue. I have google searched but haven't found any solution. > > I successfully ported Petsc 3.13.5 both as shared and static libraries in both Windows 10 (Visual Studio 2015, no Fortran) and Ubuntu 20.4 (gcc, g++. gfortran 9.3), and successfully run "make check" in all. I am also able to build my own shared libraries linked to either shared or static Petsc library in both platforms. I was also able to link and execute my application to either of these shared libraries in Windows. Unfortunately, I could only get my app to link and execute with the shared library linked to the shared Petsc library in Ubuntu, but not with the static Petsc library. > > On Ubuntu, below is how I build Petsc static library, which produces three libraries: libpetsc.a, libfblas.a and libflapack.a > > ./configure CFLAGS="-fPIC" CXXFLAGS="-fPIC" FFLAGS="-fPIC" -with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openmp --with-debugging=0 --download-fblaslapack --with-mpi=0 --with-shared-libraries=0 > > Below is the make output on how I build my shared library, cPoissons.so linking to the static Petsc. Please note that I have to use --allow-multiple-definition to override the redefinition errors in lapack and blas. I also use --whole-archive to make sure the shared library has all the required information from lapack and blas.
> > gcc -fPIC -c -o gcc/matrixUtil.o matrixUtil.c > -I/home/bbwannabe/Documents/Petsc/latest/include > -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private > -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include > gcc -fPIC -c -o gcc/PetscSolver.o PetscSolver.c > -I/home/bbwannabe/Documents/Petsc/latest/include > -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private > -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include > gcc -fPIC -c -o gcc/LinearSystemSolver.o LinearSystemSolver.c > -I/home/bbwannabe/Documents/Petsc/latest/include > -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private > -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include > gcc -fPIC -c -o gcc/cPoisson.o cPoisson.c > -I/home/bbwannabe/Documents/Petsc/latest/include > -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private > -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include > gcc -fPIC -fopenmp -shared -o gcc/cPoissons.so gcc/matrixUtil.o gcc/PetscSolver.o gcc/LinearSystemSolver.o gcc/cPoisson.o > -L/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/lib > -Wl,--allow-multiple-definition > -Wl,--whole-archive -lpetsc -lflapack -lfblas -Wl,--no-whole-archive > > However, when I build my app linking to the above shared library cPoissons.so. gfortran complains about undefined references, which seem to be from blas as shown below from the output of make. Has anyone seen this kind of linking problem before? > > Many thanks for your help. > Thuc Bui > Senior R&D Engineer > Calabazas Creek Research, Inc. > (650) 948-5361 (Office) > > gfortran -fPIC -o sPoisson3D Poisson3D.f -L/home/bbwannabe/Documents/Nemesis/cPoisson/gcc -l:cPoissons.so > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgemv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgemv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgbmv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgbmv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_chemv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_csymv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgbmv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgemv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zsymv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_csymv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_ssymv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_ssymv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgemv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dsymv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zhemv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to 
`blas_zgbmv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgemv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_chemv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgemv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgemv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgbmv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgbmv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgbmv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zhemv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zsymv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dsymv_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgbmv2_x_' > /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgemv_x_' > collect2: error: ld returned 1 exit status > make: *** [makefile:6: sPoisson3D] Error 1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bui at calcreek.com Sat Oct 24 22:50:19 2020 From: bui at calcreek.com (Thuc Bui) Date: Sat, 24 Oct 2020 20:50:19 -0700 Subject: [petsc-users] Blas undefined references when build an app linking to a shared library that is linked to a static Petsc library In-Reply-To: <4074A631-42A8-4E5B-9616-7DD94045308A@petsc.dev> References: <014f01d6a99b$a0b99430$e22cbc90$@calcreek.com> <4074A631-42A8-4E5B-9616-7DD94045308A@petsc.dev> Message-ID: <00d601d6aa81$f5def0e0$e19cd2a0$@calcreek.com> Hi Barry, Thank you very much for getting back to me, and for the link to lapack forum. I really appreciate your taking the time to look this up. I found the same site earlier and believed the undefined references were related to xblas, but didn?t see how they were able to pollute the libfblas and libflapack libraries. Anyhow, I found a work around! Since I was able to compiled, linked and executed a window app using static Petsc with f2cblas and f2clapack built by Visual Studio 2015, I did the same for Ubuntu build, configure static Petsc without gfotran and use libf2cblas and libf2clapack. All the linking issues went away! Problem is solved! Many thanks again, Thuc From: Barry Smith [mailto:bsmith at petsc.dev] Sent: Friday, October 23, 2020 5:37 PM To: Thuc Bui Cc: petsc-users Subject: Re: [petsc-users] Blas undefined references when build an app linking to a shared library that is linked to a static Petsc library I found http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=2 &t=1730 but I do not understand it. I am pretty sure PETSc is not providing in anyway even through its packages like fblaslapack blas_sgemv_x anyway I would focus on figuring out where those are coming from. Normal compilers won't need them. Do you use them? Good luck Barry On Oct 23, 2020, at 7:21 PM, Thuc Bui wrote: Dear Petsc Users, I hope someone out there has already encountered the same linking problem and already figured this out, or has some idea how to resolve this issue. 
I have google searched but haven?t found any solution. I successfully ported Petsc 3.13.5 both as shared and static libraries in both Windows 10 (Visual Studio 2015, no Fortran) and Ubuntu 20.4 (gcc, g++. gfortran 9.3), and successfully run ?make check? in all. I am also able to build my own shared libraries linked to either shared or static Petsc library in both platforms. I was also able to link and execute my application to either of these shared libraries in Windows. Unfortunately, I could only get my app to link and execute with the shared library linked to the shared Petsc library in Ubuntu, but not with the static Petsc library. On Ubuntu, below is how I build Petsc static library, which produces three libraries: libpetsc.a, libfblas.a and libflapack.a ./configure CFLAGS="-fPIC" CXXFLAGS="-fPIC" FFLAGS="-fPIC" -with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-openmp --with-debugging=0 --download-fblaslapack --with-mpi=0 --with-shared-libraries=0 Below is the make output on how I build my shared library, cPoissons.so linking to the static Petsc. Please note that I have to use --allow-multiple-definition to override the redefinition errors in lapack and blas. I also use --whole-archive to make sure the shared library has all the required information from lapack and blas. gcc -fPIC -c -o gcc/matrixUtil.o matrixUtil.c -I/home/bbwannabe/Documents/Petsc/latest/include -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include gcc -fPIC -c -o gcc/PetscSolver.o PetscSolver.c -I/home/bbwannabe/Documents/Petsc/latest/include -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include gcc -fPIC -c -o gcc/LinearSystemSolver.o LinearSystemSolver.c -I/home/bbwannabe/Documents/Petsc/latest/include -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include gcc -fPIC -c -o gcc/cPoisson.o cPoisson.c -I/home/bbwannabe/Documents/Petsc/latest/include -I/home/bbwannabe/Documents/Petsc/latest/include/petsc/private -I/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/include gcc -fPIC -fopenmp -shared -o gcc/cPoissons.so gcc/matrixUtil.o gcc/PetscSolver.o gcc/LinearSystemSolver.o gcc/cPoisson.o -L/home/bbwannabe/Documents/Petsc/latest/gcc-x64SRelease/lib -Wl,--allow-multiple-definition -Wl,--whole-archive -lpetsc -lflapack -lfblas -Wl,--no-whole-archive However, when I build my app linking to the above shared library cPoissons.so. gfortran complains about undefined references, which seem to be from blas as shown below from the output of make. Has anyone seen this kind of linking problem before? Many thanks for your help. Thuc Bui Senior R&D Engineer Calabazas Creek Research, Inc. 
(650) 948-5361 (Office) gfortran -fPIC -o sPoisson3D Poisson3D.f -L/home/bbwannabe/Documents/Nemesis/cPoisson/gcc -l:cPoissons.so /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgbmv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgbmv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_chemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_csymv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgbmv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zsymv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_csymv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_ssymv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_ssymv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dsymv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zhemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgbmv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_chemv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_sgbmv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zgbmv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgbmv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zhemv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_zsymv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dsymv_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_dgbmv2_x_' /usr/bin/ld: /home/bbwannabe/Documents/Nemesis/cPoisson/gcc/cPoissons.so: undefined reference to `blas_cgemv_x_' collect2: error: ld returned 1 exit status make: *** [makefile:6: sPoisson3D] Error 1 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sam.guo at cd-adapco.com Mon Oct 26 14:12:22 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 26 Oct 2020 12:12:22 -0700 Subject: [petsc-users] change lib names Message-ID: Dear PETSc team, I like to change petsc lib name to petsc_real or petsc_complex to distinguish real vs complex version. Simply copy of libpetsc to libpetsc_real does not help. I need to update PETSc makefile to recompile but I have troubles to figure out where PETSc makefile decides the lib name. Thanks, Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Oct 26 14:19:51 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 26 Oct 2020 13:19:51 -0600 Subject: [petsc-users] change lib names In-Reply-To: References: Message-ID: <875z6wpzfs.fsf@jedbrown.org> See libpetsc_shared and the following 2-3 lines in gmakefile. Sam Guo writes: > Dear PETSc team, > I like to change petsc lib name to petsc_real or petsc_complex to > distinguish real vs complex version. Simply copy of libpetsc to > libpetsc_real does not help. I need to update PETSc makefile to recompile > but I have troubles to figure out where PETSc makefile decides the lib > name. > > Thanks, > Sam From hzhang at mcs.anl.gov Mon Oct 26 14:25:14 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Mon, 26 Oct 2020 19:25:14 +0000 Subject: [petsc-users] change lib names In-Reply-To: References: Message-ID: Sam, You can build petsc with different PETSC_ARCH under same PETSC_DIR, e.g., define PETSC_ARCH = arch_real or arch_complex to build different petsc libraries. Simply switch to different PETSC_ARCH when you use them. See https://www.mcs.anl.gov/petsc/documentation/installation.html#compilers Hong ________________________________ From: petsc-users on behalf of Sam Guo Sent: Monday, October 26, 2020 2:12 PM To: PETSc Subject: [petsc-users] change lib names Dear PETSc team, I like to change petsc lib name to petsc_real or petsc_complex to distinguish real vs complex version. Simply copy of libpetsc to libpetsc_real does not help. I need to update PETSc makefile to recompile but I have troubles to figure out where PETSc makefile decides the lib name. Thanks, Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Oct 26 14:28:47 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 26 Oct 2020 12:28:47 -0700 Subject: [petsc-users] change lib names In-Reply-To: References: Message-ID: Hi Zhang Hong, I know I can have different PETSC_ARCH but my application will dynamically load either real or complex version on fly and I need different lib names. Thanks, Sam On Mon, Oct 26, 2020 at 12:25 PM Zhang, Hong wrote: > Sam, > You can build petsc with different PETSC_ARCH under same PETSC_DIR, e.g., > define PETSC_ARCH = arch_real or arch_complex to build different petsc > libraries. Simply switch to different PETSC_ARCH when you use them. See > https://www.mcs.anl.gov/petsc/documentation/installation.html#compilers > > Hong > > ------------------------------ > *From:* petsc-users on behalf of Sam > Guo > *Sent:* Monday, October 26, 2020 2:12 PM > *To:* PETSc > *Subject:* [petsc-users] change lib names > > Dear PETSc team, > I like to change petsc lib name to petsc_real or petsc_complex to > distinguish real vs complex version. Simply copy of libpetsc to > libpetsc_real does not help. I need to update PETSc makefile to recompile > but I have troubles to figure out where PETSc makefile decides the lib > name. 
> > Thanks, > Sam > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Oct 26 14:29:08 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 26 Oct 2020 12:29:08 -0700 Subject: [petsc-users] change lib names In-Reply-To: <875z6wpzfs.fsf@jedbrown.org> References: <875z6wpzfs.fsf@jedbrown.org> Message-ID: Thanks, Jed. I'll give it a try/ On Mon, Oct 26, 2020 at 12:20 PM Jed Brown wrote: > See libpetsc_shared and the following 2-3 lines in gmakefile. > > Sam Guo writes: > > > Dear PETSc team, > > I like to change petsc lib name to petsc_real or petsc_complex to > > distinguish real vs complex version. Simply copy of libpetsc to > > libpetsc_real does not help. I need to update PETSc makefile to recompile > > but I have troubles to figure out where PETSc makefile decides the lib > > name. > > > > Thanks, > > Sam > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbuerkle at web.de Tue Oct 27 08:46:17 2020 From: mbuerkle at web.de (Marius Buerkle) Date: Tue, 27 Oct 2020 14:46:17 +0100 Subject: [petsc-users] superlu_dist segfault Message-ID: An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Tue Oct 27 09:11:15 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Tue, 27 Oct 2020 14:11:15 +0000 Subject: [petsc-users] superlu_dist segfault In-Reply-To: References: Message-ID: Marius, It fails at the line 1075 in file /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c if ( !(lsum = (doublecomplex*)SUPERLU_MALLOC(sizelsum*num_thread * sizeof(doublecomplex)))) ABORT("Malloc fails for lsum[]."); We do not know what it means. You may use a debugger to check the values of the variables involved. I'm cc'ing Sherry (superlu_dist developer), or you may send us a stand-alone short code that reproduce the error. We can help on its investigation. Hong ________________________________ From: petsc-users on behalf of Marius Buerkle Sent: Tuesday, October 27, 2020 8:46 AM To: petsc-users at mcs.anl.gov Subject: [petsc-users] superlu_dist segfault Hi, When using MatMatSolve with superlu_dist I get a segmentation fault: Malloc fails for lsum[]. at line 1075 in file /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c The matrix size is not particular big and I am using the petsc release branch and superlu_dist is v6.3.0 I think. Best, Marius -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Tue Oct 27 12:41:02 2020 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 27 Oct 2020 10:41:02 -0700 Subject: [petsc-users] change lib names In-Reply-To: References: <875z6wpzfs.fsf@jedbrown.org> Message-ID: Hi Jed, On windows, changing those lines allows me to link petsc with my application but failed at loading the library. I can only load the petsc lib by using libpetsc.dll and petsc.lib (instead of libpetsc_real.dll and petsc_real.lib). Thanks, Sam On Mon, Oct 26, 2020 at 12:29 PM Sam Guo wrote: > Thanks, Jed. I'll give it a try/ > > On Mon, Oct 26, 2020 at 12:20 PM Jed Brown wrote: > >> See libpetsc_shared and the following 2-3 lines in gmakefile. >> >> Sam Guo writes: >> >> > Dear PETSc team, >> > I like to change petsc lib name to petsc_real or petsc_complex to >> > distinguish real vs complex version. Simply copy of libpetsc to >> > libpetsc_real does not help. 
I need to update PETSc makefile to >> recompile >> > but I have troubles to figure out where PETSc makefile decides the lib >> > name. >> > >> > Thanks, >> > Sam >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Oct 27 12:53:25 2020 From: jed at jedbrown.org (Jed Brown) Date: Tue, 27 Oct 2020 11:53:25 -0600 Subject: [petsc-users] change lib names In-Reply-To: References: <875z6wpzfs.fsf@jedbrown.org> Message-ID: <87sg9zo8ru.fsf@jedbrown.org> I don't know details of Windows linking, but presume this is something like the library soname on Linux, which is set by libpetsc_soname. ``make V=1` should help with debugging -- you'll be able to see what is being passed to the linker. Sam Guo writes: > Hi Jed, > On windows, changing those lines allows me to link petsc with my > application but failed at loading the library. I can only load the petsc > lib by using libpetsc.dll and petsc.lib (instead of libpetsc_real.dll and > petsc_real.lib). > > Thanks, > Sam > > On Mon, Oct 26, 2020 at 12:29 PM Sam Guo wrote: > >> Thanks, Jed. I'll give it a try/ >> >> On Mon, Oct 26, 2020 at 12:20 PM Jed Brown wrote: >> >>> See libpetsc_shared and the following 2-3 lines in gmakefile. >>> >>> Sam Guo writes: >>> >>> > Dear PETSc team, >>> > I like to change petsc lib name to petsc_real or petsc_complex to >>> > distinguish real vs complex version. Simply copy of libpetsc to >>> > libpetsc_real does not help. I need to update PETSc makefile to >>> recompile >>> > but I have troubles to figure out where PETSc makefile decides the lib >>> > name. >>> > >>> > Thanks, >>> > Sam >>> >> From sblondel at utk.edu Tue Oct 27 14:09:23 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Tue, 27 Oct 2020 19:09:23 +0000 Subject: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep Message-ID: Hi, I am currently using TSSetEventHandler in my code to detect a random event where the solution vector gets modified during the event. Ideally, after the event happens I want the solver to use a much smaller timestep using TSSetPostEventIntervalStep. However, when I use TSSetPostEventIntervalStep the solver doesn't use the set value. I managed to reproduce the behavior by modifying ex40.c as attached. I think the issue is related to the fact that the fvalue is not technically "approaching" 0 with a random event, it is more of a step function instead. Do you have any recommendation on how to implement the behavior I'm looking for? Let me know if I can provide additional information. Best, Sophie -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex40.c Type: text/x-csrc Size: 12871 bytes Desc: ex40.c URL: From knepley at gmail.com Tue Oct 27 14:34:40 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 27 Oct 2020 15:34:40 -0400 Subject: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep In-Reply-To: References: Message-ID: On Tue, Oct 27, 2020 at 3:09 PM Blondel, Sophie via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > I am currently using TSSetEventHandler in my code to detect a random event > where the solution vector gets modified during the event. Ideally, after > the event happens I want the solver to use a much smaller timestep using > TSSetPostEventIntervalStep. However, when I use TSSetPostEventIntervalStep > the solver doesn't use the set value. 
I managed to reproduce the behavior > by modifying ex40.c as attached. > I stepped through ex40, and it does indeed change the timestep to 0.001. Can you be more specific, perhaps with monitors, about what you think is wrong? Thanks, Matt > I think the issue is related to the fact that the fvalue is not > technically "approaching" 0 with a random event, it is more of a step > function instead. Do you have any recommendation on how to implement the > behavior I'm looking for? Let me know if I can provide additional > information. > > Best, > > Sophie > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sblondel at utk.edu Tue Oct 27 15:02:06 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Tue, 27 Oct 2020 20:02:06 +0000 Subject: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep In-Reply-To: References: , Message-ID: Hi Matt, With the ex40 I attached in my previous email here is what I get printed on screen when running "./ex40 -ts_monitor -ts_event_monitor": 0 TS dt 0.1 time 0. 1 TS dt 0.5 time 0.1 2 TS dt 0.5 time 0.6 3 TS dt 0.5 time 1.1 4 TS dt 0.5 time 1.6 5 TS dt 0.5 time 2.1 6 TS dt 0.5 time 2.6 7 TS dt 0.5 time 3.1 8 TS dt 0.5 time 3.6 9 TS dt 0.5 time 4.1 10 TS dt 0.5 time 4.6 11 TS dt 0.5 time 5.1 12 TS dt 0.5 time 5.6 13 TS dt 0.5 time 6.1 14 TS dt 0.5 time 6.6 15 TS dt 0.5 time 7.1 TSEvent: Event 0 zero crossing at time 7.6 located in 0 iterations Ball hit the ground at t = 7.60 seconds 16 TS dt 0.5 time 7.6 17 TS dt 0.5 time 8.1 18 TS dt 0.5 time 8.6 19 TS dt 0.5 time 9.1 20 TS dt 0.5 time 9.6 21 TS dt 0.5 time 10.1 22 TS dt 0.5 time 10.6 23 TS dt 0.5 time 11.1 24 TS dt 0.5 time 11.6 25 TS dt 0.5 time 12.1 26 TS dt 0.5 time 12.6 27 TS dt 0.5 time 13.1 28 TS dt 0.5 time 13.6 29 TS dt 0.5 time 14.1 30 TS dt 0.5 time 14.6 31 TS dt 0.5 time 15.1 32 TS dt 0.5 time 15.6 33 TS dt 0.5 time 16.1 34 TS dt 0.5 time 16.6 35 TS dt 0.5 time 17.1 36 TS dt 0.5 time 17.6 37 TS dt 0.5 time 18.1 38 TS dt 0.5 time 18.6 39 TS dt 0.5 time 19.1 40 TS dt 0.5 time 19.6 41 TS dt 0.5 time 20.1 42 TS dt 0.5 time 20.6 43 TS dt 0.5 time 21.1 44 TS dt 0.5 time 21.6 45 TS dt 0.5 time 22.1 46 TS dt 0.5 time 22.6 47 TS dt 0.5 time 23.1 48 TS dt 0.5 time 23.6 49 TS dt 0.5 time 24.1 50 TS dt 0.5 time 24.6 51 TS dt 0.5 time 25.1 TSEvent: Event 0 zero crossing at time 25.6 located in 0 iterations Ball hit the ground at t = 25.60 seconds 52 TS dt 0.5 time 25.6 53 TS dt 0.5 time 26.1 54 TS dt 0.5 time 26.6 55 TS dt 0.5 time 27.1 56 TS dt 0.5 time 27.6 57 TS dt 0.5 time 28.1 58 TS dt 0.5 time 28.6 59 TS dt 0.5 time 29.1 60 TS dt 0.5 time 29.6 61 TS dt 0.5 time 30.1 0 TS dt 0.1 time 0. 
1 TS dt 0.5 time 0.1 2 TS dt 0.5 time 0.6 3 TS dt 0.5 time 1.1 4 TS dt 0.5 time 1.6 5 TS dt 0.5 time 2.1 6 TS dt 0.5 time 2.6 7 TS dt 0.5 time 3.1 8 TS dt 0.5 time 3.6 9 TS dt 0.5 time 4.1 10 TS dt 0.5 time 4.6 11 TS dt 0.5 time 5.1 12 TS dt 0.5 time 5.6 13 TS dt 0.5 time 6.1 14 TS dt 0.5 time 6.6 15 TS dt 0.5 time 7.1 16 TS dt 0.5 time 7.6 17 TS dt 0.5 time 8.1 18 TS dt 0.5 time 8.6 19 TS dt 0.5 time 9.1 20 TS dt 0.5 time 9.6 21 TS dt 0.5 time 10.1 22 TS dt 0.5 time 10.6 23 TS dt 0.5 time 11.1 24 TS dt 0.5 time 11.6 25 TS dt 0.5 time 12.1 26 TS dt 0.5 time 12.6 TSEvent: Event 0 zero crossing at time 13.1 located in 0 iterations Ball hit the ground at t = 13.10 seconds 27 TS dt 0.5 time 13.1 28 TS dt 0.5 time 13.6 29 TS dt 0.5 time 14.1 30 TS dt 0.5 time 14.6 31 TS dt 0.5 time 15.1 32 TS dt 0.5 time 15.6 33 TS dt 0.5 time 16.1 34 TS dt 0.5 time 16.6 35 TS dt 0.5 time 17.1 36 TS dt 0.5 time 17.6 37 TS dt 0.5 time 18.1 38 TS dt 0.5 time 18.6 39 TS dt 0.5 time 19.1 40 TS dt 0.5 time 19.6 41 TS dt 0.5 time 20.1 42 TS dt 0.5 time 20.6 43 TS dt 0.5 time 21.1 44 TS dt 0.5 time 21.6 45 TS dt 0.5 time 22.1 46 TS dt 0.5 time 22.6 47 TS dt 0.5 time 23.1 TSEvent: Event 0 zero crossing at time 23.6 located in 0 iterations Ball hit the ground at t = 23.60 seconds 48 TS dt 0.5 time 23.6 49 TS dt 0.5 time 24.1 50 TS dt 0.5 time 24.6 51 TS dt 0.5 time 25.1 52 TS dt 0.5 time 25.6 53 TS dt 0.5 time 26.1 TSEvent: Event 0 zero crossing at time 26.6 located in 0 iterations Ball hit the ground at t = 26.60 seconds 54 TS dt 0.5 time 26.6 55 TS dt 0.5 time 27.1 56 TS dt 0.5 time 27.6 57 TS dt 0.5 time 28.1 58 TS dt 0.5 time 28.6 59 TS dt 0.5 time 29.1 60 TS dt 0.5 time 29.6 61 TS dt 0. time 30.1 I don't see the 0.001 timestep here, do you get a different behavior? Thank you, Sophie ________________________________ From: Matthew Knepley Sent: Tuesday, October 27, 2020 15:34 To: Blondel, Sophie Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Subject: Re: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep [External Email] On Tue, Oct 27, 2020 at 3:09 PM Blondel, Sophie via petsc-users > wrote: Hi, I am currently using TSSetEventHandler in my code to detect a random event where the solution vector gets modified during the event. Ideally, after the event happens I want the solver to use a much smaller timestep using TSSetPostEventIntervalStep. However, when I use TSSetPostEventIntervalStep the solver doesn't use the set value. I managed to reproduce the behavior by modifying ex40.c as attached. I stepped through ex40, and it does indeed change the timestep to 0.001. Can you be more specific, perhaps with monitors, about what you think is wrong? Thanks, Matt I think the issue is related to the fact that the fvalue is not technically "approaching" 0 with a random event, it is more of a step function instead. Do you have any recommendation on how to implement the behavior I'm looking for? Let me know if I can provide additional information. Best, Sophie -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Tue Oct 27 15:24:36 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 27 Oct 2020 15:24:36 -0500 Subject: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep In-Reply-To: References: Message-ID: <52CD65C8-DB69-4799-ACC7-0B2E5C32FE54@petsc.dev> I'm sorry the code is still fundamentally broken, I know I promised a long time ago to fix it all up but it is actually pretty hard to get right. It detects the zero by finding a small value when it should detect it by find a small region where it changes sign but surprising it is so hardwired to the size test that fixing it and testing the new code has been very difficult to me. My branch is barry/2019-08-18/fix-tsevent-posteventdt Barry > On Oct 27, 2020, at 3:02 PM, Blondel, Sophie via petsc-users wrote: > > Hi Matt, > > With the ex40 I attached in my previous email here is what I get printed on screen when running "./ex40 -ts_monitor -ts_event_monitor": > 0 TS dt 0.1 time 0. > 1 TS dt 0.5 time 0.1 > 2 TS dt 0.5 time 0.6 > 3 TS dt 0.5 time 1.1 > 4 TS dt 0.5 time 1.6 > 5 TS dt 0.5 time 2.1 > 6 TS dt 0.5 time 2.6 > 7 TS dt 0.5 time 3.1 > 8 TS dt 0.5 time 3.6 > 9 TS dt 0.5 time 4.1 > 10 TS dt 0.5 time 4.6 > 11 TS dt 0.5 time 5.1 > 12 TS dt 0.5 time 5.6 > 13 TS dt 0.5 time 6.1 > 14 TS dt 0.5 time 6.6 > 15 TS dt 0.5 time 7.1 > TSEvent: Event 0 zero crossing at time 7.6 located in 0 iterations > Ball hit the ground at t = 7.60 seconds > 16 TS dt 0.5 time 7.6 > 17 TS dt 0.5 time 8.1 > 18 TS dt 0.5 time 8.6 > 19 TS dt 0.5 time 9.1 > 20 TS dt 0.5 time 9.6 > 21 TS dt 0.5 time 10.1 > 22 TS dt 0.5 time 10.6 > 23 TS dt 0.5 time 11.1 > 24 TS dt 0.5 time 11.6 > 25 TS dt 0.5 time 12.1 > 26 TS dt 0.5 time 12.6 > 27 TS dt 0.5 time 13.1 > 28 TS dt 0.5 time 13.6 > 29 TS dt 0.5 time 14.1 > 30 TS dt 0.5 time 14.6 > 31 TS dt 0.5 time 15.1 > 32 TS dt 0.5 time 15.6 > 33 TS dt 0.5 time 16.1 > 34 TS dt 0.5 time 16.6 > 35 TS dt 0.5 time 17.1 > 36 TS dt 0.5 time 17.6 > 37 TS dt 0.5 time 18.1 > 38 TS dt 0.5 time 18.6 > 39 TS dt 0.5 time 19.1 > 40 TS dt 0.5 time 19.6 > 41 TS dt 0.5 time 20.1 > 42 TS dt 0.5 time 20.6 > 43 TS dt 0.5 time 21.1 > 44 TS dt 0.5 time 21.6 > 45 TS dt 0.5 time 22.1 > 46 TS dt 0.5 time 22.6 > 47 TS dt 0.5 time 23.1 > 48 TS dt 0.5 time 23.6 > 49 TS dt 0.5 time 24.1 > 50 TS dt 0.5 time 24.6 > 51 TS dt 0.5 time 25.1 > TSEvent: Event 0 zero crossing at time 25.6 located in 0 iterations > Ball hit the ground at t = 25.60 seconds > 52 TS dt 0.5 time 25.6 > 53 TS dt 0.5 time 26.1 > 54 TS dt 0.5 time 26.6 > 55 TS dt 0.5 time 27.1 > 56 TS dt 0.5 time 27.6 > 57 TS dt 0.5 time 28.1 > 58 TS dt 0.5 time 28.6 > 59 TS dt 0.5 time 29.1 > 60 TS dt 0.5 time 29.6 > 61 TS dt 0.5 time 30.1 > 0 TS dt 0.1 time 0. 
> 1 TS dt 0.5 time 0.1 > 2 TS dt 0.5 time 0.6 > 3 TS dt 0.5 time 1.1 > 4 TS dt 0.5 time 1.6 > 5 TS dt 0.5 time 2.1 > 6 TS dt 0.5 time 2.6 > 7 TS dt 0.5 time 3.1 > 8 TS dt 0.5 time 3.6 > 9 TS dt 0.5 time 4.1 > 10 TS dt 0.5 time 4.6 > 11 TS dt 0.5 time 5.1 > 12 TS dt 0.5 time 5.6 > 13 TS dt 0.5 time 6.1 > 14 TS dt 0.5 time 6.6 > 15 TS dt 0.5 time 7.1 > 16 TS dt 0.5 time 7.6 > 17 TS dt 0.5 time 8.1 > 18 TS dt 0.5 time 8.6 > 19 TS dt 0.5 time 9.1 > 20 TS dt 0.5 time 9.6 > 21 TS dt 0.5 time 10.1 > 22 TS dt 0.5 time 10.6 > 23 TS dt 0.5 time 11.1 > 24 TS dt 0.5 time 11.6 > 25 TS dt 0.5 time 12.1 > 26 TS dt 0.5 time 12.6 > TSEvent: Event 0 zero crossing at time 13.1 located in 0 iterations > Ball hit the ground at t = 13.10 seconds > 27 TS dt 0.5 time 13.1 > 28 TS dt 0.5 time 13.6 > 29 TS dt 0.5 time 14.1 > 30 TS dt 0.5 time 14.6 > 31 TS dt 0.5 time 15.1 > 32 TS dt 0.5 time 15.6 > 33 TS dt 0.5 time 16.1 > 34 TS dt 0.5 time 16.6 > 35 TS dt 0.5 time 17.1 > 36 TS dt 0.5 time 17.6 > 37 TS dt 0.5 time 18.1 > 38 TS dt 0.5 time 18.6 > 39 TS dt 0.5 time 19.1 > 40 TS dt 0.5 time 19.6 > 41 TS dt 0.5 time 20.1 > 42 TS dt 0.5 time 20.6 > 43 TS dt 0.5 time 21.1 > 44 TS dt 0.5 time 21.6 > 45 TS dt 0.5 time 22.1 > 46 TS dt 0.5 time 22.6 > 47 TS dt 0.5 time 23.1 > TSEvent: Event 0 zero crossing at time 23.6 located in 0 iterations > Ball hit the ground at t = 23.60 seconds > 48 TS dt 0.5 time 23.6 > 49 TS dt 0.5 time 24.1 > 50 TS dt 0.5 time 24.6 > 51 TS dt 0.5 time 25.1 > 52 TS dt 0.5 time 25.6 > 53 TS dt 0.5 time 26.1 > TSEvent: Event 0 zero crossing at time 26.6 located in 0 iterations > Ball hit the ground at t = 26.60 seconds > 54 TS dt 0.5 time 26.6 > 55 TS dt 0.5 time 27.1 > 56 TS dt 0.5 time 27.6 > 57 TS dt 0.5 time 28.1 > 58 TS dt 0.5 time 28.6 > 59 TS dt 0.5 time 29.1 > 60 TS dt 0.5 time 29.6 > 61 TS dt 0. time 30.1 > > I don't see the 0.001 timestep here, do you get a different behavior? > > Thank you, > > Sophie > From: Matthew Knepley > Sent: Tuesday, October 27, 2020 15:34 > To: Blondel, Sophie > Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net > Subject: Re: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep > > [External Email] > On Tue, Oct 27, 2020 at 3:09 PM Blondel, Sophie via petsc-users > wrote: > Hi, > > I am currently using TSSetEventHandler in my code to detect a random event where the solution vector gets modified during the event. Ideally, after the event happens I want the solver to use a much smaller timestep using TSSetPostEventIntervalStep. However, when I use TSSetPostEventIntervalStep the solver doesn't use the set value. I managed to reproduce the behavior by modifying ex40.c as attached. > > I stepped through ex40, and it does indeed change the timestep to 0.001. Can you be more specific, perhaps with monitors, about what you think is wrong? > > Thanks, > > Matt > > I think the issue is related to the fact that the fvalue is not technically "approaching" 0 with a random event, it is more of a step function instead. Do you have any recommendation on how to implement the behavior I'm looking for? Let me know if I can provide additional information. > > Best, > > Sophie > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
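Barry's diagnosis above is that the current event code keys on the indicator becoming small rather than on a clean sign change, which interacts badly with the step-function indicator Sophie describes. One way to give the handler a genuine zero crossing is to return the time remaining until the next scheduled random event. The sketch below is not from the thread: AppCtx, t_next_event and the 1.e-3 post-event step are invented for illustration, and whether this makes TSSetPostEventIntervalStep() behave as desired on a given PETSc version is exactly what remains open here.

#include <petscts.h>

/* Illustrative indicator that changes sign at a randomly scheduled event
   time instead of jumping like a step function. */
typedef struct {
  PetscReal t_next_event;   /* time of the next scheduled random event */
} AppCtx;

static PetscErrorCode Indicator(TS ts,PetscReal t,Vec U,PetscScalar *fvalue,void *ctx)
{
  AppCtx *app = (AppCtx*)ctx;

  PetscFunctionBeginUser;
  fvalue[0] = app->t_next_event - t;   /* positive before the event, negative after */
  PetscFunctionReturn(0);
}

static PetscErrorCode SetupEvent(TS ts,AppCtx *app)
{
  PetscInt       direction[1] = {-1};          /* catch the + to - crossing   */
  PetscBool      terminate[1] = {PETSC_FALSE}; /* keep integrating afterwards */
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* a post-event callback that modifies the solution (sketched further below
     in this thread) would be passed in place of NULL */
  ierr = TSSetEventHandler(ts,1,direction,terminate,Indicator,NULL,app);CHKERRQ(ierr);
  ierr = TSSetPostEventIntervalStep(ts,1.e-3);CHKERRQ(ierr);   /* dt to take right after an event */
  PetscFunctionReturn(0);
}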
URL: From sblondel at utk.edu Tue Oct 27 15:35:11 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Tue, 27 Oct 2020 20:35:11 +0000 Subject: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep In-Reply-To: <52CD65C8-DB69-4799-ACC7-0B2E5C32FE54@petsc.dev> References: , <52CD65C8-DB69-4799-ACC7-0B2E5C32FE54@petsc.dev> Message-ID: Hi Barry, The code had a different behavior at one point (using the initial timestep after an event) : I still use commit f0e947c45e099a328e78b13737aa9bc4c143ca79 when I really need the time step to get really small after an event. I don't know if it can help with the current code. Best, Sophie ________________________________ From: Barry Smith Sent: Tuesday, October 27, 2020 16:24 To: Blondel, Sophie Cc: Matthew Knepley ; petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Subject: Re: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep I'm sorry the code is still fundamentally broken, I know I promised a long time ago to fix it all up but it is actually pretty hard to get right. It detects the zero by finding a small value when it should detect it by find a small region where it changes sign but surprising it is so hardwired to the size test that fixing it and testing the new code has been very difficult to me. My branch is barry/2019-08-18/fix-tsevent-posteventdt Barry On Oct 27, 2020, at 3:02 PM, Blondel, Sophie via petsc-users > wrote: Hi Matt, With the ex40 I attached in my previous email here is what I get printed on screen when running "./ex40 -ts_monitor -ts_event_monitor": 0 TS dt 0.1 time 0. 1 TS dt 0.5 time 0.1 2 TS dt 0.5 time 0.6 3 TS dt 0.5 time 1.1 4 TS dt 0.5 time 1.6 5 TS dt 0.5 time 2.1 6 TS dt 0.5 time 2.6 7 TS dt 0.5 time 3.1 8 TS dt 0.5 time 3.6 9 TS dt 0.5 time 4.1 10 TS dt 0.5 time 4.6 11 TS dt 0.5 time 5.1 12 TS dt 0.5 time 5.6 13 TS dt 0.5 time 6.1 14 TS dt 0.5 time 6.6 15 TS dt 0.5 time 7.1 TSEvent: Event 0 zero crossing at time 7.6 located in 0 iterations Ball hit the ground at t = 7.60 seconds 16 TS dt 0.5 time 7.6 17 TS dt 0.5 time 8.1 18 TS dt 0.5 time 8.6 19 TS dt 0.5 time 9.1 20 TS dt 0.5 time 9.6 21 TS dt 0.5 time 10.1 22 TS dt 0.5 time 10.6 23 TS dt 0.5 time 11.1 24 TS dt 0.5 time 11.6 25 TS dt 0.5 time 12.1 26 TS dt 0.5 time 12.6 27 TS dt 0.5 time 13.1 28 TS dt 0.5 time 13.6 29 TS dt 0.5 time 14.1 30 TS dt 0.5 time 14.6 31 TS dt 0.5 time 15.1 32 TS dt 0.5 time 15.6 33 TS dt 0.5 time 16.1 34 TS dt 0.5 time 16.6 35 TS dt 0.5 time 17.1 36 TS dt 0.5 time 17.6 37 TS dt 0.5 time 18.1 38 TS dt 0.5 time 18.6 39 TS dt 0.5 time 19.1 40 TS dt 0.5 time 19.6 41 TS dt 0.5 time 20.1 42 TS dt 0.5 time 20.6 43 TS dt 0.5 time 21.1 44 TS dt 0.5 time 21.6 45 TS dt 0.5 time 22.1 46 TS dt 0.5 time 22.6 47 TS dt 0.5 time 23.1 48 TS dt 0.5 time 23.6 49 TS dt 0.5 time 24.1 50 TS dt 0.5 time 24.6 51 TS dt 0.5 time 25.1 TSEvent: Event 0 zero crossing at time 25.6 located in 0 iterations Ball hit the ground at t = 25.60 seconds 52 TS dt 0.5 time 25.6 53 TS dt 0.5 time 26.1 54 TS dt 0.5 time 26.6 55 TS dt 0.5 time 27.1 56 TS dt 0.5 time 27.6 57 TS dt 0.5 time 28.1 58 TS dt 0.5 time 28.6 59 TS dt 0.5 time 29.1 60 TS dt 0.5 time 29.6 61 TS dt 0.5 time 30.1 0 TS dt 0.1 time 0. 
1 TS dt 0.5 time 0.1 2 TS dt 0.5 time 0.6 3 TS dt 0.5 time 1.1 4 TS dt 0.5 time 1.6 5 TS dt 0.5 time 2.1 6 TS dt 0.5 time 2.6 7 TS dt 0.5 time 3.1 8 TS dt 0.5 time 3.6 9 TS dt 0.5 time 4.1 10 TS dt 0.5 time 4.6 11 TS dt 0.5 time 5.1 12 TS dt 0.5 time 5.6 13 TS dt 0.5 time 6.1 14 TS dt 0.5 time 6.6 15 TS dt 0.5 time 7.1 16 TS dt 0.5 time 7.6 17 TS dt 0.5 time 8.1 18 TS dt 0.5 time 8.6 19 TS dt 0.5 time 9.1 20 TS dt 0.5 time 9.6 21 TS dt 0.5 time 10.1 22 TS dt 0.5 time 10.6 23 TS dt 0.5 time 11.1 24 TS dt 0.5 time 11.6 25 TS dt 0.5 time 12.1 26 TS dt 0.5 time 12.6 TSEvent: Event 0 zero crossing at time 13.1 located in 0 iterations Ball hit the ground at t = 13.10 seconds 27 TS dt 0.5 time 13.1 28 TS dt 0.5 time 13.6 29 TS dt 0.5 time 14.1 30 TS dt 0.5 time 14.6 31 TS dt 0.5 time 15.1 32 TS dt 0.5 time 15.6 33 TS dt 0.5 time 16.1 34 TS dt 0.5 time 16.6 35 TS dt 0.5 time 17.1 36 TS dt 0.5 time 17.6 37 TS dt 0.5 time 18.1 38 TS dt 0.5 time 18.6 39 TS dt 0.5 time 19.1 40 TS dt 0.5 time 19.6 41 TS dt 0.5 time 20.1 42 TS dt 0.5 time 20.6 43 TS dt 0.5 time 21.1 44 TS dt 0.5 time 21.6 45 TS dt 0.5 time 22.1 46 TS dt 0.5 time 22.6 47 TS dt 0.5 time 23.1 TSEvent: Event 0 zero crossing at time 23.6 located in 0 iterations Ball hit the ground at t = 23.60 seconds 48 TS dt 0.5 time 23.6 49 TS dt 0.5 time 24.1 50 TS dt 0.5 time 24.6 51 TS dt 0.5 time 25.1 52 TS dt 0.5 time 25.6 53 TS dt 0.5 time 26.1 TSEvent: Event 0 zero crossing at time 26.6 located in 0 iterations Ball hit the ground at t = 26.60 seconds 54 TS dt 0.5 time 26.6 55 TS dt 0.5 time 27.1 56 TS dt 0.5 time 27.6 57 TS dt 0.5 time 28.1 58 TS dt 0.5 time 28.6 59 TS dt 0.5 time 29.1 60 TS dt 0.5 time 29.6 61 TS dt 0. time 30.1 I don't see the 0.001 timestep here, do you get a different behavior? Thank you, Sophie ________________________________ From: Matthew Knepley > Sent: Tuesday, October 27, 2020 15:34 To: Blondel, Sophie > Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Subject: Re: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep [External Email] On Tue, Oct 27, 2020 at 3:09 PM Blondel, Sophie via petsc-users > wrote: Hi, I am currently using TSSetEventHandler in my code to detect a random event where the solution vector gets modified during the event. Ideally, after the event happens I want the solver to use a much smaller timestep using TSSetPostEventIntervalStep. However, when I use TSSetPostEventIntervalStep the solver doesn't use the set value. I managed to reproduce the behavior by modifying ex40.c as attached. I stepped through ex40, and it does indeed change the timestep to 0.001. Can you be more specific, perhaps with monitors, about what you think is wrong? Thanks, Matt I think the issue is related to the fact that the fvalue is not technically "approaching" 0 with a random event, it is more of a step function instead. Do you have any recommendation on how to implement the behavior I'm looking for? Let me know if I can provide additional information. Best, Sophie -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
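
A minimal stand-alone sketch of the pattern being discussed here, for readers without the attachment: it is not Sophie's modified ex40.c, and it assumes a trivial ODE du/dt = -u, a deterministic stand-in for the random event (the indicator jumps from +1 to -1 at t = 5 instead of smoothly approaching zero), and an illustrative post-event step of 0.001 set with TSSetPostEventIntervalStep. Whether that small step is actually taken after the event is exactly the behaviour under debate in this thread.

#include <petscts.h>

/* du/dt = -u, just to have something to integrate */
static PetscErrorCode RHSFunction(TS ts, PetscReal t, Vec U, Vec F, void *ctx)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = VecCopy(U, F);CHKERRQ(ierr);
  ierr = VecScale(F, -1.0);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Indicator behaving like a step function (the situation described above):
   it jumps from +1 to -1 when the event condition becomes true, rather than
   smoothly approaching zero. Here the "event" is simply t >= 5. */
static PetscErrorCode EventIndicator(TS ts, PetscReal t, Vec U, PetscScalar *fvalue, void *ctx)
{
  PetscFunctionBeginUser;
  fvalue[0] = (t < 5.0) ? 1.0 : -1.0;
  PetscFunctionReturn(0);
}

/* Called once the event has been located; the solution could be modified here */
static PetscErrorCode PostEvent(TS ts, PetscInt nevents, PetscInt event_list[], PetscReal t, Vec U, PetscBool forwardsolve, void *ctx)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = PetscPrintf(PETSC_COMM_WORLD, "Event fired at t = %g\n", (double)t);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

int main(int argc, char **argv)
{
  TS             ts;
  Vec            u;
  PetscInt       direction = -1;      /* indicator goes from positive to negative */
  PetscBool      terminate = PETSC_FALSE;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = VecCreate(PETSC_COMM_WORLD, &u);CHKERRQ(ierr);
  ierr = VecSetSizes(u, PETSC_DECIDE, 1);CHKERRQ(ierr);
  ierr = VecSetFromOptions(u);CHKERRQ(ierr);
  ierr = VecSet(u, 1.0);CHKERRQ(ierr);

  ierr = TSCreate(PETSC_COMM_WORLD, &ts);CHKERRQ(ierr);
  ierr = TSSetProblemType(ts, TS_NONLINEAR);CHKERRQ(ierr);
  ierr = TSSetRHSFunction(ts, NULL, RHSFunction, NULL);CHKERRQ(ierr);
  ierr = TSSetTimeStep(ts, 0.1);CHKERRQ(ierr);
  ierr = TSSetMaxTime(ts, 10.0);CHKERRQ(ierr);
  ierr = TSSetExactFinalTime(ts, TS_EXACTFINALTIME_MATCHSTEP);CHKERRQ(ierr);

  ierr = TSSetEventHandler(ts, 1, &direction, &terminate, EventIndicator, PostEvent, NULL);CHKERRQ(ierr);
  ierr = TSSetPostEventIntervalStep(ts, 0.001);CHKERRQ(ierr); /* the small step expected right after the event */

  ierr = TSSetFromOptions(ts);CHKERRQ(ierr);
  ierr = TSSolve(ts, u);CHKERRQ(ierr);

  ierr = TSDestroy(&ts);CHKERRQ(ierr);
  ierr = VecDestroy(&u);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Running a sketch like this with -ts_monitor -ts_event_monitor is the same diagnostic used in the monitor output quoted in this thread.
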
URL: From knepley at gmail.com Tue Oct 27 15:41:15 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 27 Oct 2020 16:41:15 -0400 Subject: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep In-Reply-To: <52CD65C8-DB69-4799-ACC7-0B2E5C32FE54@petsc.dev> References: <52CD65C8-DB69-4799-ACC7-0B2E5C32FE54@petsc.dev> Message-ID: On Tue, Oct 27, 2020 at 4:24 PM Barry Smith wrote: > > I'm sorry the code is still fundamentally broken, I know I promised a > long time ago to fix it all up but it is actually pretty hard to get right. > > It detects the zero by finding a small value when it should detect it by > find a small region where it changes sign but surprising it is so hardwired > to the size test that fixing it and testing the new code has been very > difficult to me. My branch is barry/2019-08-18/fix-tsevent-posteventdt > Barry, I do not see this branch on gitlab. Can you give a URL? Thanks, Matt > Barry > > > > On Oct 27, 2020, at 3:02 PM, Blondel, Sophie via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi Matt, > > With the ex40 I attached in my previous email here is what I get printed > on screen when running "./ex40 -ts_monitor -ts_event_monitor": > 0 TS dt 0.1 time 0. > 1 TS dt 0.5 time 0.1 > 2 TS dt 0.5 time 0.6 > 3 TS dt 0.5 time 1.1 > 4 TS dt 0.5 time 1.6 > 5 TS dt 0.5 time 2.1 > 6 TS dt 0.5 time 2.6 > 7 TS dt 0.5 time 3.1 > 8 TS dt 0.5 time 3.6 > 9 TS dt 0.5 time 4.1 > 10 TS dt 0.5 time 4.6 > 11 TS dt 0.5 time 5.1 > 12 TS dt 0.5 time 5.6 > 13 TS dt 0.5 time 6.1 > 14 TS dt 0.5 time 6.6 > 15 TS dt 0.5 time 7.1 > TSEvent: Event 0 zero crossing at time 7.6 located in 0 iterations > Ball hit the ground at t = 7.60 seconds > 16 TS dt 0.5 time 7.6 > 17 TS dt 0.5 time 8.1 > 18 TS dt 0.5 time 8.6 > 19 TS dt 0.5 time 9.1 > 20 TS dt 0.5 time 9.6 > 21 TS dt 0.5 time 10.1 > 22 TS dt 0.5 time 10.6 > 23 TS dt 0.5 time 11.1 > 24 TS dt 0.5 time 11.6 > 25 TS dt 0.5 time 12.1 > 26 TS dt 0.5 time 12.6 > 27 TS dt 0.5 time 13.1 > 28 TS dt 0.5 time 13.6 > 29 TS dt 0.5 time 14.1 > 30 TS dt 0.5 time 14.6 > 31 TS dt 0.5 time 15.1 > 32 TS dt 0.5 time 15.6 > 33 TS dt 0.5 time 16.1 > 34 TS dt 0.5 time 16.6 > 35 TS dt 0.5 time 17.1 > 36 TS dt 0.5 time 17.6 > 37 TS dt 0.5 time 18.1 > 38 TS dt 0.5 time 18.6 > 39 TS dt 0.5 time 19.1 > 40 TS dt 0.5 time 19.6 > 41 TS dt 0.5 time 20.1 > 42 TS dt 0.5 time 20.6 > 43 TS dt 0.5 time 21.1 > 44 TS dt 0.5 time 21.6 > 45 TS dt 0.5 time 22.1 > 46 TS dt 0.5 time 22.6 > 47 TS dt 0.5 time 23.1 > 48 TS dt 0.5 time 23.6 > 49 TS dt 0.5 time 24.1 > 50 TS dt 0.5 time 24.6 > 51 TS dt 0.5 time 25.1 > TSEvent: Event 0 zero crossing at time 25.6 located in 0 iterations > Ball hit the ground at t = 25.60 seconds > 52 TS dt 0.5 time 25.6 > 53 TS dt 0.5 time 26.1 > 54 TS dt 0.5 time 26.6 > 55 TS dt 0.5 time 27.1 > 56 TS dt 0.5 time 27.6 > 57 TS dt 0.5 time 28.1 > 58 TS dt 0.5 time 28.6 > 59 TS dt 0.5 time 29.1 > 60 TS dt 0.5 time 29.6 > 61 TS dt 0.5 time 30.1 > 0 TS dt 0.1 time 0. 
> 1 TS dt 0.5 time 0.1 > 2 TS dt 0.5 time 0.6 > 3 TS dt 0.5 time 1.1 > 4 TS dt 0.5 time 1.6 > 5 TS dt 0.5 time 2.1 > 6 TS dt 0.5 time 2.6 > 7 TS dt 0.5 time 3.1 > 8 TS dt 0.5 time 3.6 > 9 TS dt 0.5 time 4.1 > 10 TS dt 0.5 time 4.6 > 11 TS dt 0.5 time 5.1 > 12 TS dt 0.5 time 5.6 > 13 TS dt 0.5 time 6.1 > 14 TS dt 0.5 time 6.6 > 15 TS dt 0.5 time 7.1 > 16 TS dt 0.5 time 7.6 > 17 TS dt 0.5 time 8.1 > 18 TS dt 0.5 time 8.6 > 19 TS dt 0.5 time 9.1 > 20 TS dt 0.5 time 9.6 > 21 TS dt 0.5 time 10.1 > 22 TS dt 0.5 time 10.6 > 23 TS dt 0.5 time 11.1 > 24 TS dt 0.5 time 11.6 > 25 TS dt 0.5 time 12.1 > 26 TS dt 0.5 time 12.6 > TSEvent: Event 0 zero crossing at time 13.1 located in 0 iterations > Ball hit the ground at t = 13.10 seconds > 27 TS dt 0.5 time 13.1 > 28 TS dt 0.5 time 13.6 > 29 TS dt 0.5 time 14.1 > 30 TS dt 0.5 time 14.6 > 31 TS dt 0.5 time 15.1 > 32 TS dt 0.5 time 15.6 > 33 TS dt 0.5 time 16.1 > 34 TS dt 0.5 time 16.6 > 35 TS dt 0.5 time 17.1 > 36 TS dt 0.5 time 17.6 > 37 TS dt 0.5 time 18.1 > 38 TS dt 0.5 time 18.6 > 39 TS dt 0.5 time 19.1 > 40 TS dt 0.5 time 19.6 > 41 TS dt 0.5 time 20.1 > 42 TS dt 0.5 time 20.6 > 43 TS dt 0.5 time 21.1 > 44 TS dt 0.5 time 21.6 > 45 TS dt 0.5 time 22.1 > 46 TS dt 0.5 time 22.6 > 47 TS dt 0.5 time 23.1 > TSEvent: Event 0 zero crossing at time 23.6 located in 0 iterations > Ball hit the ground at t = 23.60 seconds > 48 TS dt 0.5 time 23.6 > 49 TS dt 0.5 time 24.1 > 50 TS dt 0.5 time 24.6 > 51 TS dt 0.5 time 25.1 > 52 TS dt 0.5 time 25.6 > 53 TS dt 0.5 time 26.1 > TSEvent: Event 0 zero crossing at time 26.6 located in 0 iterations > Ball hit the ground at t = 26.60 seconds > 54 TS dt 0.5 time 26.6 > 55 TS dt 0.5 time 27.1 > 56 TS dt 0.5 time 27.6 > 57 TS dt 0.5 time 28.1 > 58 TS dt 0.5 time 28.6 > 59 TS dt 0.5 time 29.1 > 60 TS dt 0.5 time 29.6 > 61 TS dt 0. time 30.1 > > I don't see the 0.001 timestep here, do you get a different behavior? > > Thank you, > > Sophie > ------------------------------ > *From:* Matthew Knepley > *Sent:* Tuesday, October 27, 2020 15:34 > *To:* Blondel, Sophie > *Cc:* petsc-users at mcs.anl.gov ; > xolotl-psi-development at lists.sourceforge.net < > xolotl-psi-development at lists.sourceforge.net> > *Subject:* Re: [petsc-users] TSSetEventHandler and > TSSetPostEventIntervalStep > > > *[External Email]* > On Tue, Oct 27, 2020 at 3:09 PM Blondel, Sophie via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi, > > I am currently using TSSetEventHandler in my code to detect a random event > where the solution vector gets modified during the event. Ideally, after > the event happens I want the solver to use a much smaller timestep using > TSSetPostEventIntervalStep. However, when I use TSSetPostEventIntervalStep > the solver doesn't use the set value. I managed to reproduce the behavior > by modifying ex40.c as attached. > > > I stepped through ex40, and it does indeed change the timestep to 0.001. > Can you be more specific, perhaps with monitors, about what you think is > wrong? > > Thanks, > > Matt > > > I think the issue is related to the fact that the fvalue is not > technically "approaching" 0 with a random event, it is more of a step > function instead. Do you have any recommendation on how to implement the > behavior I'm looking for? Let me know if I can provide additional > information. > > Best, > > Sophie > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Oct 27 16:01:29 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 27 Oct 2020 16:01:29 -0500 Subject: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep In-Reply-To: References: <52CD65C8-DB69-4799-ACC7-0B2E5C32FE54@petsc.dev> Message-ID: Pushed > On Oct 27, 2020, at 3:41 PM, Matthew Knepley wrote: > > On Tue, Oct 27, 2020 at 4:24 PM Barry Smith > wrote: > > I'm sorry the code is still fundamentally broken, I know I promised a long time ago to fix it all up but it is actually pretty hard to get right. > > It detects the zero by finding a small value when it should detect it by find a small region where it changes sign but surprising it is so hardwired > to the size test that fixing it and testing the new code has been very difficult to me. My branch is barry/2019-08-18/fix-tsevent-posteventdt > > Barry, I do not see this branch on gitlab. Can you give a URL? > > Thanks, > > Matt > > Barry > > > >> On Oct 27, 2020, at 3:02 PM, Blondel, Sophie via petsc-users > wrote: >> >> Hi Matt, >> >> With the ex40 I attached in my previous email here is what I get printed on screen when running "./ex40 -ts_monitor -ts_event_monitor": >> 0 TS dt 0.1 time 0. >> 1 TS dt 0.5 time 0.1 >> 2 TS dt 0.5 time 0.6 >> 3 TS dt 0.5 time 1.1 >> 4 TS dt 0.5 time 1.6 >> 5 TS dt 0.5 time 2.1 >> 6 TS dt 0.5 time 2.6 >> 7 TS dt 0.5 time 3.1 >> 8 TS dt 0.5 time 3.6 >> 9 TS dt 0.5 time 4.1 >> 10 TS dt 0.5 time 4.6 >> 11 TS dt 0.5 time 5.1 >> 12 TS dt 0.5 time 5.6 >> 13 TS dt 0.5 time 6.1 >> 14 TS dt 0.5 time 6.6 >> 15 TS dt 0.5 time 7.1 >> TSEvent: Event 0 zero crossing at time 7.6 located in 0 iterations >> Ball hit the ground at t = 7.60 seconds >> 16 TS dt 0.5 time 7.6 >> 17 TS dt 0.5 time 8.1 >> 18 TS dt 0.5 time 8.6 >> 19 TS dt 0.5 time 9.1 >> 20 TS dt 0.5 time 9.6 >> 21 TS dt 0.5 time 10.1 >> 22 TS dt 0.5 time 10.6 >> 23 TS dt 0.5 time 11.1 >> 24 TS dt 0.5 time 11.6 >> 25 TS dt 0.5 time 12.1 >> 26 TS dt 0.5 time 12.6 >> 27 TS dt 0.5 time 13.1 >> 28 TS dt 0.5 time 13.6 >> 29 TS dt 0.5 time 14.1 >> 30 TS dt 0.5 time 14.6 >> 31 TS dt 0.5 time 15.1 >> 32 TS dt 0.5 time 15.6 >> 33 TS dt 0.5 time 16.1 >> 34 TS dt 0.5 time 16.6 >> 35 TS dt 0.5 time 17.1 >> 36 TS dt 0.5 time 17.6 >> 37 TS dt 0.5 time 18.1 >> 38 TS dt 0.5 time 18.6 >> 39 TS dt 0.5 time 19.1 >> 40 TS dt 0.5 time 19.6 >> 41 TS dt 0.5 time 20.1 >> 42 TS dt 0.5 time 20.6 >> 43 TS dt 0.5 time 21.1 >> 44 TS dt 0.5 time 21.6 >> 45 TS dt 0.5 time 22.1 >> 46 TS dt 0.5 time 22.6 >> 47 TS dt 0.5 time 23.1 >> 48 TS dt 0.5 time 23.6 >> 49 TS dt 0.5 time 24.1 >> 50 TS dt 0.5 time 24.6 >> 51 TS dt 0.5 time 25.1 >> TSEvent: Event 0 zero crossing at time 25.6 located in 0 iterations >> Ball hit the ground at t = 25.60 seconds >> 52 TS dt 0.5 time 25.6 >> 53 TS dt 0.5 time 26.1 >> 54 TS dt 0.5 time 26.6 >> 55 TS dt 0.5 time 27.1 >> 56 TS dt 0.5 time 27.6 >> 57 TS dt 0.5 time 28.1 >> 58 TS dt 0.5 time 28.6 >> 59 TS dt 0.5 time 29.1 >> 60 TS dt 0.5 time 29.6 >> 61 TS dt 0.5 time 30.1 >> 0 TS dt 0.1 time 0. 
>> 1 TS dt 0.5 time 0.1 >> 2 TS dt 0.5 time 0.6 >> 3 TS dt 0.5 time 1.1 >> 4 TS dt 0.5 time 1.6 >> 5 TS dt 0.5 time 2.1 >> 6 TS dt 0.5 time 2.6 >> 7 TS dt 0.5 time 3.1 >> 8 TS dt 0.5 time 3.6 >> 9 TS dt 0.5 time 4.1 >> 10 TS dt 0.5 time 4.6 >> 11 TS dt 0.5 time 5.1 >> 12 TS dt 0.5 time 5.6 >> 13 TS dt 0.5 time 6.1 >> 14 TS dt 0.5 time 6.6 >> 15 TS dt 0.5 time 7.1 >> 16 TS dt 0.5 time 7.6 >> 17 TS dt 0.5 time 8.1 >> 18 TS dt 0.5 time 8.6 >> 19 TS dt 0.5 time 9.1 >> 20 TS dt 0.5 time 9.6 >> 21 TS dt 0.5 time 10.1 >> 22 TS dt 0.5 time 10.6 >> 23 TS dt 0.5 time 11.1 >> 24 TS dt 0.5 time 11.6 >> 25 TS dt 0.5 time 12.1 >> 26 TS dt 0.5 time 12.6 >> TSEvent: Event 0 zero crossing at time 13.1 located in 0 iterations >> Ball hit the ground at t = 13.10 seconds >> 27 TS dt 0.5 time 13.1 >> 28 TS dt 0.5 time 13.6 >> 29 TS dt 0.5 time 14.1 >> 30 TS dt 0.5 time 14.6 >> 31 TS dt 0.5 time 15.1 >> 32 TS dt 0.5 time 15.6 >> 33 TS dt 0.5 time 16.1 >> 34 TS dt 0.5 time 16.6 >> 35 TS dt 0.5 time 17.1 >> 36 TS dt 0.5 time 17.6 >> 37 TS dt 0.5 time 18.1 >> 38 TS dt 0.5 time 18.6 >> 39 TS dt 0.5 time 19.1 >> 40 TS dt 0.5 time 19.6 >> 41 TS dt 0.5 time 20.1 >> 42 TS dt 0.5 time 20.6 >> 43 TS dt 0.5 time 21.1 >> 44 TS dt 0.5 time 21.6 >> 45 TS dt 0.5 time 22.1 >> 46 TS dt 0.5 time 22.6 >> 47 TS dt 0.5 time 23.1 >> TSEvent: Event 0 zero crossing at time 23.6 located in 0 iterations >> Ball hit the ground at t = 23.60 seconds >> 48 TS dt 0.5 time 23.6 >> 49 TS dt 0.5 time 24.1 >> 50 TS dt 0.5 time 24.6 >> 51 TS dt 0.5 time 25.1 >> 52 TS dt 0.5 time 25.6 >> 53 TS dt 0.5 time 26.1 >> TSEvent: Event 0 zero crossing at time 26.6 located in 0 iterations >> Ball hit the ground at t = 26.60 seconds >> 54 TS dt 0.5 time 26.6 >> 55 TS dt 0.5 time 27.1 >> 56 TS dt 0.5 time 27.6 >> 57 TS dt 0.5 time 28.1 >> 58 TS dt 0.5 time 28.6 >> 59 TS dt 0.5 time 29.1 >> 60 TS dt 0.5 time 29.6 >> 61 TS dt 0. time 30.1 >> >> I don't see the 0.001 timestep here, do you get a different behavior? >> >> Thank you, >> >> Sophie >> From: Matthew Knepley > >> Sent: Tuesday, October 27, 2020 15:34 >> To: Blondel, Sophie > >> Cc: petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >> Subject: Re: [petsc-users] TSSetEventHandler and TSSetPostEventIntervalStep >> >> [External Email] >> >> On Tue, Oct 27, 2020 at 3:09 PM Blondel, Sophie via petsc-users > wrote: >> Hi, >> >> I am currently using TSSetEventHandler in my code to detect a random event where the solution vector gets modified during the event. Ideally, after the event happens I want the solver to use a much smaller timestep using TSSetPostEventIntervalStep. However, when I use TSSetPostEventIntervalStep the solver doesn't use the set value. I managed to reproduce the behavior by modifying ex40.c as attached. >> >> I stepped through ex40, and it does indeed change the timestep to 0.001. Can you be more specific, perhaps with monitors, about what you think is wrong? >> >> Thanks, >> >> Matt >> >> I think the issue is related to the fact that the fvalue is not technically "approaching" 0 with a random event, it is more of a step function instead. Do you have any recommendation on how to implement the behavior I'm looking for? Let me know if I can provide additional information. >> >> Best, >> >> Sophie >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbuerkle at web.de Tue Oct 27 22:01:40 2020 From: mbuerkle at web.de (Marius Buerkle) Date: Wed, 28 Oct 2020 04:01:40 +0100 Subject: [petsc-users] superlu_dist segfault In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From jonathan.guyer at nist.gov Wed Oct 28 11:35:19 2020 From: jonathan.guyer at nist.gov (Guyer, Jonathan E. Dr. (Fed)) Date: Wed, 28 Oct 2020 16:35:19 +0000 Subject: [petsc-users] Vexing deadlock situation with petsc4py Message-ID: <1A10892F-64FF-47AC-B733-9040F75AD09A@nist.gov> We use petsc4py as a solver suite in our [FiPy](https://www.ctcms.nist.gov/fipy) Python-based PDE solver package. Some time back, I refactored some of the code and provoked a deadlock situation in our test suite. I have been tearing what remains of my hair out trying to isolate things and am at a loss. I?ve gone through the refactoring line-by-line and I just don?t think I?ve changed anything substantive, just how the code is organized. I have posted a branch that exhibits the issue at https://github.com/usnistgov/fipy/pull/761 I explain in greater detail in that ?pull request? how to reproduce, but in short, after a substantial number of our tests run, the code either deadlocks or raises exceptions: On processor 0 in matrix.setUp() specifically in [0] PetscSplitOwnership() line 93 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/utils/psplit.c and on other processors a few lines earlier in matrix.create(comm) specifically in [1] PetscCommDuplicate() line 126 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/objects/tagm.c The circumstances that lead to this failure are really fragile and it seems likely due to some memory corruption. Particularly likely given that I can make the failure go away by removing seemingly irrelevant things like >>> from scipy.stats.mstats import argstoarray Note that when I run the full test suite after taking out this scipy import, the same problem just arises elsewhere without any obvious similar import trigger. Running with `-malloc_debug true` doesn?t illuminate anything. I?ve run with `-info` and `-log_trace` and don?t see any obvious issues, but there?s a ton of output. I have tried reducing things to a minimal reproducible example, but unfortunately things remain way too complicated and idiosyncratic to FiPy. I?m grateful for any help anybody can offer despite the mess that I?m offering. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.guyer at nist.gov Wed Oct 28 12:08:21 2020 From: jonathan.guyer at nist.gov (Guyer, Jonathan E. Dr. (Fed)) Date: Wed, 28 Oct 2020 17:08:21 +0000 Subject: [petsc-users] Vexing deadlock situation with petsc4py In-Reply-To: <1A10892F-64FF-47AC-B733-9040F75AD09A@nist.gov> References: <1A10892F-64FF-47AC-B733-9040F75AD09A@nist.gov> Message-ID: <3E1C3DF7-1184-4E33-8701-941963737B05@nist.gov> I should note that I?m running with a --with-debugging build that I?ve [forked from conda-forge/petsc-feedstock](https://github.com/guyer/petsc-feedstock/), but it doesn?t highlight any problems. 
When I -start_in_debugger, I drop into lldb[*], but there are no symbols. The last assembler I knew was for the 6502 and I haven?t known that for a looooong time. How can I get symbols included in my build? If I drop into the [i]pdb Python debugger, the problem goes away. [*] I?m running on a Mac, but the same deadlock happens on our linux builds On Oct 28, 2020, at 12:35 PM, Guyer, Jonathan E. Dr. (Fed) via petsc-users > wrote: We use petsc4py as a solver suite in our [FiPy](https://www.ctcms.nist.gov/fipy) Python-based PDE solver package. Some time back, I refactored some of the code and provoked a deadlock situation in our test suite. I have been tearing what remains of my hair out trying to isolate things and am at a loss. I?ve gone through the refactoring line-by-line and I just don?t think I?ve changed anything substantive, just how the code is organized. I have posted a branch that exhibits the issue at https://github.com/usnistgov/fipy/pull/761 I explain in greater detail in that ?pull request? how to reproduce, but in short, after a substantial number of our tests run, the code either deadlocks or raises exceptions: On processor 0 in matrix.setUp() specifically in [0] PetscSplitOwnership() line 93 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/utils/psplit.c and on other processors a few lines earlier in matrix.create(comm) specifically in [1] PetscCommDuplicate() line 126 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/objects/tagm.c The circumstances that lead to this failure are really fragile and it seems likely due to some memory corruption. Particularly likely given that I can make the failure go away by removing seemingly irrelevant things like >>> from scipy.stats.mstats import argstoarray Note that when I run the full test suite after taking out this scipy import, the same problem just arises elsewhere without any obvious similar import trigger. Running with `-malloc_debug true` doesn?t illuminate anything. I?ve run with `-info` and `-log_trace` and don?t see any obvious issues, but there?s a ton of output. I have tried reducing things to a minimal reproducible example, but unfortunately things remain way too complicated and idiosyncratic to FiPy. I?m grateful for any help anybody can offer despite the mess that I?m offering. -------------- next part -------------- An HTML attachment was scrubbed... URL: From wence at gmx.li Wed Oct 28 12:21:02 2020 From: wence at gmx.li (Lawrence Mitchell) Date: Wed, 28 Oct 2020 17:21:02 +0000 Subject: [petsc-users] Vexing deadlock situation with petsc4py In-Reply-To: <1A10892F-64FF-47AC-B733-9040F75AD09A@nist.gov> References: <1A10892F-64FF-47AC-B733-9040F75AD09A@nist.gov> Message-ID: <6D258122-9563-4EAD-9ED4-6EA3A3177481@gmx.li> > On 28 Oct 2020, at 16:35, Guyer, Jonathan E. Dr. (Fed) via petsc-users wrote: > > We use petsc4py as a solver suite in our [FiPy](https://www.ctcms.nist.gov/fipy) Python-based PDE solver package. Some time back, I refactored some of the code and provoked a deadlock situation in our test suite. I have been tearing what remains of my hair out trying to isolate things and am at a loss. I?ve gone through the refactoring line-by-line and I just don?t think I?ve changed anything substantive, just how the code is organized. > > I have posted a branch that exhibits the issue at https://github.com/usnistgov/fipy/pull/761 > > I explain in greater detail in that ?pull request? 
how to reproduce, but in short, after a substantial number of our tests run, the code either deadlocks or raises exceptions: > > On processor 0 in > > matrix.setUp() > > specifically in > > [0] PetscSplitOwnership() line 93 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/utils/psplit.c > > and on other processors a few lines earlier in > > matrix.create(comm) > > specifically in > > [1] PetscCommDuplicate() line 126 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/objects/tagm.c > > > The circumstances that lead to this failure are really fragile and it seems likely due to some memory corruption. Particularly likely given that I can make the failure go away by removing seemingly irrelevant things like > > >>> from scipy.stats.mstats import argstoarray > > Note that when I run the full test suite after taking out this scipy import, the same problem just arises elsewhere without any obvious similar import trigger. > > Running with `-malloc_debug true` doesn?t illuminate anything. > > I?ve run with `-info` and `-log_trace` and don?t see any obvious issues, but there?s a ton of output. > > > > I have tried reducing things to a minimal reproducible example, but unfortunately things remain way too complicated and idiosyncratic to FiPy. I?m grateful for any help anybody can offer despite the mess that I?m offering. My crystal ball guess is the following: PETSc objects have collective destroy semantics. When using petsc4py, XXX.destroy() is called on an object when its Python refcount drops to zero, or when it is collected by the generational garbage collector. In the absence of reference-cycles, all allocated objects will be collected by the refcounting part of the collector. This is (unless you do something funky like hold more references on one process than another) deterministic, and if you do normal SPMD programming, you'll call XXX.destroy() in the same order on the same objects on all processes. If you have reference cycles, then the refcounting part of the collector will not collect these objects. Now you are at the mercy of the generational collector. This is definitely not deterministic. If different Python processes do different things (for example, rank 0 might open files) then when the generational collector runs is no longer in sync across processes. A consequence is that you now might have rank 0 collect XXX then YYY, whereas rank 1 might collect YYY then XXX => deadlock. You can test this hypothesis by turning off the garbage collector in your test that provokes the failure: import gc gc.disable() ... If this turns out to be the case, I don't think there's a good solution here. You can audit your code base and ensure that objects that hold PETSc objects never participate in reference cycles. This is fragile. Another option, is to explicitly require that the user of the API call XXX.destroy() on all your objects (and then PETSc objects). This is the decision taken for mpi4py: you are responsible for freeing any objects that you create. That is, your API becomes more like the C API with x = Foo(...) # holds some petsc object XX ... # use x x.destroy() # calls XX.destroy() you could make this more pythonic by wrapping this pattern in contextmanagers: with Foo(...) as x: ... Thanks, Lawrence From jonathan.guyer at nist.gov Wed Oct 28 12:32:33 2020 From: jonathan.guyer at nist.gov (Guyer, Jonathan E. Dr. 
(Fed)) Date: Wed, 28 Oct 2020 17:32:33 +0000 Subject: [petsc-users] Vexing deadlock situation with petsc4py In-Reply-To: <6D258122-9563-4EAD-9ED4-6EA3A3177481@gmx.li> References: <1A10892F-64FF-47AC-B733-9040F75AD09A@nist.gov> <6D258122-9563-4EAD-9ED4-6EA3A3177481@gmx.li> Message-ID: <6DE14276-6194-456A-B558-496DE4C67E27@nist.gov> That?s very helpful, thanks! Adding `gc.collect()` to the beginning of the offending test does indeed resolve that particular problem. I?ve not been systematic about calling XXX.destroy(), thinking garbage collection was sufficient, so I need to get to work on that. > On Oct 28, 2020, at 1:21 PM, Lawrence Mitchell wrote: > > >> On 28 Oct 2020, at 16:35, Guyer, Jonathan E. Dr. (Fed) via petsc-users wrote: >> >> We use petsc4py as a solver suite in our [FiPy](https://www.ctcms.nist.gov/fipy) Python-based PDE solver package. Some time back, I refactored some of the code and provoked a deadlock situation in our test suite. I have been tearing what remains of my hair out trying to isolate things and am at a loss. I?ve gone through the refactoring line-by-line and I just don?t think I?ve changed anything substantive, just how the code is organized. >> >> I have posted a branch that exhibits the issue at https://github.com/usnistgov/fipy/pull/761 >> >> I explain in greater detail in that ?pull request? how to reproduce, but in short, after a substantial number of our tests run, the code either deadlocks or raises exceptions: >> >> On processor 0 in >> >> matrix.setUp() >> >> specifically in >> >> [0] PetscSplitOwnership() line 93 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/utils/psplit.c >> >> and on other processors a few lines earlier in >> >> matrix.create(comm) >> >> specifically in >> >> [1] PetscCommDuplicate() line 126 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/objects/tagm.c >> >> >> The circumstances that lead to this failure are really fragile and it seems likely due to some memory corruption. Particularly likely given that I can make the failure go away by removing seemingly irrelevant things like >> >>>>> from scipy.stats.mstats import argstoarray >> >> Note that when I run the full test suite after taking out this scipy import, the same problem just arises elsewhere without any obvious similar import trigger. >> >> Running with `-malloc_debug true` doesn?t illuminate anything. >> >> I?ve run with `-info` and `-log_trace` and don?t see any obvious issues, but there?s a ton of output. >> >> >> >> I have tried reducing things to a minimal reproducible example, but unfortunately things remain way too complicated and idiosyncratic to FiPy. I?m grateful for any help anybody can offer despite the mess that I?m offering. > > My crystal ball guess is the following: > > PETSc objects have collective destroy semantics. > > When using petsc4py, XXX.destroy() is called on an object when its Python refcount drops to zero, or when it is collected by the generational garbage collector. > > In the absence of reference-cycles, all allocated objects will be collected by the refcounting part of the collector. This is (unless you do something funky like hold more references on one process than another) deterministic, and if you do normal SPMD programming, you'll call XXX.destroy() in the same order on the same objects on all processes. > > If you have reference cycles, then the refcounting part of the collector will not collect these objects. Now you are at the mercy of the generational collector. 
This is definitely not deterministic. If different Python processes do different things (for example, rank 0 might open files) then when the generational collector runs is no longer in sync across processes. > > A consequence is that you now might have rank 0 collect XXX then YYY, whereas rank 1 might collect YYY then XXX => deadlock. > > You can test this hypothesis by turning off the garbage collector in your test that provokes the failure: > > import gc > gc.disable() > ... > > If this turns out to be the case, I don't think there's a good solution here. You can audit your code base and ensure that objects that hold PETSc objects never participate in reference cycles. This is fragile. > > Another option, is to explicitly require that the user of the API call XXX.destroy() on all your objects (and then PETSc objects). This is the decision taken for mpi4py: you are responsible for freeing any objects that you create. > > That is, your API becomes more like the C API with > > x = Foo(...) # holds some petsc object XX > ... # use x > x.destroy() # calls XX.destroy() > > you could make this more pythonic by wrapping this pattern in contextmanagers: > > with Foo(...) as x: > ... > > > Thanks, > > Lawrence From jonathan.guyer at nist.gov Wed Oct 28 12:33:13 2020 From: jonathan.guyer at nist.gov (Guyer, Jonathan E. Dr. (Fed)) Date: Wed, 28 Oct 2020 17:33:13 +0000 Subject: [petsc-users] Vexing deadlock situation with petsc4py In-Reply-To: <6DE14276-6194-456A-B558-496DE4C67E27@nist.gov> References: <1A10892F-64FF-47AC-B733-9040F75AD09A@nist.gov> <6D258122-9563-4EAD-9ED4-6EA3A3177481@gmx.li> <6DE14276-6194-456A-B558-496DE4C67E27@nist.gov> Message-ID: <144D75DD-1BD8-4C97-8399-C528CD6FE400@nist.gov> *gc.disable() > On Oct 28, 2020, at 1:32 PM, Jonathan Guyer wrote: > > That?s very helpful, thanks! > > Adding `gc.collect()` to the beginning of the offending test does indeed resolve that particular problem. > > I?ve not been systematic about calling XXX.destroy(), thinking garbage collection was sufficient, so I need to get to work on that. > >> On Oct 28, 2020, at 1:21 PM, Lawrence Mitchell wrote: >> >> >>> On 28 Oct 2020, at 16:35, Guyer, Jonathan E. Dr. (Fed) via petsc-users wrote: >>> >>> We use petsc4py as a solver suite in our [FiPy](https://www.ctcms.nist.gov/fipy) Python-based PDE solver package. Some time back, I refactored some of the code and provoked a deadlock situation in our test suite. I have been tearing what remains of my hair out trying to isolate things and am at a loss. I?ve gone through the refactoring line-by-line and I just don?t think I?ve changed anything substantive, just how the code is organized. >>> >>> I have posted a branch that exhibits the issue at https://github.com/usnistgov/fipy/pull/761 >>> >>> I explain in greater detail in that ?pull request? how to reproduce, but in short, after a substantial number of our tests run, the code either deadlocks or raises exceptions: >>> >>> On processor 0 in >>> >>> matrix.setUp() >>> >>> specifically in >>> >>> [0] PetscSplitOwnership() line 93 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/utils/psplit.c >>> >>> and on other processors a few lines earlier in >>> >>> matrix.create(comm) >>> >>> specifically in >>> >>> [1] PetscCommDuplicate() line 126 in /Users/runner/miniforge3/conda-bld/petsc_1601473259434/work/src/sys/objects/tagm.c >>> >>> >>> The circumstances that lead to this failure are really fragile and it seems likely due to some memory corruption. 
Particularly likely given that I can make the failure go away by removing seemingly irrelevant things like >>> >>>>>> from scipy.stats.mstats import argstoarray >>> >>> Note that when I run the full test suite after taking out this scipy import, the same problem just arises elsewhere without any obvious similar import trigger. >>> >>> Running with `-malloc_debug true` doesn?t illuminate anything. >>> >>> I?ve run with `-info` and `-log_trace` and don?t see any obvious issues, but there?s a ton of output. >>> >>> >>> >>> I have tried reducing things to a minimal reproducible example, but unfortunately things remain way too complicated and idiosyncratic to FiPy. I?m grateful for any help anybody can offer despite the mess that I?m offering. >> >> My crystal ball guess is the following: >> >> PETSc objects have collective destroy semantics. >> >> When using petsc4py, XXX.destroy() is called on an object when its Python refcount drops to zero, or when it is collected by the generational garbage collector. >> >> In the absence of reference-cycles, all allocated objects will be collected by the refcounting part of the collector. This is (unless you do something funky like hold more references on one process than another) deterministic, and if you do normal SPMD programming, you'll call XXX.destroy() in the same order on the same objects on all processes. >> >> If you have reference cycles, then the refcounting part of the collector will not collect these objects. Now you are at the mercy of the generational collector. This is definitely not deterministic. If different Python processes do different things (for example, rank 0 might open files) then when the generational collector runs is no longer in sync across processes. >> >> A consequence is that you now might have rank 0 collect XXX then YYY, whereas rank 1 might collect YYY then XXX => deadlock. >> >> You can test this hypothesis by turning off the garbage collector in your test that provokes the failure: >> >> import gc >> gc.disable() >> ... >> >> If this turns out to be the case, I don't think there's a good solution here. You can audit your code base and ensure that objects that hold PETSc objects never participate in reference cycles. This is fragile. >> >> Another option, is to explicitly require that the user of the API call XXX.destroy() on all your objects (and then PETSc objects). This is the decision taken for mpi4py: you are responsible for freeing any objects that you create. >> >> That is, your API becomes more like the C API with >> >> x = Foo(...) # holds some petsc object XX >> ... # use x >> x.destroy() # calls XX.destroy() >> >> you could make this more pythonic by wrapping this pattern in contextmanagers: >> >> with Foo(...) as x: >> ... >> >> >> Thanks, >> >> Lawrence > From sajidsyed2021 at u.northwestern.edu Wed Oct 28 14:12:47 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Wed, 28 Oct 2020 14:12:47 -0500 Subject: [petsc-users] Regarding changes in the 3.14 release Message-ID: Hi PETSc-developers, I have a few questions regarding changes to PETSc between version 3.13.5 and current master. I?m trying to run an application that worked with no issues with version 3.13.5 but isn?t working with the current master. [1] To assemble a matrix in this application I loop over all rows and have multiple calls to MatSetValuesStencil with INSERT_VALUES as the addv argument for all except one call which has ADD_VALUES. Final assembly is called after this loop. 
With PETSc-3.13.5 this ran with no errors but with PETSc-master I get : Object is in wrong state [0]PETSC ERROR: Cannot mix add values and insert values This is fixed by having a flush assembly in between two stages where the first stage has two loops with INSERT_VALUES and the second stage has a loop with ADD_VALUES. Did this change result from a bugfix or are users now expected to no longer mix add and insert values within the same loop ? [2] To prevent re-building the preconditioner at all TSSteps, I had the command line argument -snes_lag_preconditioner -1. This did the job in 3.13.5 but with the current master I get the following error : Cannot set the lag to -1 from the command line since the preconditioner must be built as least once, perhaps you mean -2 I can however run the application without this option. If this is a breaking change, what is the new option to prevent re-building the preconditioner ? [3] Finally, I?m used the latest development version of MPICH for building both 3.13.5 and petsc-master and I get these warnings at exit : [WARNING] yaksa: 2 leaked handles .... (repeated N number of times where N is number of mpi ranks) Can this be safely neglected ? Let me know if sharing either the application code and/or logs would be helpful and I can share either. Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 28 14:31:08 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 28 Oct 2020 15:31:08 -0400 Subject: [petsc-users] Regarding changes in the 3.14 release In-Reply-To: References: Message-ID: On Wed, Oct 28, 2020 at 3:13 PM Sajid Ali wrote: > Hi PETSc-developers, > > I have a few questions regarding changes to PETSc between version 3.13.5 > and current master. I?m trying to run an application that worked with no > issues with version 3.13.5 but isn?t working with the current master. > > [1] To assemble a matrix in this application I loop over all rows and have > multiple calls to MatSetValuesStencil with INSERT_VALUES as the addv > argument for all except one call which has ADD_VALUES. Final assembly is > called after this loop. With PETSc-3.13.5 this ran with no errors but with > PETSc-master I get : > > Object is in wrong state > [0]PETSC ERROR: Cannot mix add values and insert values > > This is fixed by having a flush assembly in between two stages where the > first stage has two loops with INSERT_VALUES and the second stage has a > loop with ADD_VALUES. > > Did this change result from a bugfix or are users now expected to no > longer mix add and insert values within the same loop ? > We never checked before. You were never supposed to do that. It can break. > [2] To prevent re-building the preconditioner at all TSSteps, I had the > command line argument -snes_lag_preconditioner -1. This did the job in > 3.13.5 but with the current master I get the following error : > > Cannot set the lag to -1 from the command line since the preconditioner must be built as least once, perhaps you mean -2 > > I can however run the application without this option. If this is a > breaking change, what is the new option to prevent re-building the > preconditioner ? > -1 means never build, but you have not built the preconditioner. Thus you probably want -2 which means build once and then not again. 
> [3] Finally, I?m used the latest development version of MPICH for building > both 3.13.5 and petsc-master and I get these warnings at exit : > > [WARNING] yaksa: 2 leaked handles > .... (repeated N number of times where N is number of mpi ranks) > > Can this be safely neglected ? > I don't know. Thanks, Matt > Let me know if sharing either the application code and/or logs would be > helpful and I can share either. > > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From salazardetro1 at llnl.gov Wed Oct 28 16:54:04 2020 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Wed, 28 Oct 2020 21:54:04 +0000 Subject: [petsc-users] TSAdjoint and adaptive time stepping Message-ID: Hello, I saw in the TSAdjoint paper that adjoints for adaptive time stepping schemes are supported. Given that these schemes usually involve nondifferentiable functions to pick the time step, are the sensitivities also nondifferentiable at certain points? Does one need to be careful when using adjoints with adaptive time steps? Thanks Miguel Miguel A. Salazar de Troya Postdoctoral Researcher, Lawrence Livermore National Laboratory B141 Rm: 1085-5 Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Wed Oct 28 19:56:33 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Wed, 28 Oct 2020 19:56:33 -0500 Subject: [petsc-users] Regarding changes in the 3.14 release In-Reply-To: References: Message-ID: Hi Matt, Thanks for the clarification. The documentation for SNESSetLagPreconditioner states "If -1 is used before the very first nonlinear solve the preconditioner is still built because there is no previous preconditioner to use" which was true prior to 3.14, is this statement no longer valid ? What is the difference between having -snes_lag_preconditioner -2 and having -snes_lag_preconditioner_persists true ? PS : The man pages for SNESSetLagJacobianPersists should perhaps not state the lag preconditioner options database keys and vice versa for clarity. Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Oct 28 20:08:39 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 28 Oct 2020 21:08:39 -0400 Subject: [petsc-users] Regarding changes in the 3.14 release In-Reply-To: References: Message-ID: On Wed, Oct 28, 2020 at 8:57 PM Sajid Ali wrote: > Hi Matt, > > Thanks for the clarification. The documentation > > for SNESSetLagPreconditioner states "If -1 is used before the very first > nonlinear solve the preconditioner is still built because there is no > previous preconditioner to use" which was true prior to 3.14, is this > statement no longer valid ? > Sounds like it is not. Barry? > What is the difference between having -snes_lag_preconditioner -2 and > having -snes_lag_preconditioner_persists true ? > Persists applies to multiple solves, whereas -2 only applies to the current one. 
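
A short sketch of the corresponding API calls, assuming SNESSetLagPreconditioner() and SNESSetLagPreconditionerPersists() as the C-level counterparts of the two options named above; in a TS application the SNES would come from TSGetSNES() rather than SNESCreate(). The comments paraphrase Matt's description in this thread, not the manual pages.

#include <petscsnes.h>

int main(int argc, char **argv)
{
  SNES           snes;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = SNESCreate(PETSC_COMM_WORLD, &snes);CHKERRQ(ierr);

  /* Equivalent to -snes_lag_preconditioner -2: build the preconditioner once,
     then do not rebuild it again during the current solve. */
  ierr = SNESSetLagPreconditioner(snes, -2);CHKERRQ(ierr);

  /* Equivalent to -snes_lag_preconditioner_persists true: keep the lag across
     multiple SNESSolve() calls (e.g. successive TS steps) instead of resetting
     it at each solve. */
  ierr = SNESSetLagPreconditionerPersists(snes, PETSC_TRUE);CHKERRQ(ierr);

  ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);
  /* ... SNESSetFunction()/SNESSetJacobian() and SNESSolve() would go here ... */

  ierr = SNESDestroy(&snes);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

From the options database the same combination is -snes_lag_preconditioner -2 -snes_lag_preconditioner_persists true, as discussed above.
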
Thanks, Matt > PS : The man pages for SNESSetLagJacobianPersists should perhaps not > state the lag preconditioner options database keys and vice versa for > clarity. > > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Wed Oct 28 20:10:38 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Thu, 29 Oct 2020 01:10:38 +0000 Subject: [petsc-users] superlu_dist segfault In-Reply-To: References: , Message-ID: Marius, I tested your code with petsc-release on my mac laptop using np=2 cores. I first tested a small matrix data file successfully. Then I switch to your data file and run out of memory, likely due to the dense matrices B and X. I got an error "Your system has run out of application memory" from my laptop. The sparse matrix A has size 42549 by 42549. Your code creates dense matrices B and X with the same size -- a huge memory requirement! By replacing B and X with size 42549 by nrhs (nrhs =< 4000), I had the code run well with np=2. Note the error message you got [23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range The modified code I used is attached. Hong ________________________________ From: Marius Buerkle Sent: Tuesday, October 27, 2020 10:01 PM To: Zhang, Hong Cc: petsc-users at mcs.anl.gov ; Sherry Li Subject: Aw: Re: [petsc-users] superlu_dist segfault Hi, I recompiled PETSC with debug option, now I get a seg fault at a different position [23]PETSC ERROR: ------------------------------------------------------------------------ [23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [23]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [23]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [23]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [23]PETSC ERROR: likely location of problem given in stack below [23]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [23]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [23]PETSC ERROR: INSTEAD the line number of the start of the function [23]PETSC ERROR: is given. [23]PETSC ERROR: [23] SuperLU_DIST:pzgssvx line 242 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c [23]PETSC ERROR: [23] MatMatSolve_SuperLU_DIST line 211 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c [23]PETSC ERROR: [23] MatMatSolve line 3466 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/interface/matrix.c [23]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [23]PETSC ERROR: Signal received I made a small reproducer. The matrix is a bit too big so I cannot attach it directly to the email, but I put it in the cloud https://1drv.ms/u/s!AqZsng1oUcKzjYxGMGHojLRG09Sf1A?e=7uHnmw Best, Marius Gesendet: Dienstag, 27. 
Oktober 2020 um 23:11 Uhr Von: "Zhang, Hong" An: "Marius Buerkle" , "petsc-users at mcs.anl.gov" , "Sherry Li" Betreff: Re: [petsc-users] superlu_dist segfault Marius, It fails at the line 1075 in file /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c if ( !(lsum = (doublecomplex*)SUPERLU_MALLOC(sizelsum*num_thread * sizeof(doublecomplex)))) ABORT("Malloc fails for lsum[]."); We do not know what it means. You may use a debugger to check the values of the variables involved. I'm cc'ing Sherry (superlu_dist developer), or you may send us a stand-alone short code that reproduce the error. We can help on its investigation. Hong ________________________________ From: petsc-users on behalf of Marius Buerkle Sent: Tuesday, October 27, 2020 8:46 AM To: petsc-users at mcs.anl.gov Subject: [petsc-users] superlu_dist segfault Hi, When using MatMatSolve with superlu_dist I get a segmentation fault: Malloc fails for lsum[]. at line 1075 in file /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c The matrix size is not particular big and I am using the petsc release branch and superlu_dist is v6.3.0 I think. Best, Marius -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: superlu_test.c URL: From hongzhang at anl.gov Wed Oct 28 22:36:15 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Thu, 29 Oct 2020 03:36:15 +0000 Subject: [petsc-users] TSAdjoint and adaptive time stepping In-Reply-To: References: Message-ID: <4CC86760-34FD-4AE5-B936-05E51B71820A@anl.gov> I think it depends on the functional for which the sensitivities are calculated. For most cases, the objective functional should not be sensitive to the step sizes when a converged solution is achieved. What the adapter does is just to choose a step size so that the solution is accurate within certain tolerances. Of course, if the adapter is not doing a good job (e.g. choosing a step size that leads to instability), not only the sensitivities are influenced but also the solution is inaccurate. Hong (Mr.) On Oct 28, 2020, at 4:54 PM, Salazar De Troya, Miguel via petsc-users > wrote: Hello, I saw in the TSAdjoint paper that adjoints for adaptive time stepping schemes are supported. Given that these schemes usually involve nondifferentiable functions to pick the time step, are the sensitivities also nondifferentiable at certain points? Does one need to be careful when using adjoints with adaptive time steps? Thanks Miguel Miguel A. Salazar de Troya Postdoctoral Researcher, Lawrence Livermore National Laboratory B141 Rm: 1085-5 Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbuerkle at web.de Wed Oct 28 23:43:41 2020 From: mbuerkle at web.de (Marius Buerkle) Date: Thu, 29 Oct 2020 05:43:41 +0100 Subject: [petsc-users] superlu_dist segfault In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From xsli at lbl.gov Thu Oct 29 00:14:24 2020 From: xsli at lbl.gov (Xiaoye S. Li) Date: Wed, 28 Oct 2020 22:14:24 -0700 Subject: [petsc-users] superlu_dist segfault In-Reply-To: References: Message-ID: Hong: thanks for the diagnosis! Marius: how many OpenMP threads are you using per MPI task? 
In an earlier email, you mentioned the allocation failure at the following line: if ( !(lsum = (doublecomplex*) SUPERLU_MALLOC(sizelsum*num_thread * sizeof(doublecomplex)))) ABORT("Malloc fails for lsum[]."); this is in the solve phase. I think when we do some OpenMP optimization, we allowed several data structures to grow with OpenMP threads. You can try to use 1 thread. The RHS and X memories are easy to compute. However, in order to gauge how much memory is used in the factorization, can you print out the number of nonzeros in the L and U factors? What ordering option are you using? The sparse matrix A looks pretty small. The code can also print out the working storage used during factorization. I am not sure how this printing can be turned on through PETSc. Sherry On Wed, Oct 28, 2020 at 9:43 PM Marius Buerkle wrote: > Thanks for the swift reply. > > I also realized if I reduce the number of RHS then it works. But I am > running the code on a cluster with 256GB ram / node. One dense matrix > would be around ~30 Gb so 60 Gb, which is large but does exceed the > memory of even one node and I also get the seg fault if I run it on several > nodes. Moreover, it works well with MUMPS and MKL_CPARDISO solver. The > maxium memory used when using MUMPS is around 150 Gb during the solver > phase but for SuperLU_dist it crashed even before reaching the solver > phase. Could there be such a large difference in memory usage between > SuperLu_dist and MUMPS ? > > > > best, > > marius > > *Gesendet:* Donnerstag, 29. Oktober 2020 um 10:10 Uhr > *Von:* "Zhang, Hong" > *An:* "Marius Buerkle" > *Cc:* "petsc-users at mcs.anl.gov" , "Sherry Li" < > xiaoye at nersc.gov> > *Betreff:* Re: Re: [petsc-users] superlu_dist segfault > Marius, > I tested your code with petsc-release on my mac laptop using np=2 cores. I > first tested a small matrix data file successfully. Then I switch to your > data file and run out of memory, likely due to the dense matrices B and X. > I got an error "Your system has run out of application memory" from my > laptop. > > The sparse matrix A has size 42549 by 42549. Your code creates dense > matrices B and X with the same size -- a huge memory requirement! > By replacing B and X with size 42549 by nrhs (nrhs =< 4000), I had the > code run well with np=2. Note the error message you got > [23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > > The modified code I used is attached. 
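
Since the modified superlu_test.c is only attached to the message and not reproduced in the archive, here is a self-contained sketch of the shape change Hong describes: the sparse A stays n by n, while the dense B and X are created n by nrhs. The small tridiagonal A, n = 100 and nrhs = 4 are placeholders, and the factor/solve sequence assumes a PETSc build configured with SuperLU_DIST and follows the usual MatGetFactor()/MatLUFactorSymbolic()/MatLUFactorNumeric()/MatMatSolve() path.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A, F, B, X;
  IS             perm, iperm;
  MatFactorInfo  info;
  PetscInt       i, j, rstart, rend, n = 100, nrhs = 4;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Sparse n x n matrix (a stand-in for the 42549 x 42549 matrix in the reproducer) */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 3, NULL, 2, NULL, &A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
    if (i > 0)     {ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    if (i < n - 1) {ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* Dense right-hand sides: n x nrhs, not n x n */
  ierr = MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, nrhs, NULL, &B);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(B, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    for (j = 0; j < nrhs; j++) {ierr = MatSetValue(B, i, j, 1.0, INSERT_VALUES);CHKERRQ(ierr);}
  }
  ierr = MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatDuplicate(B, MAT_DO_NOT_COPY_VALUES, &X);CHKERRQ(ierr);

  /* LU factorization with SuperLU_DIST, then one multi-RHS solve */
  ierr = MatGetOrdering(A, MATORDERINGNATURAL, &perm, &iperm);CHKERRQ(ierr);
  ierr = MatGetFactor(A, MATSOLVERSUPERLU_DIST, MAT_FACTOR_LU, &F);CHKERRQ(ierr);
  ierr = MatFactorInfoInitialize(&info);CHKERRQ(ierr);
  ierr = MatLUFactorSymbolic(F, A, perm, iperm, &info);CHKERRQ(ierr);
  ierr = MatLUFactorNumeric(F, A, &info);CHKERRQ(ierr);
  ierr = MatMatSolve(F, B, X);CHKERRQ(ierr);

  ierr = ISDestroy(&perm);CHKERRQ(ierr);
  ierr = ISDestroy(&iperm);CHKERRQ(ierr);
  ierr = MatDestroy(&F);CHKERRQ(ierr);
  ierr = MatDestroy(&X);CHKERRQ(ierr);
  ierr = MatDestroy(&B);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

If all 42549 right-hand sides are really needed, they can presumably be processed in column blocks of a few thousand, reusing the same factor F across repeated MatMatSolve() calls.
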
> Hong > > ------------------------------ > *From:* Marius Buerkle > *Sent:* Tuesday, October 27, 2020 10:01 PM > *To:* Zhang, Hong > *Cc:* petsc-users at mcs.anl.gov ; Sherry Li < > xiaoye at nersc.gov> > *Subject:* Aw: Re: [petsc-users] superlu_dist segfault > > Hi, > > I recompiled PETSC with debug option, now I get a seg fault at a different > position > > [23]PETSC ERROR: > ------------------------------------------------------------------------ > [23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [23]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [23]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [23]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [23]PETSC ERROR: likely location of problem given in stack below > [23]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [23]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [23]PETSC ERROR: INSTEAD the line number of the start of the function > [23]PETSC ERROR: is given. > [23]PETSC ERROR: [23] SuperLU_DIST:pzgssvx line 242 > /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c > [23]PETSC ERROR: [23] MatMatSolve_SuperLU_DIST line 211 > /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c > [23]PETSC ERROR: [23] MatMatSolve line 3466 > /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/interface/matrix.c > [23]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [23]PETSC ERROR: Signal received > > I made a small reproducer. The matrix is a bit too big so I cannot attach > it directly to the email, but I put it in the cloud > https://1drv.ms/u/s!AqZsng1oUcKzjYxGMGHojLRG09Sf1A?e=7uHnmw > > Best, > Marius > > > *Gesendet:* Dienstag, 27. Oktober 2020 um 23:11 Uhr > *Von:* "Zhang, Hong" > *An:* "Marius Buerkle" , "petsc-users at mcs.anl.gov" < > petsc-users at mcs.anl.gov>, "Sherry Li" > *Betreff:* Re: [petsc-users] superlu_dist segfault > Marius, > It fails at the line 1075 in file > /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c > if ( !(lsum = (doublecomplex*)SUPERLU_MALLOC(sizelsum*num_thread * > sizeof(doublecomplex)))) ABORT("Malloc fails for lsum[]."); > > We do not know what it means. You may use a debugger to check the values > of the variables involved. > I'm cc'ing Sherry (superlu_dist developer), or you may send us a > stand-alone short code that reproduce the error. We can help on its > investigation. > Hong > > > ------------------------------ > *From:* petsc-users on behalf of Marius > Buerkle > *Sent:* Tuesday, October 27, 2020 8:46 AM > *To:* petsc-users at mcs.anl.gov > *Subject:* [petsc-users] superlu_dist segfault > > Hi, > > When using MatMatSolve with superlu_dist I get a segmentation fault: > > Malloc fails for lsum[]. at line 1075 in file > /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c > > The matrix size is not particular big and I am using the petsc release > branch and superlu_dist is v6.3.0 I think. > > Best, > Marius > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mbuerkle at web.de Thu Oct 29 01:04:47 2020 From: mbuerkle at web.de (Marius Buerkle) Date: Thu, 29 Oct 2020 07:04:47 +0100 Subject: [petsc-users] superlu_dist segfault In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From salazardetro1 at llnl.gov Thu Oct 29 11:10:56 2020 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Thu, 29 Oct 2020 16:10:56 +0000 Subject: [petsc-users] TSAdjoint and adaptive time stepping In-Reply-To: <4CC86760-34FD-4AE5-B936-05E51B71820A@anl.gov> References: <4CC86760-34FD-4AE5-B936-05E51B71820A@anl.gov> Message-ID: <0FF12D70-5AA1-4E8F-AE22-D238DFBF6048@llnl.gov> Does this mean that the adjoint method doesn?t take into account the step adapter? Meaning that the adapter is not differentiated with respect to its dependencies (one of them being the solution at each time step). I can imagine that a discrete adjoint method with a step controller should be differentiating the step controller as well. Thanks Miguel From: "Zhang, Hong" Date: Wednesday, October 28, 2020 at 8:36 PM To: "Salazar De Troya, Miguel" Cc: "Guyer, Jonathan E. Dr. (Fed) via petsc-users" Subject: Re: [petsc-users] TSAdjoint and adaptive time stepping I think it depends on the functional for which the sensitivities are calculated. For most cases, the objective functional should not be sensitive to the step sizes when a converged solution is achieved. What the adapter does is just to choose a step size so that the solution is accurate within certain tolerances. Of course, if the adapter is not doing a good job (e.g. choosing a step size that leads to instability), not only the sensitivities are influenced but also the solution is inaccurate. Hong (Mr.) On Oct 28, 2020, at 4:54 PM, Salazar De Troya, Miguel via petsc-users > wrote: Hello, I saw in the TSAdjoint paper that adjoints for adaptive time stepping schemes are supported. Given that these schemes usually involve nondifferentiable functions to pick the time step, are the sensitivities also nondifferentiable at certain points? Does one need to be careful when using adjoints with adaptive time steps? Thanks Miguel Miguel A. Salazar de Troya Postdoctoral Researcher, Lawrence Livermore National Laboratory B141 Rm: 1085-5 Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dsu at eoas.ubc.ca Thu Oct 29 14:01:10 2020 From: dsu at eoas.ubc.ca (Su,D.S. Danyang) Date: Thu, 29 Oct 2020 19:01:10 +0000 Subject: [petsc-users] Quite different behaviours of PETSc solver on different clusters Message-ID: <99FC758F-929C-4388-B09E-DC11FCB0004A@eoas.ubc.ca> Dear PETSc users, This is a question bother me for some time. I have the same code running on different clusters and both clusters have good speedup. However, I noticed some thing quite strange. On one cluster, the solver is quite stable in computing time while on another cluster, the solver is unstable in computing time. As shown in the figure below, the local calculation almost has no communication and the computing time in this part is quite stable. However, PETSc solver on Cluster B jumps quite a lot and the performance is not as good as Cluster A, even though the local calculation is a little better on Cluster B. There are some difference on hardware and PETSc configuration and optimization. Cluster A uses OpenMPI + GCC compiler and Cluster B uses MPICH + GCC compiler. The number of processors used is 128 on Cluster A and 120 on Cluster B. 
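A generic way to pin down where the extra solver time goes is to wrap the two phases in PETSc log stages and compare the -log_view output from the two clusters: the per-stage report of time, message counts, and MPI reductions usually shows whether the difference is in local flops or in communication (for example VecNorm/VecDot reductions, which are sensitive to network latency). The sketch below is only a minimal illustration in C, not the code used to produce the figures; the stage names and the placeholder comments stand in for the application's own assembly and solve routines.

  #include <petscsys.h>

  int main(int argc, char **argv)
  {
    PetscLogStage  stageLocal, stageSolve;
    PetscInt       step;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
    ierr = PetscLogStageRegister("Local calculation", &stageLocal);CHKERRQ(ierr);
    ierr = PetscLogStageRegister("PETSc solver", &stageSolve);CHKERRQ(ierr);

    for (step = 0; step < 10; step++) {
      ierr = PetscLogStagePush(stageLocal);CHKERRQ(ierr);
      /* ... local assembly / physics computations for this time step ... */
      ierr = PetscLogStagePop();CHKERRQ(ierr);

      ierr = PetscLogStagePush(stageSolve);CHKERRQ(ierr);
      /* ... KSPSolve()/SNESSolve() for this time step ... */
      ierr = PetscLogStagePop();CHKERRQ(ierr);
    }
    ierr = PetscFinalize();  /* run with -log_view to get per-stage timing,   */
    return ierr;             /* message counts, and reduction (Allreduce) time */
  }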
I also tested different number of processors but the problem is the same. Does anyone have any idea which part might cause this problem? [cid:image001.png at 01D6ADEB.30817A80] [cid:image002.png at 01D6ADEB.30817A80] Thanks and regards, Danyang -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 103885 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 107143 bytes Desc: image002.png URL: From knepley at gmail.com Thu Oct 29 20:05:53 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 29 Oct 2020 21:05:53 -0400 Subject: [petsc-users] Quite different behaviours of PETSc solver on different clusters In-Reply-To: <99FC758F-929C-4388-B09E-DC11FCB0004A@eoas.ubc.ca> References: <99FC758F-929C-4388-B09E-DC11FCB0004A@eoas.ubc.ca> Message-ID: On Thu, Oct 29, 2020 at 3:04 PM Su,D.S. Danyang wrote: > Dear PETSc users, > > > > This is a question bother me for some time. I have the same code running > on different clusters and both clusters have good speedup. However, I > noticed some thing quite strange. On one cluster, the solver is quite > stable in computing time while on another cluster, the solver is unstable > in computing time. As shown in the figure below, the local calculation > almost has no communication and the computing time in this part is quite > stable. However, PETSc solver on Cluster B jumps quite a lot and the > performance is not as good as Cluster A, even though the local calculation > is a little better on Cluster B. There are some difference on hardware and > PETSc configuration and optimization. Cluster A uses OpenMPI + GCC compiler > and Cluster B uses MPICH + GCC compiler. The number of processors used is > 128 on Cluster A and 120 on Cluster B. I also tested different number of > processors but the problem is the same. Does anyone have any idea which > part might cause this problem? > First question: Does the solver take more iterates when the time bumps up? Thanks, Matt > > > > > Thanks and regards, > > > > Danyang > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 103885 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 107143 bytes Desc: not available URL: From dsu at eoas.ubc.ca Thu Oct 29 20:17:38 2020 From: dsu at eoas.ubc.ca (Danyang Su) Date: Thu, 29 Oct 2020 18:17:38 -0700 Subject: [petsc-users] Quite different behaviours of PETSc solver on different clusters In-Reply-To: References: <99FC758F-929C-4388-B09E-DC11FCB0004A@eoas.ubc.ca> Message-ID: <18AD9C93-7FFC-46DB-BB38-21E2644CA550@eoas.ubc.ca> Hi Matt, No, interations from both linear and nonlinear solvers are similar. The system administrator doubt that the latency in mpich makes the difference. We will test a petsc version with OpenMPI on that cluster to check if it makes difference. Thanks, Danyang On October 29, 2020 6:05:53 p.m. PDT, Matthew Knepley wrote: >On Thu, Oct 29, 2020 at 3:04 PM Su,D.S. 
Danyang >wrote: > >> Dear PETSc users, >> >> >> >> This is a question bother me for some time. I have the same code >running >> on different clusters and both clusters have good speedup. However, I >> noticed some thing quite strange. On one cluster, the solver is quite >> stable in computing time while on another cluster, the solver is >unstable >> in computing time. As shown in the figure below, the local >calculation >> almost has no communication and the computing time in this part is >quite >> stable. However, PETSc solver on Cluster B jumps quite a lot and the >> performance is not as good as Cluster A, even though the local >calculation >> is a little better on Cluster B. There are some difference on >hardware and >> PETSc configuration and optimization. Cluster A uses OpenMPI + GCC >compiler >> and Cluster B uses MPICH + GCC compiler. The number of processors >used is >> 128 on Cluster A and 120 on Cluster B. I also tested different number >of >> processors but the problem is the same. Does anyone have any idea >which >> part might cause this problem? >> > >First question: Does the solver take more iterates when the time bumps >up? > > Thanks, > > Matt > > >> >> >> >> >> Thanks and regards, >> >> >> >> Danyang >> >> >> > > >-- >What most experimenters take for granted before they begin their >experiments is infinitely more interesting than any results to which >their >experiments lead. >-- Norbert Wiener > >https://www.cse.buffalo.edu/~knepley/ > -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From elbueler at alaska.edu Thu Oct 29 20:28:49 2020 From: elbueler at alaska.edu (Ed Bueler) Date: Thu, 29 Oct 2020 17:28:49 -0800 Subject: [petsc-users] new book introducing PETSc for PDEs Message-ID: All -- SIAM Press just published my new book "PETSc for Partial Differential Equations: Numerical Solutions in C and Python": https://my.siam.org/Store/Product/viewproduct/?ProductId=32850137 The book is available both as a paperback and an e-book with working links. A SIAM member discount is available, of course. This book is a genuine introduction which does not assume you have used PETSc before, and which should make sense even if your differential equations knowledge is basic. The prerequisites are a bit of programming in C and a bit of numerical linear algebra, roughly like the main ideas of Trefethen and Bau, but even that is reviewed and summarized. I've made an effort to introduce discretizations from the beginning, especially finite differences and elements. The book is based on a collection of example programs at https://github.com/bueler/p4pdes. Most of these codes call PETSc directly through the C API, but the last two chapters have Python codes using UFL and Firedrake. Nonetheless the book contains ideas, mathematical and computational; it complements, but does not replace, the PETSc User's Manual and the tutorial examples in the PETSc source. Concepts are explained and illustrated, with sufficient context to facilitate further development. Performance (optimality) and parallel scalability are the primary goals, so preconditioners including multigrid are central threads, and run-time solver options are explored in both the text and the exercises. Here is the place to appreciate the usual PETSc suspects for their comments on drafts, and help in writing this book: Barry, Jed, Matt, Dave, Rich, Lois, Patrick, Mark, Satish, David K., and many others. 
Also let me say that SIAM Press has nothing but professionals who are nice to work with too; send them your book idea! Ed -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Oct 30 14:07:54 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 30 Oct 2020 14:07:54 -0500 Subject: [petsc-users] superlu_dist segfault In-Reply-To: References: Message-ID: <998AF097-67A5-45DA-9E6F-4201A4BEDDFE@petsc.dev> Have you run it yet with valgrind, good be memory corruption earlier that causes a later crash, crashes that occur at different places for the same run are almost always due to memory corruption. If valgrind is clean you can run with -on_error_attach_debugger and if the X forwarding is set up it will open a debugger on the crashing process and you can type bt to see exactly where it is crashing, at what line number and code line. Barry > On Oct 29, 2020, at 1:04 AM, Marius Buerkle wrote: > > Hi Sherry, > > I used only 1 OpenMP thread and I also recompiled PETSC in debug mode with OpenMP turned off. But did not help. > > Here is the output I can get from SuperLu during the PETSC run > Nonzeros in L 29519630 > Nonzeros in U 29519630 > nonzeros in L+U 58996711 > nonzeros in LSUB 4509612 > ** Memory Usage ********************************** > ** NUMfact space (MB): (sum-of-all-processes) > L\U : 952.18 | Total : 1980.60 > ** Total highmark (MB): > Sum-of-all : 12401.85 | Avg : 387.56 | Max : 387.56 > ************************************************** > ************************************************** > **** Time (seconds) **** > EQUIL time 0.06 > ROWPERM time 1.03 > COLPERM time 1.01 > SYMBFACT time 0.45 > DISTRIBUTE time 0.33 > FACTOR time 0.90 > Factor flops 2.225916e+11 Mflops 247438.62 > SOLVE time 0.000 > ************************************************** > > I tried all available ordering options for Colperm (NATURAL,MMD_AT_PLUS_A,MMD_ATA,METIS_AT_PLUS_A), save for parmetis which always crashes. For Rowperm I used NOROWPERM, LargeDiag_MC64. All gives the same seg. fault. > > > Gesendet: Donnerstag, 29. Oktober 2020 um 14:14 Uhr > Von: "Xiaoye S. Li" > An: "Marius Buerkle" > Cc: "Zhang, Hong" , "petsc-users at mcs.anl.gov" , "Sherry Li" > Betreff: Re: Re: Re: [petsc-users] superlu_dist segfault > Hong: thanks for the diagnosis! > > Marius: how many OpenMP threads are you using per MPI task? > In an earlier email, you mentioned the allocation failure at the following line: > if ( !(lsum = (doublecomplex*) SUPERLU_MALLOC(sizelsum*num_thread * sizeof(doublecomplex)))) ABORT("Malloc fails for lsum[]."); > > this is in the solve phase. I think when we do some OpenMP optimization, we allowed several data structures to grow with OpenMP threads. You can try to use 1 thread. > > The RHS and X memories are easy to compute. However, in order to gauge how much memory is used in the factorization, can you print out the number of nonzeros in the L and U factors? What ordering option are you using? The sparse matrix A looks pretty small. > > The code can also print out the working storage used during factorization. I am not sure how this printing can be turned on through PETSc. > > Sherry > > On Wed, Oct 28, 2020 at 9:43 PM Marius Buerkle > wrote: > Thanks for the swift reply. > > I also realized if I reduce the number of RHS then it works. But I am running the code on a cluster with 256GB ram / node. 
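On the valgrind and debugger suggestions above: the usual way to run a parallel PETSc job under valgrind, following the PETSc FAQ page cited in the error message, looks roughly like the lines below, where the launcher, the process count, and the my_app executable name are placeholders to adapt to your system.

  mpiexec -n 8 valgrind --tool=memcheck -q --num-callers=20 \
          --log-file=valgrind.log.%p ./my_app -malloc off [your usual options]

  mpiexec -n 8 ./my_app -on_error_attach_debugger [your usual options]

The first form writes one valgrind log per MPI rank (-malloc off disables PETSc's own memory tracking so valgrind sees the raw allocations); the second, with X forwarding available, opens a debugger on the faulting rank, where typing bt prints the backtrace with the exact crashing line.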
One dense matrix would be around ~30 Gb so 60 Gb, which is large but does exceed the memory of even one node and I also get the seg fault if I run it on several nodes. Moreover, it works well with MUMPS and MKL_CPARDISO solver. The maxium memory used when using MUMPS is around 150 Gb during the solver phase but for SuperLU_dist it crashed even before reaching the solver phase. Could there be such a large difference in memory usage between SuperLu_dist and MUMPS ? > > > best, > > marius > > > Gesendet: Donnerstag, 29. Oktober 2020 um 10:10 Uhr > Von: "Zhang, Hong" > > An: "Marius Buerkle" > > Cc: "petsc-users at mcs.anl.gov " >, "Sherry Li" > > Betreff: Re: Re: [petsc-users] superlu_dist segfault > Marius, > I tested your code with petsc-release on my mac laptop using np=2 cores. I first tested a small matrix data file successfully. Then I switch to your data file and run out of memory, likely due to the dense matrices B and X. I got an error "Your system has run out of application memory" from my laptop. > > The sparse matrix A has size 42549 by 42549. Your code creates dense matrices B and X with the same size -- a huge memory requirement! > By replacing B and X with size 42549 by nrhs (nrhs =< 4000), I had the code run well with np=2. Note the error message you got > [23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > > The modified code I used is attached. > Hong > > From: Marius Buerkle > > Sent: Tuesday, October 27, 2020 10:01 PM > To: Zhang, Hong > > Cc: petsc-users at mcs.anl.gov >; Sherry Li > > Subject: Aw: Re: [petsc-users] superlu_dist segfault > > Hi, > > I recompiled PETSC with debug option, now I get a seg fault at a different position > > [23]PETSC ERROR: ------------------------------------------------------------------------ > [23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > [23]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [23]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [23]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > [23]PETSC ERROR: likely location of problem given in stack below > [23]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > [23]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [23]PETSC ERROR: INSTEAD the line number of the start of the function > [23]PETSC ERROR: is given. > [23]PETSC ERROR: [23] SuperLU_DIST:pzgssvx line 242 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c > [23]PETSC ERROR: [23] MatMatSolve_SuperLU_DIST line 211 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c > [23]PETSC ERROR: [23] MatMatSolve line 3466 /home/cdfmat_marius/prog/petsc/git/release/petsc/src/mat/interface/matrix.c > [23]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [23]PETSC ERROR: Signal received > > I made a small reproducer. The matrix is a bit too big so I cannot attach it directly to the email, but I put it in the cloud > https://1drv.ms/u/s!AqZsng1oUcKzjYxGMGHojLRG09Sf1A?e=7uHnmw > > Best, > Marius > > > Gesendet: Dienstag, 27. 
Oktober 2020 um 23:11 Uhr > Von: "Zhang, Hong" > > An: "Marius Buerkle" >, "petsc-users at mcs.anl.gov " >, "Sherry Li" > > Betreff: Re: [petsc-users] superlu_dist segfault > Marius, > It fails at the line 1075 in file /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c > if ( !(lsum = (doublecomplex*)SUPERLU_MALLOC(sizelsum*num_thread * sizeof(doublecomplex)))) ABORT("Malloc fails for lsum[]."); > > We do not know what it means. You may use a debugger to check the values of the variables involved. > I'm cc'ing Sherry (superlu_dist developer), or you may send us a stand-alone short code that reproduce the error. We can help on its investigation. > Hong > > > From: petsc-users > on behalf of Marius Buerkle > > Sent: Tuesday, October 27, 2020 8:46 AM > To: petsc-users at mcs.anl.gov > > Subject: [petsc-users] superlu_dist segfault > > Hi, > > When using MatMatSolve with superlu_dist I get a segmentation fault: > > Malloc fails for lsum[]. at line 1075 in file /home/petsc3.14.release/arch-linux-c-debug/externalpackages/git.superlu_dist/SRC/pzgstrs.c > > The matrix size is not particular big and I am using the petsc release branch and superlu_dist is v6.3.0 I think. > > Best, > Marius -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Oct 30 14:56:29 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 30 Oct 2020 14:56:29 -0500 Subject: [petsc-users] Regarding changes in the 3.14 release In-Reply-To: References: Message-ID: <7B95D0D0-C162-423E-A465-A663AB6728D1@petsc.dev> > On Oct 28, 2020, at 7:56 PM, Sajid Ali wrote: > > Hi Matt, > > Thanks for the clarification. The documentation for SNESSetLagPreconditioner states "If -1 is used before the very first nonlinear solve the preconditioner is still built because there is no previous preconditioner to use" which was true prior to 3.14, is this statement no longer valid ? This looks like outdated information. We may have been less picky at one point. Will remove. > > What is the difference between having -snes_lag_preconditioner -2 and having -snes_lag_preconditioner_persists true ? -2 -1 persist through more nonlinear solves but if the number is positive a new preconditioner will be built for each zero iteration of the Newton solve. persists means that the recompute (say every 2 iterations) is done across all the solves not each individually. Say the lag is 2. And 2 newton steps are done in the first iteration then. without persistence iter 0 total its 0 of first solve compute preconditioning iter 1 1 do not 2 2 do it 0 3 do 1 4 do not with persistence iter 0 0 e compute preconditioning iter 1 1 do not 2 2 do it 0 3 do not 1 4 do so with persistence it does the mod over the second column the total iterations without persistence it does over the local iteration (the normal way). Barry > > PS : The man pages for SNESSetLagJacobianPersists should perhaps not state the lag preconditioner options database keys and vice versa for clarity. > > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Fri Oct 30 22:11:52 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Fri, 30 Oct 2020 20:11:52 -0700 Subject: [petsc-users] Problem with SNESsetFunction() Message-ID: Hello, I have a couple of questions regarding SNESSetFunction usage, when programming in Fortran90. 
1) I have the following usage paradigm. call SNESSetFunction(snes,f_non,FormFunction,0,ierr) subroutine FormFunction(snes,x,r,dummy,ierr) In the FormFunction subroutine, the function values are stored in the vector r. I see that these values are formed correctly. But when I use FormFunction in SNESSetFunction(), the values are not getting populated into f_non and all of the values in f_non are zero. Should the name of the variable used to store the function value be same in SNESSetFunction and FormFunction? And should I be calling the SNESComputeFunction() after calling SNESSetFunction()? 2) In the subroutine FormFunction, should the vector objects created be destroyed before ending the subroutine? Please let me know if you need any further information. Thank you in advance. Best regards, Pranay. ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Oct 31 09:14:02 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 31 Oct 2020 09:14:02 -0500 Subject: [petsc-users] Problem with SNESsetFunction() In-Reply-To: References: Message-ID: <18F9C792-0B04-4393-98A2-96A551D5A082@petsc.dev> > On Oct 30, 2020, at 10:11 PM, baikadi pranay wrote: > > Hello, > I have a couple of questions regarding SNESSetFunction usage, when programming in Fortran90. > > 1) I have the following usage paradigm. > call SNESSetFunction(snes,f_non,FormFunction,0,ierr) > subroutine FormFunction(snes,x,r,dummy,ierr) > In the FormFunction subroutine, the function values are stored in the vector r. I see that these values are formed correctly. But when I use FormFunction in SNESSetFunction(), the values are not getting populated into f_non and all of the values in f_non are zero. > Should the name of the variable used to store the function value be same in SNESSetFunction and FormFunction? It does not need to be the same, they are just the variables in each function > And should I be calling the SNESComputeFunction() after calling SNESSetFunction()? No, that is a developer function called in PETSc, one would not need to call that. > > 2) In the subroutine FormFunction, should the vector objects created be destroyed before ending the subroutine? What vectors? If you are creating any work vectors you need within the the FormFunction, yes those should be destroyed. But not the input and output functions. Here is any example from src/snes/tutorials/ex1f.F90 Note you call VecGetArrayF90() to access the arrays for the vectors, put the values into the arrays subroutine FormFunction(snes,X,F,user,ierr) implicit none ! Input/output variables: SNES snes Vec X,F PetscErrorCode ierr type (userctx) user DM da ! Declarations for use with local arrays: PetscScalar,pointer :: lx_v(:),lf_v(:) Vec localX ! Scatter ghost points to local vector, using the 2-step process ! DMGlobalToLocalBegin(), DMGlobalToLocalEnd(). ! By placing code between these two statements, computations can ! be done while messages are in transition. call SNESGetDM(snes,da,ierr);CHKERRQ(ierr) call DMGetLocalVector(da,localX,ierr);CHKERRQ(ierr) call DMGlobalToLocalBegin(da,X,INSERT_VALUES,localX,ierr);CHKERRQ(ierr) call DMGlobalToLocalEnd(da,X,INSERT_VALUES,localX,ierr);CHKERRQ(ierr) ! Get a pointer to vector data. ! - For default PETSc vectors, VecGetArray90() returns a pointer to ! the data array. Otherwise, the routine is implementation dependent. ! - You MUST call VecRestoreArrayF90() when you no longer need access to ! the array. ! - Note that the interface to VecGetArrayF90() differs from VecGetArray(), ! 
and is useable from Fortran-90 Only. call VecGetArrayF90(localX,lx_v,ierr);CHKERRQ(ierr) call VecGetArrayF90(F,lf_v,ierr);CHKERRQ(ierr) ! Compute function over the locally owned part of the grid call FormFunctionLocal(lx_v,lf_v,user,ierr);CHKERRQ(ierr) ! Restore vectors call VecRestoreArrayF90(localX,lx_v,ierr);CHKERRQ(ierr) call VecRestoreArrayF90(F,lf_v,ierr);CHKERRQ(ierr) ! Insert values into global vector call DMRestoreLocalVector(da,localX,ierr);CHKERRQ(ierr) call PetscLogFlops(11.0d0*user%ym*user%xm,ierr) ! call VecView(X,PETSC_VIEWER_STDOUT_WORLD,ierr) ! call VecView(F,PETSC_VIEWER_STDOUT_WORLD,ierr) return end subroutine formfunction end module f90module > > Please let me know if you need any further information. Thank you in advance. > Best regards, > Pranay. > > ? -------------- next part -------------- An HTML attachment was scrubbed... URL:
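A note on the f_non question at the start of this thread: SNESSetFunction() only registers the callback and the vector that SNES will use to hold the residual; that vector is filled each time SNES evaluates the function inside SNESSolve(), not at the moment SNESSetFunction() is called. The following self-contained sketch is in C rather than Fortran and uses a toy one-unknown problem, so every name and number in it is illustrative, but the calling pattern is the same: the X and F vectors handed to FormFunction belong to SNES and the caller and are not destroyed inside the callback.

  #include <petscsnes.h>

  /* Residual callback: fill F from X.  F is the vector given to
     SNESSetFunction(); do not destroy X or F here. */
  static PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *ctx)
  {
    const PetscScalar *x;
    PetscScalar       *f;
    PetscErrorCode     ierr;

    PetscFunctionBeginUser;
    ierr = VecGetArrayRead(X, &x);CHKERRQ(ierr);
    ierr = VecGetArray(F, &f);CHKERRQ(ierr);
    f[0] = x[0]*x[0] - 2.0;               /* toy residual: solve x*x = 2 */
    ierr = VecRestoreArrayRead(X, &x);CHKERRQ(ierr);
    ierr = VecRestoreArray(F, &f);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  /* Jacobian callback for the same toy problem: J = 2x */
  static PetscErrorCode FormJacobian(SNES snes, Vec X, Mat J, Mat P, void *ctx)
  {
    const PetscScalar *x;
    PetscErrorCode     ierr;

    PetscFunctionBeginUser;
    ierr = VecGetArrayRead(X, &x);CHKERRQ(ierr);
    ierr = MatSetValue(P, 0, 0, 2.0*x[0], INSERT_VALUES);CHKERRQ(ierr);
    ierr = VecRestoreArrayRead(X, &x);CHKERRQ(ierr);
    ierr = MatAssemblyBegin(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  int main(int argc, char **argv)
  {
    SNES           snes;
    Vec            x, r;                  /* r plays the role of f_non */
    Mat            J;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
    ierr = VecCreateSeq(PETSC_COMM_SELF, 1, &x);CHKERRQ(ierr);
    ierr = VecDuplicate(x, &r);CHKERRQ(ierr);
    ierr = VecSet(x, 1.0);CHKERRQ(ierr);
    ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 1, 1, NULL, &J);CHKERRQ(ierr);

    ierr = SNESCreate(PETSC_COMM_SELF, &snes);CHKERRQ(ierr);
    ierr = SNESSetFunction(snes, r, FormFunction, NULL);CHKERRQ(ierr);  /* registers the callback; r is not filled yet */
    ierr = SNESSetJacobian(snes, J, J, FormJacobian, NULL);CHKERRQ(ierr);
    ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);
    ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);   /* r is filled during the solve */

    ierr = SNESDestroy(&snes);CHKERRQ(ierr);
    ierr = MatDestroy(&J);CHKERRQ(ierr);
    ierr = VecDestroy(&r);CHKERRQ(ierr);
    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

In the Fortran case the pattern is identical: the vector passed as the second argument of SNESSetFunction() is the one that SNES fills with the residual, via the supplied FormFunction, during SNESSolve().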