From wence at gmx.li  Mon Feb  1 08:38:55 2021
From: wence at gmx.li (Lawrence Mitchell)
Date: Mon, 1 Feb 2021 14:38:55 +0000
Subject: [petsc-users] Faculty positions at Durham
Message-ID: <C4034F08-80F5-4863-97AC-B263AAD2A5BD@gmx.li>

Dear PETSc-ites,

The Durham CS department is presently hiring. We have open positions at all levels (Assistant, Associate, and Full Professor) across a broad range of applied computer science, and we'd like to make at least one hire in Scientific Computing (the group currently has interests in geometric machine learning, finite element methods, preconditioning, and high performance algorithms: see https://duscicomp.github.io for more details).

Assistant Prof: https://www.jobs.ac.uk/job/CDN843/assistant-professor-in-computer-science-comp21-51
Closing date 22nd February

Associate Prof: https://www.jobs.ac.uk/job/CDN924/associate-professor-in-computer-science-comp21-57
Closing date 8th March

Prof: https://www.jobs.ac.uk/job/CDN929/professor-in-computer-science-comp21-60
Closing date 22nd March

If you have any queries or would like to know more, please get in touch with me.

Thanks,

Lawrence

From uphoff at geophysik.uni-muenchen.de  Tue Feb  2 05:06:16 2021
From: uphoff at geophysik.uni-muenchen.de (Carsten Uphoff)
Date: Tue, 2 Feb 2021 12:06:16 +0100
Subject: [petsc-users] Discontinuous Galerkin and BDDC
Message-ID: <b3c04896-a0ec-2ae6-775d-0f505b1f9fd5@geophysik.uni-muenchen.de>

Hi everyone,

I'm interested in testing a BDDC preconditioner for Poisson and 
Elasticity equations using the symmetric interior penalty Galerkin 
method. However, I wonder how one would apply PCBDDC for discontinuous 
Galerkin.

The major problem is that for DG you cannot write the bilinear form as a 
sum of local bilinear forms which only involve degrees of freedom of the 
respective local subdomain. In particular, coupling terms at the 
interface of two subdomains, such as [[u]], require DOFs from two 
subdomains. Therefore, it is not straightforward to write the operator A 
as sum of local operators in the form
A = sum_{i=1}^N R_i A_i R_i^T
where A_i are local operators and R_i is the local-to-global map.

In the literature, I found two possible solutions:
- Double degrees of freedom at the subdomain interface [1]
- Split the bilinear form a_h in the two parts a_{h,D} and a_{h,C}, 
where the first leads to an easy-to-invert operator that is 
discontinuous across the subdomain interface, and the second is 
continuous across the subdomain interface. As a_{h,C} is continuous, one 
may write the bilinear form as sum of local bilinear forms only 
involving the local degrees of freedom [2]

The first approach [1] seems unattractive as you double the DOFs in the 
Schur complement. For [2] I think one might be able to apply PCBDDC on 
A_{h,C} and apply A_{h,D}^{-1} as an additive correction, cf. (2.23) in [2].

Questions:
- Is there any straightforward way to apply PCBDDC for DG which I am 
missing?
- Does it make sense to apply PCBDDC on A_{h,C}? Could I combine an 
additive correction with PCBDDC using PCCOMPOSITE, e.g.?
- Does anyone already test PCBDDC for DG?

I appreciate your help and I'm looking forward for your comments!

Best regards,
Carsten

[1] Dryja and Galvis and Sarkis, Numer. Math. 131:737-770, 2015, 
doi:10.1007/s00211-015-0705-x
[2] Brenner and Park and Sung, ETNA 46:190-214, 2017, 
http://etna.mcs.kent.edu/vol.46.2017/pp190-214.dir/pp190-214.pdf


From mfadams at lbl.gov  Tue Feb  2 14:17:52 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Tue, 2 Feb 2021 15:17:52 -0500
Subject: [petsc-users] Fortran initialization and XXXDestroy
Message-ID: <CADOhEh6wTj8CsL=U4mfQa-ptKhJP7BiWeh6uk8_UjjKA0_H66A@mail.gmail.com>

Satish, a few years ago you helped us transition the XGC Fortran code from
v3.7.7 and we seemed to have regressed.

As I recall we removed the initialization of Mats (for example) in XGC.
PETSc seems to initialize them with -2 in Fortran (Albert, cc'ed, verified
this today) and I recall that from our previous conversation. As I look at
the code now Fortran MatDestroy just goes straight to C, which
would explain our crashes when we MatDestroy an uninitialized (-2) Mat.

What is the correct way to delete with initializing Fortran objects?

Thanks,
Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210202/eaa7c18e/attachment.html>

From rlmackie862 at gmail.com  Tue Feb  2 14:27:51 2021
From: rlmackie862 at gmail.com (Randall Mackie)
Date: Tue, 2 Feb 2021 12:27:51 -0800
Subject: [petsc-users] Fortran initialization and XXXDestroy
In-Reply-To: <CADOhEh6wTj8CsL=U4mfQa-ptKhJP7BiWeh6uk8_UjjKA0_H66A@mail.gmail.com>
References: <CADOhEh6wTj8CsL=U4mfQa-ptKhJP7BiWeh6uk8_UjjKA0_H66A@mail.gmail.com>
Message-ID: <61624598-BC78-4FEF-A423-88D672CE7788@gmail.com>

Hi Mark,

I don?t know what the XGC code is, but the way I do this in my Fortran code is that I initialize all objects I later want to destroy, for example:

mat11=PETSC_NULL_MAT
vec1=PETSC_NULL_VEC

etc

Then I check and destroy like:

if (mat11 /= PETSC_NULL_MAT) call MatDestroy(mat11, ierr)

etc.

Hope this helps,

Randy


> On Feb 2, 2021, at 12:17 PM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> Satish, a few years ago you helped us transition the XGC Fortran code from v3.7.7 and we seemed to have regressed.
> 
> As I recall we removed the initialization of Mats (for example) in XGC. PETSc seems to initialize them with -2 in Fortran (Albert, cc'ed, verified this today) and I recall that from our previous conversation. As I look at the code now Fortran MatDestroy just goes straight to C, which would explain our crashes when we MatDestroy an uninitialized (-2) Mat.
> 
> What is the correct way to delete with initializing Fortran objects?
> 
> Thanks,
> Mark
> 
> 


From mfadams at lbl.gov  Tue Feb  2 14:34:44 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Tue, 2 Feb 2021 15:34:44 -0500
Subject: [petsc-users] Fortran initialization and XXXDestroy
In-Reply-To: <61624598-BC78-4FEF-A423-88D672CE7788@gmail.com>
References: <CADOhEh6wTj8CsL=U4mfQa-ptKhJP7BiWeh6uk8_UjjKA0_H66A@mail.gmail.com>
	<61624598-BC78-4FEF-A423-88D672CE7788@gmail.com>
Message-ID: <CADOhEh6ozm-owBJBYxEau56LVzzGBX7vgA5q1Svtv7ueS+bFGA@mail.gmail.com>

Thanks Randy, that makes sense.
Mark

On Tue, Feb 2, 2021 at 3:27 PM Randall Mackie <rlmackie862 at gmail.com> wrote:

> Hi Mark,
>
> I don?t know what the XGC code is, but the way I do this in my Fortran
> code is that I initialize all objects I later want to destroy, for example:
>
> mat11=PETSC_NULL_MAT
> vec1=PETSC_NULL_VEC
>
> etc
>
> Then I check and destroy like:
>
> if (mat11 /= PETSC_NULL_MAT) call MatDestroy(mat11, ierr)
>
> etc.
>
> Hope this helps,
>
> Randy
>
>
> > On Feb 2, 2021, at 12:17 PM, Mark Adams <mfadams at lbl.gov> wrote:
> >
> > Satish, a few years ago you helped us transition the XGC Fortran code
> from v3.7.7 and we seemed to have regressed.
> >
> > As I recall we removed the initialization of Mats (for example) in XGC.
> PETSc seems to initialize them with -2 in Fortran (Albert, cc'ed, verified
> this today) and I recall that from our previous conversation. As I look at
> the code now Fortran MatDestroy just goes straight to C, which would
> explain our crashes when we MatDestroy an uninitialized (-2) Mat.
> >
> > What is the correct way to delete with initializing Fortran objects?
> >
> > Thanks,
> > Mark
> >
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210202/1f3973a0/attachment.html>

From bsmith at petsc.dev  Tue Feb  2 20:37:00 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 2 Feb 2021 20:37:00 -0600
Subject: [petsc-users] Fortran initialization and XXXDestroy
In-Reply-To: <CADOhEh6ozm-owBJBYxEau56LVzzGBX7vgA5q1Svtv7ueS+bFGA@mail.gmail.com>
References: <CADOhEh6wTj8CsL=U4mfQa-ptKhJP7BiWeh6uk8_UjjKA0_H66A@mail.gmail.com>
	<61624598-BC78-4FEF-A423-88D672CE7788@gmail.com>
	<CADOhEh6ozm-owBJBYxEau56LVzzGBX7vgA5q1Svtv7ueS+bFGA@mail.gmail.com>
Message-ID: <3D664B94-B4B5-4824-A65A-61DA0126A570@petsc.dev>


  I cannot remember why I selected -2 as the initial value for PETSc objects in Fortran. Probably because it would ensure a dramatic crash if you used 
an object without initializing it from Fortran.

  It could be changing 

config/BuildSystem/config/compilersFortran.py:      self.addDefine('FORTRAN_TYPE_INITIALIZE', ' = -2') 

to 0 would mean that if you called destroy on the object and never created it everything would be fine; so you would not need to use any special code to check.

Would that be a better model?

  Barry


> On Feb 2, 2021, at 2:34 PM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> Thanks Randy, that makes sense.
> Mark
> 
> On Tue, Feb 2, 2021 at 3:27 PM Randall Mackie <rlmackie862 at gmail.com <mailto:rlmackie862 at gmail.com>> wrote:
> Hi Mark,
> 
> I don?t know what the XGC code is, but the way I do this in my Fortran code is that I initialize all objects I later want to destroy, for example:
> 
> mat11=PETSC_NULL_MAT
> vec1=PETSC_NULL_VEC
> 
> etc
> 
> Then I check and destroy like:
> 
> if (mat11 /= PETSC_NULL_MAT) call MatDestroy(mat11, ierr)
> 
> etc.
> 
> Hope this helps,
> 
> Randy
> 
> 
> > On Feb 2, 2021, at 12:17 PM, Mark Adams <mfadams at lbl.gov <mailto:mfadams at lbl.gov>> wrote:
> > 
> > Satish, a few years ago you helped us transition the XGC Fortran code from v3.7.7 and we seemed to have regressed.
> > 
> > As I recall we removed the initialization of Mats (for example) in XGC. PETSc seems to initialize them with -2 in Fortran (Albert, cc'ed, verified this today) and I recall that from our previous conversation. As I look at the code now Fortran MatDestroy just goes straight to C, which would explain our crashes when we MatDestroy an uninitialized (-2) Mat.
> > 
> > What is the correct way to delete with initializing Fortran objects?
> > 
> > Thanks,
> > Mark
> > 
> > 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210202/b280d880/attachment.html>

From balay at mcs.anl.gov  Tue Feb  2 22:31:16 2021
From: balay at mcs.anl.gov (Satish Balay)
Date: Tue, 2 Feb 2021 22:31:16 -0600
Subject: [petsc-users] Fortran initialization and XXXDestroy
In-Reply-To: <3D664B94-B4B5-4824-A65A-61DA0126A570@petsc.dev>
References: <CADOhEh6wTj8CsL=U4mfQa-ptKhJP7BiWeh6uk8_UjjKA0_H66A@mail.gmail.com>
	<61624598-BC78-4FEF-A423-88D672CE7788@gmail.com>
	<CADOhEh6ozm-owBJBYxEau56LVzzGBX7vgA5q1Svtv7ueS+bFGA@mail.gmail.com>
	<3D664B94-B4B5-4824-A65A-61DA0126A570@petsc.dev>
Message-ID: <d7d652e6-8dc2-63eb-23bd-bf9d1373f387@mcs.anl.gov>

I think the current code is to support:

if (var not initialized): crate(var).

And there might be some mapping of 0 to NULL or some other state (hence the choice of -2)

Satish

On Tue, 2 Feb 2021, Barry Smith wrote:

> 
>   I cannot remember why I selected -2 as the initial value for PETSc objects in Fortran. Probably because it would ensure a dramatic crash if you used 
> an object without initializing it from Fortran.
> 
>   It could be changing 
> 
> config/BuildSystem/config/compilersFortran.py:      self.addDefine('FORTRAN_TYPE_INITIALIZE', ' = -2') 
> 
> to 0 would mean that if you called destroy on the object and never created it everything would be fine; so you would not need to use any special code to check.
> 
> Would that be a better model?
> 
>   Barry
> 
> 
> 
> > On Feb 2, 2021, at 2:34 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > 
> > Thanks Randy, that makes sense.
> > Mark
> > 
> > On Tue, Feb 2, 2021 at 3:27 PM Randall Mackie <rlmackie862 at gmail.com <mailto:rlmackie862 at gmail.com>> wrote:
> > Hi Mark,
> > 
> > I don?t know what the XGC code is, but the way I do this in my Fortran code is that I initialize all objects I later want to destroy, for example:
> > 
> > mat11=PETSC_NULL_MAT
> > vec1=PETSC_NULL_VEC
> > 
> > etc
> > 
> > Then I check and destroy like:
> > 
> > if (mat11 /= PETSC_NULL_MAT) call MatDestroy(mat11, ierr)
> > 
> > etc.
> > 
> > Hope this helps,
> > 
> > Randy
> > 
> > 
> > > On Feb 2, 2021, at 12:17 PM, Mark Adams <mfadams at lbl.gov <mailto:mfadams at lbl.gov>> wrote:
> > > 
> > > Satish, a few years ago you helped us transition the XGC Fortran code from v3.7.7 and we seemed to have regressed.
> > > 
> > > As I recall we removed the initialization of Mats (for example) in XGC. PETSc seems to initialize them with -2 in Fortran (Albert, cc'ed, verified this today) and I recall that from our previous conversation. As I look at the code now Fortran MatDestroy just goes straight to C, which would explain our crashes when we MatDestroy an uninitialized (-2) Mat.
> > > 
> > > What is the correct way to delete with initializing Fortran objects?
> > > 
> > > Thanks,
> > > Mark
> > > 
> > > 
> > 
> 
> 

From stefano.zampini at gmail.com  Wed Feb  3 02:48:44 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Wed, 3 Feb 2021 11:48:44 +0300
Subject: [petsc-users] Discontinuous Galerkin and BDDC
In-Reply-To: <b3c04896-a0ec-2ae6-775d-0f505b1f9fd5@geophysik.uni-muenchen.de>
References: <b3c04896-a0ec-2ae6-775d-0f505b1f9fd5@geophysik.uni-muenchen.de>
Message-ID: <CAGPUisg7ZdsDQGZm7HzuJso4zRzJzp8ayFV83X6Bghdo-a6EEQ@mail.gmail.com>

Il giorno mar 2 feb 2021 alle ore 14:06 Carsten Uphoff <
uphoff at geophysik.uni-muenchen.de> ha scritto:

> Hi everyone,
>
> I'm interested in testing a BDDC preconditioner for Poisson and
> Elasticity equations using the symmetric interior penalty Galerkin
> method. However, I wonder how one would apply PCBDDC for discontinuous
> Galerkin.
>
> The major problem is that for DG you cannot write the bilinear form as a
> sum of local bilinear forms which only involve degrees of freedom of the
> respective local subdomain. In particular, coupling terms at the
> interface of two subdomains, such as [[u]], require DOFs from two
> subdomains. Therefore, it is not straightforward to write the operator A
> as sum of local operators in the form
> A = sum_{i=1}^N R_i A_i R_i^T
> where A_i are local operators and R_i is the local-to-global map.
>
> In the literature, I found two possible solutions:
> - Double degrees of freedom at the subdomain interface [1]
> - Split the bilinear form a_h in the two parts a_{h,D} and a_{h,C},
> where the first leads to an easy-to-invert operator that is
> discontinuous across the subdomain interface, and the second is
> continuous across the subdomain interface. As a_{h,C} is continuous, one
> may write the bilinear form as sum of local bilinear forms only
> involving the local degrees of freedom [2]
>
> The first approach [1] seems unattractive as you double the DOFs in the
> Schur complement. For [2] I think one might be able to apply PCBDDC on
> A_{h,C} and apply A_{h,D}^{-1} as an additive correction, cf. (2.23) in
> [2].
>
> Questions:
> - Is there any straightforward way to apply PCBDDC for DG which I am
> missing?
>

I don't think so. I know Lawrence gave it some thoughts but never heard
about a final solution about how to represent subdomain DG matrices via a
MATIS.

>
> - Does it make sense to apply PCBDDC on A_{h,C}? Could I combine an
> additive correction with PCBDDC using PCCOMPOSITE, e.g.?
>

You either use PCComposite or write a small PCSHELL that implements PCApply
as additive combination


> - Does anyone already test PCBDDC for DG?
>

Not that I know


> I appreciate your help and I'm looking forward for your comments!
>
> Best regards,
> Carsten
>
> [1] Dryja and Galvis and Sarkis, Numer. Math. 131:737-770, 2015,
> doi:10.1007/s00211-015-0705-x
> [2] Brenner and Park and Sung, ETNA 46:190-214, 2017,
> http://etna.mcs.kent.edu/vol.46.2017/pp190-214.dir/pp190-214.pdf
>
>

-- 
Stefano
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/f217dfac/attachment.html>

From mfadams at lbl.gov  Wed Feb  3 06:07:44 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Wed, 3 Feb 2021 07:07:44 -0500
Subject: [petsc-users] Fortran initialization and XXXDestroy
In-Reply-To: <d7d652e6-8dc2-63eb-23bd-bf9d1373f387@mcs.anl.gov>
References: <CADOhEh6wTj8CsL=U4mfQa-ptKhJP7BiWeh6uk8_UjjKA0_H66A@mail.gmail.com>
	<61624598-BC78-4FEF-A423-88D672CE7788@gmail.com>
	<CADOhEh6ozm-owBJBYxEau56LVzzGBX7vgA5q1Svtv7ueS+bFGA@mail.gmail.com>
	<3D664B94-B4B5-4824-A65A-61DA0126A570@petsc.dev>
	<d7d652e6-8dc2-63eb-23bd-bf9d1373f387@mcs.anl.gov>
Message-ID: <CADOhEh4B=-OsAsErhU9jBV6iBF38Pw0KmaPNf4=2f0+3emRodA@mail.gmail.com>

Perhaps a custom Fortran interface that checks for FORTRAN_TYPE_INITIALIZE.
That was the first thing that I looked for.

On Tue, Feb 2, 2021 at 11:31 PM Satish Balay <balay at mcs.anl.gov> wrote:

> I think the current code is to support:
>
> if (var not initialized): crate(var).
>
> And there might be some mapping of 0 to NULL or some other state (hence
> the choice of -2)
>
> Satish
>
> On Tue, 2 Feb 2021, Barry Smith wrote:
>
> >
> >   I cannot remember why I selected -2 as the initial value for PETSc
> objects in Fortran. Probably because it would ensure a dramatic crash if
> you used
> > an object without initializing it from Fortran.
> >
> >   It could be changing
> >
> > config/BuildSystem/config/compilersFortran.py:
> self.addDefine('FORTRAN_TYPE_INITIALIZE', ' = -2')
> >
> > to 0 would mean that if you called destroy on the object and never
> created it everything would be fine; so you would not need to use any
> special code to check.
> >
> > Would that be a better model?
> >
> >   Barry
> >
> >
> >
> > > On Feb 2, 2021, at 2:34 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > >
> > > Thanks Randy, that makes sense.
> > > Mark
> > >
> > > On Tue, Feb 2, 2021 at 3:27 PM Randall Mackie <rlmackie862 at gmail.com
> <mailto:rlmackie862 at gmail.com>> wrote:
> > > Hi Mark,
> > >
> > > I don?t know what the XGC code is, but the way I do this in my Fortran
> code is that I initialize all objects I later want to destroy, for example:
> > >
> > > mat11=PETSC_NULL_MAT
> > > vec1=PETSC_NULL_VEC
> > >
> > > etc
> > >
> > > Then I check and destroy like:
> > >
> > > if (mat11 /= PETSC_NULL_MAT) call MatDestroy(mat11, ierr)
> > >
> > > etc.
> > >
> > > Hope this helps,
> > >
> > > Randy
> > >
> > >
> > > > On Feb 2, 2021, at 12:17 PM, Mark Adams <mfadams at lbl.gov <mailto:
> mfadams at lbl.gov>> wrote:
> > > >
> > > > Satish, a few years ago you helped us transition the XGC Fortran
> code from v3.7.7 and we seemed to have regressed.
> > > >
> > > > As I recall we removed the initialization of Mats (for example) in
> XGC. PETSc seems to initialize them with -2 in Fortran (Albert, cc'ed,
> verified this today) and I recall that from our previous conversation. As I
> look at the code now Fortran MatDestroy just goes straight to C, which
> would explain our crashes when we MatDestroy an uninitialized (-2) Mat.
> > > >
> > > > What is the correct way to delete with initializing Fortran objects?
> > > >
> > > > Thanks,
> > > > Mark
> > > >
> > > >
> > >
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/14c32e7e/attachment.html>

From jed at jedbrown.org  Wed Feb  3 10:14:19 2021
From: jed at jedbrown.org (Jed Brown)
Date: Wed, 03 Feb 2021 09:14:19 -0700
Subject: [petsc-users] Discontinuous Galerkin and BDDC
In-Reply-To: <CAGPUisg7ZdsDQGZm7HzuJso4zRzJzp8ayFV83X6Bghdo-a6EEQ@mail.gmail.com>
References: <b3c04896-a0ec-2ae6-775d-0f505b1f9fd5@geophysik.uni-muenchen.de>
	<CAGPUisg7ZdsDQGZm7HzuJso4zRzJzp8ayFV83X6Bghdo-a6EEQ@mail.gmail.com>
Message-ID: <87pn1huo10.fsf@jedbrown.org>

Stefano Zampini <stefano.zampini at gmail.com> writes:

> Il giorno mar 2 feb 2021 alle ore 14:06 Carsten Uphoff <
> uphoff at geophysik.uni-muenchen.de> ha scritto:
>> Questions:
>> - Is there any straightforward way to apply PCBDDC for DG which I am
>> missing?
>>
>
> I don't think so. I know Lawrence gave it some thoughts but never heard
> about a final solution about how to represent subdomain DG matrices via a
> MATIS.

Wouldn't hybridizing work pretty naturally? The dofs are all in a broken trace space and connect through elements with shared support.

From wence at gmx.li  Wed Feb  3 10:24:49 2021
From: wence at gmx.li (Lawrence Mitchell)
Date: Wed, 3 Feb 2021 16:24:49 +0000
Subject: [petsc-users] Discontinuous Galerkin and BDDC
In-Reply-To: <CAGPUisg7ZdsDQGZm7HzuJso4zRzJzp8ayFV83X6Bghdo-a6EEQ@mail.gmail.com>
References: <b3c04896-a0ec-2ae6-775d-0f505b1f9fd5@geophysik.uni-muenchen.de>
	<CAGPUisg7ZdsDQGZm7HzuJso4zRzJzp8ayFV83X6Bghdo-a6EEQ@mail.gmail.com>
Message-ID: <F38BEFFA-7E50-474E-B12B-9706B1B23722@gmx.li>


> On 3 Feb 2021, at 08:48, Stefano Zampini <stefano.zampini at gmail.com> wrote:
> 

...
> Questions:
> - Is there any straightforward way to apply PCBDDC for DG which I am 
> missing?
> 
> I don't think so. I know Lawrence gave it some thoughts but never heard about a final solution about how to represent subdomain DG matrices via a MATIS.

I think when we discussed this (two+ years ago?), I was just worried about what I was doing. But I never got any further along.

I think Jed's suggestion sounds like it would work.

Lawrence

From bsmith at petsc.dev  Wed Feb  3 11:01:16 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 3 Feb 2021 11:01:16 -0600
Subject: [petsc-users] Fortran initialization and XXXDestroy
In-Reply-To: <CADOhEh4B=-OsAsErhU9jBV6iBF38Pw0KmaPNf4=2f0+3emRodA@mail.gmail.com>
References: <CADOhEh6wTj8CsL=U4mfQa-ptKhJP7BiWeh6uk8_UjjKA0_H66A@mail.gmail.com>
	<61624598-BC78-4FEF-A423-88D672CE7788@gmail.com>
	<CADOhEh6ozm-owBJBYxEau56LVzzGBX7vgA5q1Svtv7ueS+bFGA@mail.gmail.com>
	<3D664B94-B4B5-4824-A65A-61DA0126A570@petsc.dev>
	<d7d652e6-8dc2-63eb-23bd-bf9d1373f387@mcs.anl.gov>
	<CADOhEh4B=-OsAsErhU9jBV6iBF38Pw0KmaPNf4=2f0+3emRodA@mail.gmail.com>
Message-ID: <5BB4D9CF-D4A6-43D3-AC5C-FBB6F67149DC@petsc.dev>


> On Feb 3, 2021, at 6:07 AM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> Perhaps a custom Fortran interface that checks for FORTRAN_TYPE_INITIALIZE. 
> That was the first thing that I looked for.

   This would mean tons more custom interfaces which are a pain to write and often forgotten. I think someone should just try changing to 0.

    Note we can still have a check if not initialized and a check if initialized for users (not that they should use them much).

Barry

> 
> On Tue, Feb 2, 2021 at 11:31 PM Satish Balay <balay at mcs.anl.gov <mailto:balay at mcs.anl.gov>> wrote:
> I think the current code is to support:
> 
> if (var not initialized): crate(var).
> 
> And there might be some mapping of 0 to NULL or some other state (hence the choice of -2)
> 
> Satish
> 
> On Tue, 2 Feb 2021, Barry Smith wrote:
> 
> > 
> >   I cannot remember why I selected -2 as the initial value for PETSc objects in Fortran. Probably because it would ensure a dramatic crash if you used 
> > an object without initializing it from Fortran.
> > 
> >   It could be changing 
> > 
> > config/BuildSystem/config/compilersFortran.py:      self.addDefine('FORTRAN_TYPE_INITIALIZE', ' = -2') 
> > 
> > to 0 would mean that if you called destroy on the object and never created it everything would be fine; so you would not need to use any special code to check.
> > 
> > Would that be a better model?
> > 
> >   Barry
> > 
> > 
> > 
> > > On Feb 2, 2021, at 2:34 PM, Mark Adams <mfadams at lbl.gov <mailto:mfadams at lbl.gov>> wrote:
> > > 
> > > Thanks Randy, that makes sense.
> > > Mark
> > > 
> > > On Tue, Feb 2, 2021 at 3:27 PM Randall Mackie <rlmackie862 at gmail.com <mailto:rlmackie862 at gmail.com> <mailto:rlmackie862 at gmail.com <mailto:rlmackie862 at gmail.com>>> wrote:
> > > Hi Mark,
> > > 
> > > I don?t know what the XGC code is, but the way I do this in my Fortran code is that I initialize all objects I later want to destroy, for example:
> > > 
> > > mat11=PETSC_NULL_MAT
> > > vec1=PETSC_NULL_VEC
> > > 
> > > etc
> > > 
> > > Then I check and destroy like:
> > > 
> > > if (mat11 /= PETSC_NULL_MAT) call MatDestroy(mat11, ierr)
> > > 
> > > etc.
> > > 
> > > Hope this helps,
> > > 
> > > Randy
> > > 
> > > 
> > > > On Feb 2, 2021, at 12:17 PM, Mark Adams <mfadams at lbl.gov <mailto:mfadams at lbl.gov> <mailto:mfadams at lbl.gov <mailto:mfadams at lbl.gov>>> wrote:
> > > > 
> > > > Satish, a few years ago you helped us transition the XGC Fortran code from v3.7.7 and we seemed to have regressed.
> > > > 
> > > > As I recall we removed the initialization of Mats (for example) in XGC. PETSc seems to initialize them with -2 in Fortran (Albert, cc'ed, verified this today) and I recall that from our previous conversation. As I look at the code now Fortran MatDestroy just goes straight to C, which would explain our crashes when we MatDestroy an uninitialized (-2) Mat.
> > > > 
> > > > What is the correct way to delete with initializing Fortran objects?
> > > > 
> > > > Thanks,
> > > > Mark
> > > > 
> > > > 
> > > 
> > 
> > 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/2aa83e85/attachment-0001.html>

From skavou1 at lsu.edu  Wed Feb  3 11:07:34 2021
From: skavou1 at lsu.edu (Sepideh Kavousi)
Date: Wed, 3 Feb 2021 17:07:34 +0000
Subject: [petsc-users] SNES-norm is zero all the time
Message-ID: <BN7PR06MB6067A6EADE5B515870626812EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>

Hello,
I have a very stupid problem that I am really ashamed of asking but it has been with me for days and I do not know what to do. I want to solve the Javier stokes equation with finite difference method. When I wanted to run the code, the snes norm value does not change and i get an error for timestep convergence. I thought I might do something wrong so,  I tried to simplify the equation I want to solve to an easy form, given as:
u_t=u_x +u_y
Where _t is the time derivative and _x and _y are the derivative in x and y direction. When I want to solve this problem, it still does not do anything at all and the snes function norm is zero all the time. I know I am missing something but does anyone have any idea what should I check in my code.  The answers does not change with time.
Best,
Sepideh


Get Outlook for iOS<https://aka.ms/o0ukef>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/1838b5ed/attachment.html>

From stefano.zampini at gmail.com  Wed Feb  3 11:11:33 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Wed, 3 Feb 2021 20:11:33 +0300
Subject: [petsc-users] SNES-norm is zero all the time
In-Reply-To: <BN7PR06MB6067A6EADE5B515870626812EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
References: <BN7PR06MB6067A6EADE5B515870626812EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
Message-ID: <CAGPUishOy24qPDVAVQt5g+O9VWiONOy_HS43-332qp8piE4XRg@mail.gmail.com>

Cannot say anything if you don't provide a minimal code we can run to
reproduce the issue

Il giorno mer 3 feb 2021 alle ore 20:07 Sepideh Kavousi <skavou1 at lsu.edu>
ha scritto:

> Hello,
> I have a very stupid problem that I am really ashamed of asking but it has
> been with me for days and I do not know what to do. I want to solve the
> Javier stokes equation with finite difference method. When I wanted to run
> the code, the snes norm value does not change and i get an error for
> timestep convergence. I thought I might do something wrong so,  I tried
> to simplify the equation I want to solve to an easy form, given as:
> u_t=u_x +u_y
> Where _t is the time derivative and _x and _y are the derivative in x and
> y direction. When I want to solve this problem, it still does not do
> anything at all and the snes function norm is zero all the time. I know I
> am missing something but does anyone have any idea what should I check in
> my code.  The answers does not change with time.
> Best,
> Sepideh
>
>
> Get Outlook for iOS <https://aka.ms/o0ukef>
>


-- 
Stefano
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/b6aad2ec/attachment.html>

From patrick.sanan at gmail.com  Wed Feb  3 11:12:36 2021
From: patrick.sanan at gmail.com (Patrick Sanan)
Date: Wed, 3 Feb 2021 18:12:36 +0100
Subject: [petsc-users] SNES-norm is zero all the time
In-Reply-To: <CAGPUishOy24qPDVAVQt5g+O9VWiONOy_HS43-332qp8piE4XRg@mail.gmail.com>
References: <BN7PR06MB6067A6EADE5B515870626812EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
	<CAGPUishOy24qPDVAVQt5g+O9VWiONOy_HS43-332qp8piE4XRg@mail.gmail.com>
Message-ID: <E1288DD2-FF1D-41D0-B8CA-D2660A34600A@gmail.com>

Are you working from a particular example, or writing your own code from scratch?

> Am 03.02.2021 um 18:11 schrieb Stefano Zampini <stefano.zampini at gmail.com>:
> 
> Cannot say anything if you don't provide a minimal code we can run to reproduce the issue
> 
> Il giorno mer 3 feb 2021 alle ore 20:07 Sepideh Kavousi <skavou1 at lsu.edu <mailto:skavou1 at lsu.edu>> ha scritto:
> Hello,
> I have a very stupid problem that I am really ashamed of asking but it has been with me for days and I do not know what to do. I want to solve the Javier stokes equation with finite difference method. When I wanted to run the code, the snes norm value does not change and i get an error for timestep convergence. I thought I might do something wrong so,  I tried to simplify the equation I want to solve to an easy form, given as:
> u_t=u_x +u_y
> Where _t is the time derivative and _x and _y are the derivative in x and y direction. When I want to solve this problem, it still does not do anything at all and the snes function norm is zero all the time. I know I am missing something but does anyone have any idea what should I check in my code.  The answers does not change with time.
> Best,
> Sepideh
> 
> 
> Get Outlook for iOS <https://aka.ms/o0ukef>
> 
> -- 
> Stefano

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/d8b53e22/attachment.html>

From jed at jedbrown.org  Wed Feb  3 11:19:54 2021
From: jed at jedbrown.org (Jed Brown)
Date: Wed, 03 Feb 2021 10:19:54 -0700
Subject: [petsc-users] SNES-norm is zero all the time
In-Reply-To: <BN7PR06MB6067A6EADE5B515870626812EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
References: <BN7PR06MB6067A6EADE5B515870626812EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
Message-ID: <87h7mtukzp.fsf@jedbrown.org>

Sepideh Kavousi <skavou1 at lsu.edu> writes:

> Hello,
> I have a very stupid problem that I am really ashamed of asking but it has been with me for days and I do not know what to do. I want to solve the Javier stokes equation with finite difference method. 

Could you run with -snes_monitor -snes_converged_reason -pc_type lu?

> When I wanted to run the code, the snes norm value does not change and i get an error for timestep convergence. I thought I might do something wrong so, I tried to simplify the equation I want to solve to an easy form, given as: u_t=u_x +u_y Where _t is the time derivative and _x and _y are the derivative in x and y direction. When I want to solve this problem, it still does not do anything at all and the snes function norm is zero all the time. I know I am missing something but does anyone have any idea what should I check in my code. The answers does not change with time. Best, Sepideh
>
>
> Get Outlook for iOS<https://aka.ms/o0ukef>

From balay at mcs.anl.gov  Wed Feb  3 11:34:49 2021
From: balay at mcs.anl.gov (Satish Balay)
Date: Wed, 3 Feb 2021 11:34:49 -0600
Subject: [petsc-users] petsc-3.14.4 now available
Message-ID: <b3bb2bcd-6667-b98d-8a70-5a4c5e4159f@mcs.anl.gov>

Dear PETSc users,

The patch release petsc-3.14.4 is now available for download.

http://www.mcs.anl.gov/petsc/download/index.html

Satish


From skavou1 at lsu.edu  Wed Feb  3 11:39:26 2021
From: skavou1 at lsu.edu (Sepideh Kavousi)
Date: Wed, 3 Feb 2021 17:39:26 +0000
Subject: [petsc-users] SNES-norm is zero all the time
In-Reply-To: <87h7mtukzp.fsf@jedbrown.org>
References: <BN7PR06MB6067A6EADE5B515870626812EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>,
	<87h7mtukzp.fsf@jedbrown.org>
Message-ID: <BN7PR06MB60673AD624072EFA98B3D964EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>

I am not running an specific example. Attached is  my code. and when I wun with
./step5.out -snes_monitor -snes_fd_color -ts_monitor -snes_converged_reason -pc_type lu

it seems it does not solve anything because the output is like:

    0 SNES Function norm 0.000000000000e+00
  Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
1 TS dt 0.005 time 0.005
copy!
    0 SNES Function norm 0.000000000000e+00
  Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
2 TS dt 0.005 time 0.01
copy!
    0 SNES Function norm 0.000000000000e+00
  Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
3 TS dt 0.005 time 0.015
copy!
    0 SNES Function norm 0.000000000000e+00
  Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
4 TS dt 0.005 time 0.02
copy!
    0 SNES Function norm 0.000000000000e+00
  Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
5 TS dt 0.005 time 0.025
copy!
    0 SNES Function norm 0.000000000000e+00
  Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
6 TS dt 0.005 time 0.03
copy!
    0 SNES Function norm 0.000000000000e+00
  Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
7 TS dt 0.005 time 0.035
copy!
    0 SNES Function norm 0.000000000000e+00
  Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
8 TS dt 0.005 time 0.04
copy!
    0 SNES Function norm 0.000000000000e+00
  Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
9 TS dt 0.005 time 0.045
copy!
    0 SNES Function norm 0.000000000000e+00
  Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
10 TS dt 0.005 time 0.05
copy!
    0 SNES Function norm 0.000000000000e+00
  Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
11 TS dt 0.005 time 0.055
copy!
    0 SNES Function norm 0.000000000000e+00
  Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
.......
________________________________
From: Jed Brown <jed at jedbrown.org>
Sent: Wednesday, February 3, 2021 11:19 AM
To: Sepideh Kavousi <skavou1 at lsu.edu>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] SNES-norm is zero all the time

Sepideh Kavousi <skavou1 at lsu.edu> writes:

> Hello,
> I have a very stupid problem that I am really ashamed of asking but it has been with me for days and I do not know what to do. I want to solve the Javier stokes equation with finite difference method.

Could you run with -snes_monitor -snes_converged_reason -pc_type lu?

> When I wanted to run the code, the snes norm value does not change and i get an error for timestep convergence. I thought I might do something wrong so, I tried to simplify the equation I want to solve to an easy form, given as: u_t=u_x +u_y Where _t is the time derivative and _x and _y are the derivative in x and y direction. When I want to solve this problem, it still does not do anything at all and the snes function norm is zero all the time. I know I am missing something but does anyone have any idea what should I check in my code. The answers does not change with time. Best, Sepideh
>
>
> Get Outlook for iOS<https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Faka.ms%2Fo0ukef&amp;data=04%7C01%7Cskavou1%40lsu.edu%7C07c0cc86cc4d4100e28a08d8c8680396%7C2d4dad3f50ae47d983a09ae2b1f466f8%7C0%7C0%7C637479696353544458%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=A7xZE31UylQvnMnTFIcZj60GToFHW%2FsumLS9kIISEWo%3D&amp;reserved=0>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/75bacb85/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: common-step5.c
Type: text/x-csrc
Size: 584 bytes
Desc: common-step5.c
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/75bacb85/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: common-step5.h
Type: text/x-chdr
Size: 742 bytes
Desc: common-step5.h
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/75bacb85/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: makefile
Type: application/octet-stream
Size: 564 bytes
Desc: makefile
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/75bacb85/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: step5.c
Type: text/x-csrc
Size: 7292 bytes
Desc: step5.c
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/75bacb85/attachment-0002.bin>

From knepley at gmail.com  Wed Feb  3 11:46:24 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 3 Feb 2021 12:46:24 -0500
Subject: [petsc-users] SNES-norm is zero all the time
In-Reply-To: <BN7PR06MB60673AD624072EFA98B3D964EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
References: <BN7PR06MB6067A6EADE5B515870626812EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
	<87h7mtukzp.fsf@jedbrown.org>
	<BN7PR06MB60673AD624072EFA98B3D964EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
Message-ID: <CAMYG4GkzWGg6r+Vai8GZNdCVVNd985eO5aHZ_0ZiHqdDn8z=6Q@mail.gmail.com>

On Wed, Feb 3, 2021 at 12:39 PM Sepideh Kavousi <skavou1 at lsu.edu> wrote:

> I am not running an specific example. Attached is  my code. and when I wun
> with
> ./step5.out -snes_monitor -snes_fd_color -ts_monitor
> -snes_converged_reason -pc_type lu
>

Did you mean [i][j] here?

    aF[j][j].vx=((vx_x+vx_y)-1*aYdot[j][i].vx*(1));

    Matt


> it seems it does not solve anything because the output is like:
>
>     0 SNES Function norm 0.000000000000e+00
>   Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> 1 TS dt 0.005 time 0.005
> copy!
>     0 SNES Function norm 0.000000000000e+00
>   Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> 2 TS dt 0.005 time 0.01
> copy!
>     0 SNES Function norm 0.000000000000e+00
>   Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> 3 TS dt 0.005 time 0.015
> copy!
>     0 SNES Function norm 0.000000000000e+00
>   Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> 4 TS dt 0.005 time 0.02
> copy!
>     0 SNES Function norm 0.000000000000e+00
>   Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> 5 TS dt 0.005 time 0.025
> copy!
>     0 SNES Function norm 0.000000000000e+00
>   Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> 6 TS dt 0.005 time 0.03
> copy!
>     0 SNES Function norm 0.000000000000e+00
>   Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> 7 TS dt 0.005 time 0.035
> copy!
>     0 SNES Function norm 0.000000000000e+00
>   Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> 8 TS dt 0.005 time 0.04
> copy!
>     0 SNES Function norm 0.000000000000e+00
>   Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> 9 TS dt 0.005 time 0.045
> copy!
>     0 SNES Function norm 0.000000000000e+00
>   Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> 10 TS dt 0.005 time 0.05
> copy!
>     0 SNES Function norm 0.000000000000e+00
>   Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> 11 TS dt 0.005 time 0.055
> copy!
>     0 SNES Function norm 0.000000000000e+00
>   Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> .......
> ------------------------------
> *From:* Jed Brown <jed at jedbrown.org>
> *Sent:* Wednesday, February 3, 2021 11:19 AM
> *To:* Sepideh Kavousi <skavou1 at lsu.edu>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] SNES-norm is zero all the time
>
> Sepideh Kavousi <skavou1 at lsu.edu> writes:
>
> > Hello,
> > I have a very stupid problem that I am really ashamed of asking but it
> has been with me for days and I do not know what to do. I want to solve the
> Javier stokes equation with finite difference method.
>
> Could you run with -snes_monitor -snes_converged_reason -pc_type lu?
>
> > When I wanted to run the code, the snes norm value does not change and i
> get an error for timestep convergence. I thought I might do something wrong
> so, I tried to simplify the equation I want to solve to an easy form, given
> as: u_t=u_x +u_y Where _t is the time derivative and _x and _y are the
> derivative in x and y direction. When I want to solve this problem, it
> still does not do anything at all and the snes function norm is zero all
> the time. I know I am missing something but does anyone have any idea what
> should I check in my code. The answers does not change with time. Best,
> Sepideh
> >
> >
> > Get Outlook for iOS<
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Faka.ms%2Fo0ukef&amp;data=04%7C01%7Cskavou1%40lsu.edu%7C07c0cc86cc4d4100e28a08d8c8680396%7C2d4dad3f50ae47d983a09ae2b1f466f8%7C0%7C0%7C637479696353544458%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=A7xZE31UylQvnMnTFIcZj60GToFHW%2FsumLS9kIISEWo%3D&amp;reserved=0
> >
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/ad4d2159/attachment-0001.html>

From jed at jedbrown.org  Wed Feb  3 11:46:35 2021
From: jed at jedbrown.org (Jed Brown)
Date: Wed, 03 Feb 2021 10:46:35 -0700
Subject: [petsc-users] SNES-norm is zero all the time
In-Reply-To: <BN7PR06MB60673AD624072EFA98B3D964EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
References: <BN7PR06MB6067A6EADE5B515870626812EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
	<87h7mtukzp.fsf@jedbrown.org>
	<BN7PR06MB60673AD624072EFA98B3D964EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
Message-ID: <87eehxujr8.fsf@jedbrown.org>

Sepideh Kavousi <skavou1 at lsu.edu> writes:

> I am not running an specific example. Attached is  my code. and when I wun with
> ./step5.out -snes_monitor -snes_fd_color -ts_monitor -snes_converged_reason -pc_type lu
>
> it seems it does not solve anything because the output is like:
>
>     0 SNES Function norm 0.000000000000e+00
>   Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> 1 TS dt 0.005 time 0.005

Check your FormFunction for why af[][] is zero. I see

  aF[j][j].vx

but you'll need to set at all the grid points, i.e., aF[j][i].vx and aF[j][i].vy.

From skavou1 at lsu.edu  Wed Feb  3 12:02:45 2021
From: skavou1 at lsu.edu (Sepideh Kavousi)
Date: Wed, 3 Feb 2021 18:02:45 +0000
Subject: [petsc-users] SNES-norm is zero all the time
In-Reply-To: <87eehxujr8.fsf@jedbrown.org>
References: <BN7PR06MB6067A6EADE5B515870626812EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
	<87h7mtukzp.fsf@jedbrown.org>
	<BN7PR06MB60673AD624072EFA98B3D964EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>,
	<87eehxujr8.fsf@jedbrown.org>
Message-ID: <BN7PR06MB6067E1F62B00777FDD2D5DF1EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>

I only have one field "names vx" and this variable change in both x and y directions. I have also chosen dof in DMDACreate2d to "1".

I am not sure why I should have aF[i][j].vx. "i" defines the grids in x direction and "j" is in y-directions. In all my previous codes I define" aF[j][i].vx" and not "aF[i][j].vx", and it was working properly.
Best,
Sepideh
________________________________
From: Jed Brown <jed at jedbrown.org>
Sent: Wednesday, February 3, 2021 11:46 AM
To: Sepideh Kavousi <skavou1 at lsu.edu>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] SNES-norm is zero all the time

Sepideh Kavousi <skavou1 at lsu.edu> writes:

> I am not running an specific example. Attached is  my code. and when I wun with
> ./step5.out -snes_monitor -snes_fd_color -ts_monitor -snes_converged_reason -pc_type lu
>
> it seems it does not solve anything because the output is like:
>
>     0 SNES Function norm 0.000000000000e+00
>   Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> 1 TS dt 0.005 time 0.005

Check your FormFunction for why af[][] is zero. I see

  aF[j][j].vx

but you'll need to set at all the grid points, i.e., aF[j][i].vx and aF[j][i].vy.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/60fbe854/attachment.html>

From knepley at gmail.com  Wed Feb  3 12:14:09 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 3 Feb 2021 13:14:09 -0500
Subject: [petsc-users] SNES-norm is zero all the time
In-Reply-To: <BN7PR06MB6067E1F62B00777FDD2D5DF1EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
References: <BN7PR06MB6067A6EADE5B515870626812EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
	<87h7mtukzp.fsf@jedbrown.org>
	<BN7PR06MB60673AD624072EFA98B3D964EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
	<87eehxujr8.fsf@jedbrown.org>
	<BN7PR06MB6067E1F62B00777FDD2D5DF1EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
Message-ID: <CAMYG4GnKsoWmd54CY2JzNCZ2zfKz-j-DVJiS1D7EfbcTUxpomA@mail.gmail.com>

On Wed, Feb 3, 2021 at 1:03 PM Sepideh Kavousi <skavou1 at lsu.edu> wrote:

> I only have one field "names vx" and this variable change in both x and y
> directions. I have also chosen dof in DMDACreate2d to "1".
>
> I am not sure why I should have aF[i][j].vx. "i" defines the grids in x
> direction and "j" is in y-directions. In all my previous codes I define"
> aF[j][i].vx" and not "aF[i][j].vx", and it was working properly.
>

To me, it looks like you have "jj"

  Matt


> Best,
> Sepideh
> ------------------------------
> *From:* Jed Brown <jed at jedbrown.org>
> *Sent:* Wednesday, February 3, 2021 11:46 AM
> *To:* Sepideh Kavousi <skavou1 at lsu.edu>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] SNES-norm is zero all the time
>
> Sepideh Kavousi <skavou1 at lsu.edu> writes:
>
> > I am not running an specific example. Attached is  my code. and when I
> wun with
> > ./step5.out -snes_monitor -snes_fd_color -ts_monitor
> -snes_converged_reason -pc_type lu
> >
> > it seems it does not solve anything because the output is like:
> >
> >     0 SNES Function norm 0.000000000000e+00
> >   Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> > 1 TS dt 0.005 time 0.005
>
> Check your FormFunction for why af[][] is zero. I see
>
>   aF[j][j].vx
>
> but you'll need to set at all the grid points, i.e., aF[j][i].vx and
> aF[j][i].vy.
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/7c4eeafc/attachment.html>

From skavou1 at lsu.edu  Wed Feb  3 12:28:03 2021
From: skavou1 at lsu.edu (Sepideh Kavousi)
Date: Wed, 3 Feb 2021 18:28:03 +0000
Subject: [petsc-users] SNES-norm is zero all the time
In-Reply-To: <CAMYG4GnKsoWmd54CY2JzNCZ2zfKz-j-DVJiS1D7EfbcTUxpomA@mail.gmail.com>
References: <BN7PR06MB6067A6EADE5B515870626812EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
	<87h7mtukzp.fsf@jedbrown.org>
	<BN7PR06MB60673AD624072EFA98B3D964EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>
	<87eehxujr8.fsf@jedbrown.org>
	<BN7PR06MB6067E1F62B00777FDD2D5DF1EEB49@BN7PR06MB6067.namprd06.prod.outlook.com>,
	<CAMYG4GnKsoWmd54CY2JzNCZ2zfKz-j-DVJiS1D7EfbcTUxpomA@mail.gmail.com>
Message-ID: <BN7PR06MB6067976FFBC2C6A2D631E88CEEB49@BN7PR06MB6067.namprd06.prod.outlook.com>

Oh my god, it was a bad one. Thanks for helping.
Sepideh
________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: Wednesday, February 3, 2021 12:14 PM
To: Sepideh Kavousi <skavou1 at lsu.edu>
Cc: Jed Brown <jed at jedbrown.org>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] SNES-norm is zero all the time

On Wed, Feb 3, 2021 at 1:03 PM Sepideh Kavousi <skavou1 at lsu.edu<mailto:skavou1 at lsu.edu>> wrote:
I only have one field "names vx" and this variable change in both x and y directions. I have also chosen dof in DMDACreate2d to "1".

I am not sure why I should have aF[i][j].vx. "i" defines the grids in x direction and "j" is in y-directions. In all my previous codes I define" aF[j][i].vx" and not "aF[i][j].vx", and it was working properly.

To me, it looks like you have "jj"

  Matt

Best,
Sepideh
________________________________
From: Jed Brown <jed at jedbrown.org<mailto:jed at jedbrown.org>>
Sent: Wednesday, February 3, 2021 11:46 AM
To: Sepideh Kavousi <skavou1 at lsu.edu<mailto:skavou1 at lsu.edu>>; petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] SNES-norm is zero all the time

Sepideh Kavousi <skavou1 at lsu.edu<mailto:skavou1 at lsu.edu>> writes:

> I am not running an specific example. Attached is  my code. and when I wun with
> ./step5.out -snes_monitor -snes_fd_color -ts_monitor -snes_converged_reason -pc_type lu
>
> it seems it does not solve anything because the output is like:
>
>     0 SNES Function norm 0.000000000000e+00
>   Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> 1 TS dt 0.005 time 0.005

Check your FormFunction for why af[][] is zero. I see

  aF[j][j].vx

but you'll need to set at all the grid points, i.e., aF[j][i].vx and aF[j][i].vy.


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/<https://nam04.safelinks.protection.outlook.com/?url=http:%2F%2Fwww.cse.buffalo.edu%2F~knepley%2F&data=04%7C01%7Cskavou1%40lsu.edu%7C9aa72ef56f15478b6e0e08d8c86f8841%7C2d4dad3f50ae47d983a09ae2b1f466f8%7C0%7C0%7C637479728645895019%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=CcQ%2BasyQnfdLmGdPbqBospHzu6DPt6sXte4x%2BHp8e%2F8%3D&reserved=0>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/ffce3432/attachment-0001.html>

From luciano.siqueira at usp.br  Wed Feb  3 13:41:10 2021
From: luciano.siqueira at usp.br (Luciano Siqueira)
Date: Wed, 3 Feb 2021 16:41:10 -0300
Subject: [petsc-users] Slower performance in multi-node system
Message-ID: <06d8b0d1-8879-4a7c-134e-d94dc5442ecc@usp.br>

Hello,

I'm evaluating the performance of an application in a distributed 
environment and I notice that it's much slower when running in many 
nodes/cores when compared to a single node with a fewer cores.

When running the application in 20 nodes, the Main Stage time reported 
in PETSc's log is up to 10 times slower than it is when running the same 
application in only 1 node, even with fewer cores per node.

The application I'm running is an example code provided by libmesh:

http://libmesh.github.io/examples/introduction_ex4.html

The application runs inside a Singularity container, with openmpi-4.0.3 
and PETSc 3.14.3. The distributed processes are managed by slurm 
17.02.11 and each node is equipped with two Intel CPU Xeon E5-2695v2 Ivy 
Bridge (12c @2,4GHz) and 128Gb of RAM, all communications going through 
infiniband.

My questions are: Is the slowdown expected? Should the application be 
specially tailored to work well in distributed environments?

Also, where (maybe in PETSc documentation/source-code) can I find 
information on how PETSc handles MPI communications? Do the KSP solvers 
favor one-to-one process communication over broadcast messages or 
vice-versa? I suspect inter-process communication must be the cause of 
the poor performance when using many nodes, but not as much as I'm seeing.

Thank you in advance!

Luciano.


From knepley at gmail.com  Wed Feb  3 13:43:54 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 3 Feb 2021 14:43:54 -0500
Subject: [petsc-users] Slower performance in multi-node system
In-Reply-To: <06d8b0d1-8879-4a7c-134e-d94dc5442ecc@usp.br>
References: <06d8b0d1-8879-4a7c-134e-d94dc5442ecc@usp.br>
Message-ID: <CAMYG4GkuQd-DCU25Q=uc2kFJpGzYvEZxUhj6=y9i6ChN6tLpfw@mail.gmail.com>

On Wed, Feb 3, 2021 at 2:42 PM Luciano Siqueira <luciano.siqueira at usp.br>
wrote:

> Hello,
>
> I'm evaluating the performance of an application in a distributed
> environment and I notice that it's much slower when running in many
> nodes/cores when compared to a single node with a fewer cores.
>
> When running the application in 20 nodes, the Main Stage time reported
> in PETSc's log is up to 10 times slower than it is when running the same
> application in only 1 node, even with fewer cores per node.
>
> The application I'm running is an example code provided by libmesh:
>
> http://libmesh.github.io/examples/introduction_ex4.html
>
> The application runs inside a Singularity container, with openmpi-4.0.3
> and PETSc 3.14.3. The distributed processes are managed by slurm
> 17.02.11 and each node is equipped with two Intel CPU Xeon E5-2695v2 Ivy
> Bridge (12c @2,4GHz) and 128Gb of RAM, all communications going through
> infiniband.
>
> My questions are: Is the slowdown expected? Should the application be
> specially tailored to work well in distributed environments?
>
> Also, where (maybe in PETSc documentation/source-code) can I find
> information on how PETSc handles MPI communications? Do the KSP solvers
> favor one-to-one process communication over broadcast messages or
> vice-versa? I suspect inter-process communication must be the cause of
> the poor performance when using many nodes, but not as much as I'm seeing.
>
> Thank you in advance!
>

We can't say anything about the performance without some data. Please send
us the output
of -log_view for both cases.

  Thanks,

     Matt


> Luciano.
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/c8be7dd0/attachment.html>

From luciano.siqueira at usp.br  Wed Feb  3 14:40:04 2021
From: luciano.siqueira at usp.br (Luciano Siqueira)
Date: Wed, 3 Feb 2021 17:40:04 -0300
Subject: [petsc-users] Slower performance in multi-node system
In-Reply-To: <CAMYG4GkuQd-DCU25Q=uc2kFJpGzYvEZxUhj6=y9i6ChN6tLpfw@mail.gmail.com>
References: <06d8b0d1-8879-4a7c-134e-d94dc5442ecc@usp.br>
	<CAMYG4GkuQd-DCU25Q=uc2kFJpGzYvEZxUhj6=y9i6ChN6tLpfw@mail.gmail.com>
Message-ID: <ffbc9b4b-9475-4a78-b011-381c355572c1@usp.br>

Here are the (attached) output of -log_view for both cases. The 
beginning of the files has some info from the libmesh app.

Running in 1 node, 32 cores: 01_node_log_view.txt

Running in 20 nodes, 32 cores each (640 cores in total): 
01_node_log_view.txt

Thanks!

Luciano.

Em 03/02/2021 16:43, Matthew Knepley escreveu:
> On Wed, Feb 3, 2021 at 2:42 PM Luciano Siqueira 
> <luciano.siqueira at usp.br <mailto:luciano.siqueira at usp.br>> wrote:
>
>     Hello,
>
>     I'm evaluating the performance of an application in a distributed
>     environment and I notice that it's much slower when running in many
>     nodes/cores when compared to a single node with a fewer cores.
>
>     When running the application in 20 nodes, the Main Stage time
>     reported
>     in PETSc's log is up to 10 times slower than it is when running
>     the same
>     application in only 1 node, even with fewer cores per node.
>
>     The application I'm running is an example code provided by libmesh:
>
>     http://libmesh.github.io/examples/introduction_ex4.html
>
>     The application runs inside a Singularity container, with
>     openmpi-4.0.3
>     and PETSc 3.14.3. The distributed processes are managed by slurm
>     17.02.11 and each node is equipped with two Intel CPU Xeon
>     E5-2695v2 Ivy
>     Bridge (12c @2,4GHz) and 128Gb of RAM, all communications going
>     through
>     infiniband.
>
>     My questions are: Is the slowdown expected? Should the application be
>     specially tailored to work well in distributed environments?
>
>     Also, where (maybe in PETSc documentation/source-code) can I find
>     information on how PETSc handles MPI communications? Do the KSP
>     solvers
>     favor one-to-one process communication over broadcast messages or
>     vice-versa? I suspect inter-process communication must be the
>     cause of
>     the poor performance when using many nodes, but not as much as I'm
>     seeing.
>
>     Thank you in advance!
>
>
> We can't say anything about the performance without some data. Please 
> send us the output
> of -log_view for both cases.
>
> ? Thanks,
>
> ? ? ?Matt
>
>     Luciano.
>
>
>
> -- 
> What most experimenters take for granted before they begin their 
> experiments is infinitely more interesting than any results to which 
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/ 
> <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/ae76822e/attachment-0001.html>
-------------- next part --------------
Running ./experiment -d 3 -n 31 -mat_type aij -ksp_type gmres -pc_type bjacobi -log_view

 Mesh Information:
  elem_dimensions()={3}
  spatial_dimension()=3
  n_nodes()=250047
    n_local_nodes()=8955
  n_elem()=29791
    n_local_elem()=935
    n_active_elem()=29791
  n_subdomains()=1
  n_partitions()=32
  n_processors()=32
  n_threads()=1
  processor_id()=0

*** Warning, This code is untested, experimental, or likely to see future API changes: ./include/libmesh/mesh_base.h, line 1667, compiled Jan 12 2021 at 12:34:39 ***
 EquationSystems
  n_systems()=1
   System #0, "Poisson"
    Type "LinearImplicit"
    Variables="u" 
    Finite Element Types="LAGRANGE" 
    Approximation Orders="SECOND" 
    n_dofs()=250047
    n_local_dofs()=8955
    n_constrained_dofs()=23066
    n_local_constrained_dofs()=636
    n_vectors()=1
    n_matrices()=1
    DofMap Sparsity
      Average  On-Processor Bandwidth <= 56.5003
      Average Off-Processor Bandwidth <= 7.21882
      Maximum  On-Processor Bandwidth <= 136
      Maximum Off-Processor Bandwidth <= 140
    DofMap Constraints
      Number of DoF Constraints = 23066
      Number of Heterogenous Constraints= 22818
      Average DoF Constraint Length= 0

 Mesh Information:
  elem_dimensions()={3}
  spatial_dimension()=3
  n_nodes()=250047
    n_local_nodes()=8955
  n_elem()=29791
    n_local_elem()=935
    n_active_elem()=29791
  n_subdomains()=1
  n_partitions()=32
  n_processors()=32
  n_threads()=1
  processor_id()=0


 -----------------------------------------------------
| Processor id:   0                                   |
| Num Processors: 32                                  |
| Time:           Wed Feb  3 17:26:38 2021            |
| OS:             Linux                               |
| HostName:       sdumont6197                         |
| OS Release:     3.10.0-957.el7.x86_64               |
| OS Version:     #1 SMP Thu Oct 4 20:48:51 UTC 2018  |
| Machine:        x86_64                              |
| Username:       luciano.siqueira                    |
| Configuration:  ../configure  '--prefix=/usr/local' |
|  '--with-vtk-include=/usr/local/include/vtk-8.2'    |
|  '--with-vtk-lib=/usr/local/lib'                    |
|  '--enable-petsc=yes'                               |
|  '--enable-petsc-required'                          |
|  '--enable-slepc'                                   |
|  '--enable-slepc-required'                          |
|  'METHODS=opt'                                      |
|  'PETSC_DIR=/opt/petsc'                             |
|  'PETSC_ARCH=arch-linux2-c-opt'                     |
|  'SLEPC_DIR=/opt/petsc/arch-linux2-c-opt'           |
 -----------------------------------------------------
 ------------------------------------------------------------------------------------------------------------
| Matrix Assembly Performance: Alive time=0.158664, Active time=0.068175                                     |
 ------------------------------------------------------------------------------------------------------------
| Event                         nCalls     Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                          w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|------------------------------------------------------------------------------------------------------------|
|                                                                                                            |
| Fe                            935        0.0084      0.000009    0.0084      0.000009    12.35    12.35    |
| Ke                            935        0.0395      0.000042    0.0395      0.000042    57.88    57.88    |
| elem init                     935        0.0203      0.000022    0.0203      0.000022    29.76    29.76    |
 ------------------------------------------------------------------------------------------------------------
| Totals:                       2805       0.0682                                          100.00            |
 ------------------------------------------------------------------------------------------------------------

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./experiment on a arch-linux2-c-opt named sdumont6197 with 32 processors, by luciano.siqueira Wed Feb  3 17:26:39 2021
Using 1 OpenMP threads
Using Petsc Development GIT revision: v3.14.3-435-gd1574ab4cd  GIT Date: 2021-01-11 15:13:43 +0000

                         Max       Max/Min     Avg       Total
Time (sec):           2.792e+00     1.000   2.791e+00
Objects:              6.600e+01     1.000   6.600e+01
Flop:                 5.609e+08     1.478   4.731e+08  1.514e+10
Flop/sec:             2.009e+08     1.478   1.695e+08  5.424e+09
MPI Messages:         3.178e+03     3.446   1.835e+03  5.872e+04
MPI Message Lengths:  1.138e+07     1.910   4.579e+03  2.689e+08
MPI Reductions:       4.340e+02     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 2.7915e+00 100.0%  1.5140e+10 100.0%  5.872e+04 100.0%  4.579e+03      100.0%  4.270e+02  98.4%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          7 1.0 9.5138e-02 6.8 0.00e+00 0.0 8.1e+02 5.3e+00 7.0e+00  1  0  1  0  2   1  0  1  0  2     0
BuildTwoSidedF         5 1.0 8.6365e-02162.3 0.00e+00 0.0 6.6e+02 3.6e+04 5.0e+00  1  0  1  9  1   1  0  1  9  1     0
MatMult              198 1.0 4.5126e-01 1.2 2.27e+08 1.5 5.4e+04 3.1e+03 1.0e+00 14 40 92 63  0  14 40 92 63  0 13438
MatSolve             199 1.0 3.5615e-01 1.5 2.00e+08 1.6 0.0e+00 0.0e+00 0.0e+00 11 36  0  0  0  11 36  0  0  0 15134
MatLUFactorNum         1 1.0 5.2785e-02 1.5 2.24e+07 1.5 0.0e+00 0.0e+00 0.0e+00  2  4  0  0  0   2  4  0  0  0 10833
MatILUFactorSym        1 1.0 8.4187e-03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       2 1.0 9.9590e-02 6.6 0.00e+00 0.0 2.7e+02 8.8e+04 2.0e+00  1  0  0  9  0   1  0  0  9  0     0
MatAssemblyEnd         2 1.0 2.3421e-02 1.0 3.94e+04 0.0 0.0e+00 0.0e+00 4.0e+00  1  0  0  0  1   1  0  0  0  1    21
MatGetRowIJ            1 1.0 1.8900e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.2824e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         3 1.0 3.9527e-03 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMDot              191 1.0 1.5713e-01 6.0 5.11e+07 1.4 0.0e+00 0.0e+00 1.9e+02  4  9  0  0 44   4  9  0  0 45  9089
VecNorm              199 1.0 2.8923e-02 3.6 3.56e+06 1.4 0.0e+00 0.0e+00 2.0e+02  1  1  0  0 46   1  1  0  0 47  3441
VecScale             198 1.0 9.2115e-04 1.3 1.77e+06 1.4 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 53747
VecCopy                8 1.0 1.1197e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               211 1.0 2.1250e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               14 1.0 1.2501e-0312.0 2.51e+05 1.4 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  5601
VecMAXPY             198 1.0 3.5915e-02 1.4 5.46e+07 1.4 0.0e+00 0.0e+00 0.0e+00  1 10  0  0  0   1 10  0  0  0 42427
VecAssemblyBegin       3 1.0 6.9581e-04 1.1 0.00e+00 0.0 4.0e+02 9.8e+02 3.0e+00  0  0  1  0  1   0  0  1  0  1     0
VecAssemblyEnd         3 1.0 9.0871e-05 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      199 1.0 3.8270e-02 2.6 0.00e+00 0.0 5.5e+04 3.1e+03 2.0e+00  1  0 93 64  0   1  0 93 64  0     0
VecScatterEnd        199 1.0 1.0005e-0112.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecNormalize         198 1.0 2.8637e-02 3.4 5.32e+06 1.4 0.0e+00 0.0e+00 2.0e+02  1  1  0  0 46   1  1  0  0 46  5187
SFSetGraph             2 1.0 4.9453e-05 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp                2 1.0 2.3974e-02 5.2 0.00e+00 0.0 1.1e+03 9.4e+02 2.0e+00  1  0  2  0  0   1  0  2  0  0     0
SFPack               199 1.0 7.6794e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFUnpack             199 1.0 1.1253e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               2 1.0 8.8754e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 8.9594e-01 1.0 5.38e+08 1.5 5.4e+04 3.1e+03 3.9e+02 32 96 92 63 90  32 96 92 63 92 16253
KSPGMRESOrthog       191 1.0 1.8209e-01 3.0 1.02e+08 1.4 0.0e+00 0.0e+00 1.9e+02  5 19  0  0 44   5 19  0  0 45 15687
PCSetUp                2 1.0 8.7059e-02 1.3 2.24e+07 1.5 0.0e+00 0.0e+00 0.0e+00  3  4  0  0  0   3  4  0  0  0  6568
PCSetUpOnBlocks        1 1.0 6.1107e-02 1.5 2.24e+07 1.5 0.0e+00 0.0e+00 0.0e+00  2  4  0  0  0   2  4  0  0  0  9358
PCApply              199 1.0 3.5986e-01 1.5 2.00e+08 1.6 0.0e+00 0.0e+00 0.0e+00 11 36  0  0  0  11 36  0  0  0 14978
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

    Distributed Mesh     1              1         5048     0.
              Matrix     4              4     14023836     0.
           Index Set     7              7       152940     0.
   IS L to G Mapping     1              1        53380     0.
              Vector    43             43      2874792     0.
   Star Forest Graph     4              4         4576     0.
       Krylov Solver     2              2        20184     0.
      Preconditioner     2              2         1944     0.
     Discrete System     1              1          960     0.
              Viewer     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 1.507e-07
Average time for MPI_Barrier(): 9.6098e-06
Average time for zero size MPI_Send(): 4.25837e-06
#PETSc Option Table entries:
-d 3
-ksp_type gmres
-log_view
-mat_type aij
-n 31
-pc_type bjacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --with-openmp=1 --download-superlu_dist --download-mumps --download-hypre --download-scalapack --download-spai --download-parms --download-slepc --download-openmpi=yes COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS=
-----------------------------------------
Libraries compiled on 2021-01-12 11:28:56 on libmesh-cpu 
Machine characteristics: Linux-5.4.0-60-generic-x86_64-with-debian-10.7
Using PETSc directory: /opt/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: /opt/petsc/arch-linux2-c-opt/bin/mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -fopenmp   
Using Fortran compiler: /opt/petsc/arch-linux2-c-opt/bin/mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -fopenmp    
-----------------------------------------

Using include paths: -I/opt/petsc/include -I/opt/petsc/arch-linux2-c-opt/include
-----------------------------------------

Using C linker: /opt/petsc/arch-linux2-c-opt/bin/mpicc
Using Fortran linker: /opt/petsc/arch-linux2-c-opt/bin/mpif90
Using libraries: -Wl,-rpath,/opt/petsc/arch-linux2-c-opt/lib -L/opt/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/opt/petsc/arch-linux2-c-opt/lib -L/opt/petsc/arch-linux2-c-opt/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lHYPRE -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lparms -lspai -llapack -lblas -lX11 -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
Running ./experiment -d 3 -n 31 -mat_type aij -ksp_type gmres -pc_type bjacobi -log_view

 Mesh Information:
  elem_dimensions()={3}
  spatial_dimension()=3
  n_nodes()=250047
    n_local_nodes()=569
  n_elem()=29791
    n_local_elem()=47
    n_active_elem()=29791
  n_subdomains()=1
  n_partitions()=640
  n_processors()=640
  n_threads()=1
  processor_id()=0

*** Warning, This code is untested, experimental, or likely to see future API changes: ./include/libmesh/mesh_base.h, line 1667, compiled Jan 12 2021 at 12:34:39 ***
 EquationSystems
  n_systems()=1
   System #0, "Poisson"
    Type "LinearImplicit"
    Variables="u" 
    Finite Element Types="LAGRANGE" 
    Approximation Orders="SECOND" 
    n_dofs()=250047
    n_local_dofs()=569
    n_constrained_dofs()=23066
    n_local_constrained_dofs()=149
    n_vectors()=1
    n_matrices()=1
    DofMap Sparsity
      Average  On-Processor Bandwidth <= 44.9841
      Average Off-Processor Bandwidth <= 23.7024
      Maximum  On-Processor Bandwidth <= 145
      Maximum Off-Processor Bandwidth <= 158
    DofMap Constraints
      Number of DoF Constraints = 23066
      Number of Heterogenous Constraints= 22818
      Average DoF Constraint Length= 0

 Mesh Information:
  elem_dimensions()={3}
  spatial_dimension()=3
  n_nodes()=250047
    n_local_nodes()=569
  n_elem()=29791
    n_local_elem()=47
    n_active_elem()=29791
  n_subdomains()=1
  n_partitions()=640
  n_processors()=640
  n_threads()=1
  processor_id()=0


 -----------------------------------------------------
| Processor id:   0                                   |
| Num Processors: 640                                 |
| Time:           Mon Feb  1 18:53:04 2021            |
| OS:             Linux                               |
| HostName:       sdumont6170                         |
| OS Release:     3.10.0-957.el7.x86_64               |
| OS Version:     #1 SMP Thu Oct 4 20:48:51 UTC 2018  |
| Machine:        x86_64                              |
| Username:       luciano.siqueira                    |
| Configuration:  ../configure  '--prefix=/usr/local' |
|  '--with-vtk-include=/usr/local/include/vtk-8.2'    |
|  '--with-vtk-lib=/usr/local/lib'                    |
|  '--enable-petsc=yes'                               |
|  '--enable-petsc-required'                          |
|  '--enable-slepc'                                   |
|  '--enable-slepc-required'                          |
|  'METHODS=opt'                                      |
|  'PETSC_DIR=/opt/petsc'                             |
|  'PETSC_ARCH=arch-linux2-c-opt'                     |
|  'SLEPC_DIR=/opt/petsc/arch-linux2-c-opt'           |
 -----------------------------------------------------
 ------------------------------------------------------------------------------------------------------------
| Matrix Assembly Performance: Alive time=0.056831, Active time=0.006895                                     |
 ------------------------------------------------------------------------------------------------------------
| Event                         nCalls     Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                          w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|------------------------------------------------------------------------------------------------------------|
|                                                                                                            |
| Fe                            47         0.0004      0.000009    0.0004      0.000009    6.42     6.42     |
| Ke                            47         0.0020      0.000042    0.0020      0.000042    28.83    28.83    |
| elem init                     47         0.0045      0.000095    0.0045      0.000095    64.74    64.74    |
 ------------------------------------------------------------------------------------------------------------
| Totals:                       141        0.0069                                          100.00            |
 ------------------------------------------------------------------------------------------------------------

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./experiment on a arch-linux2-c-opt named sdumont6170 with 640 processors, by luciano.siqueira Mon Feb  1 18:53:07 2021
Using 1 OpenMP threads
Using Petsc Development GIT revision: v3.14.3-435-gd1574ab4cd  GIT Date: 2021-01-11 15:13:43 +0000

                         Max       Max/Min     Avg       Total
Time (sec):           1.968e+02     1.000   1.968e+02
Objects:              6.600e+01     1.000   6.600e+01
Flop:                 4.131e+07     4.553   2.385e+07  1.526e+10
Flop/sec:             2.099e+05     4.553   1.212e+05  7.756e+07
MPI Messages:         8.425e+03     2.949   5.414e+03  3.465e+06
MPI Message Lengths:  5.026e+06     1.669   7.080e+02  2.453e+09
MPI Reductions:       4.890e+02     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 1.9678e+02 100.0%  1.5262e+10 100.0%  3.465e+06 100.0%  7.080e+02      100.0%  4.820e+02  98.6%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          7 1.0 2.8366e-01 5.9 0.00e+00 0.0 2.7e+04 5.2e+00 7.0e+00  0  0  1  0  1   0  0  1  0  1     0
BuildTwoSidedF         5 1.0 2.5666e-01 9.5 0.00e+00 0.0 2.0e+04 4.0e+03 5.0e+00  0  0  1  3  1   0  0  1  3  1     0
MatMult              226 1.0 6.4752e-01 3.1 1.87e+07 4.2 2.2e+06 3.9e+02 1.0e+00  0 45 63 35  0   0 45 63 35  0 10689
MatSolve             227 1.0 2.6350e-02 6.7 1.28e+07 7.0 0.0e+00 0.0e+00 0.0e+00  0 29  0  0  0   0 29  0  0  0 168471
MatLUFactorNum         1 1.0 3.0115e-0310.8 1.18e+0613.9 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 106202
MatILUFactorSym        1 1.0 1.7141e-0219.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       2 1.0 2.3641e-0120.6 0.00e+00 0.0 8.1e+03 9.7e+03 2.0e+00  0  0  0  3  0   0  0  0  3  0     0
MatAssemblyEnd         2 1.0 4.7616e-01 1.8 8.30e+03 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  1   0  0  0  0  1     4
MatGetRowIJ            1 1.0 2.4430e-06 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 4.3698e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         3 1.0 2.4557e-04 6.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMDot              218 1.0 1.2456e+00 1.7 3.98e+06 3.2 0.0e+00 0.0e+00 2.2e+02  0 11  0  0 45   0 11  0  0 45  1320
VecNorm              227 1.0 1.3911e+00 1.6 2.75e+05 3.2 0.0e+00 0.0e+00 2.3e+02  1  1  0  0 46   1  1  0  0 47    82
VecScale             226 1.0 7.1863e-04 2.2 1.37e+05 3.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 78637
VecCopy                9 1.0 2.7855e-05 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               240 1.0 3.9133e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               16 1.0 6.4706e-03293.5 1.94e+04 3.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1237
VecMAXPY             226 1.0 3.9906e-03 2.6 4.25e+06 3.2 0.0e+00 0.0e+00 0.0e+00  0 11  0  0  0   0 11  0  0  0 439741
VecAssemblyBegin       3 1.0 2.2735e-02 1.4 0.00e+00 0.0 1.2e+04 1.2e+02 3.0e+00  0  0  0  0  1   0  0  0  0  1     0
VecAssemblyEnd         3 1.0 2.9396e-03277.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      227 1.0 6.0738e-02 1.8 0.00e+00 0.0 2.2e+06 3.9e+02 2.0e+00  0  0 64 35  0   0  0 64 35  0     0
VecScatterEnd        227 1.0 5.8930e-01 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize         226 1.0 1.3851e+00 1.6 4.10e+05 3.2 0.0e+00 0.0e+00 2.3e+02  1  1  0  0 46   1  1  0  0 47   122
SFSetGraph             2 1.0 2.1940e-05 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp                2 1.0 4.2313e-02 1.7 0.00e+00 0.0 3.8e+04 1.1e+02 2.0e+00  0  0  1  0  0   0  0  1  0  0     0
SFPack               227 1.0 1.7886e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFUnpack             227 1.0 2.3074e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               2 1.0 6.7246e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.5118e+00 1.0 4.02e+07 4.5 2.2e+06 3.9e+02 4.5e+02  1 98 63 35 91   1 98 63 35 93  5947
KSPGMRESOrthog       218 1.0 1.2489e+00 1.7 7.96e+06 3.2 0.0e+00 0.0e+00 2.2e+02  0 22  0  0 45   0 22  0  0 45  2634
PCSetUp                2 1.0 4.3814e-02 1.6 1.18e+0613.9 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  7300
PCSetUpOnBlocks        1 1.0 1.8862e-02 8.3 1.18e+0613.9 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 16956
PCApply              227 1.0 2.9083e-02 5.1 1.28e+07 7.0 0.0e+00 0.0e+00 0.0e+00  0 29  0  0  0   0 29  0  0  0 152639
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

    Distributed Mesh     1              1         5048     0.
              Matrix     4              4       842592     0.
           Index Set     7              7        18132     0.
   IS L to G Mapping     1              1         4972     0.
              Vector    43             43       257096     0.
   Star Forest Graph     4              4         4576     0.
       Krylov Solver     2              2        20184     0.
      Preconditioner     2              2         1944     0.
     Discrete System     1              1          960     0.
              Viewer     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 3.933e-07
Average time for MPI_Barrier(): 0.00498015
Average time for zero size MPI_Send(): 0.000194207
#PETSc Option Table entries:
-d 3
-ksp_type gmres
-log_view
-mat_type aij
-n 31
-pc_type bjacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --with-openmp=1 --download-superlu_dist --download-mumps --download-hypre --download-scalapack --download-spai --download-parms --download-slepc --download-openmpi=yes COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS=
-----------------------------------------
Libraries compiled on 2021-01-12 11:28:56 on libmesh-cpu 
Machine characteristics: Linux-5.4.0-60-generic-x86_64-with-debian-10.7
Using PETSc directory: /opt/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: /opt/petsc/arch-linux2-c-opt/bin/mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -fopenmp   
Using Fortran compiler: /opt/petsc/arch-linux2-c-opt/bin/mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -fopenmp    
-----------------------------------------

Using include paths: -I/opt/petsc/include -I/opt/petsc/arch-linux2-c-opt/include
-----------------------------------------

Using C linker: /opt/petsc/arch-linux2-c-opt/bin/mpicc
Using Fortran linker: /opt/petsc/arch-linux2-c-opt/bin/mpif90
Using libraries: -Wl,-rpath,/opt/petsc/arch-linux2-c-opt/lib -L/opt/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/opt/petsc/arch-linux2-c-opt/lib -L/opt/petsc/arch-linux2-c-opt/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lHYPRE -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lparms -lspai -llapack -lblas -lX11 -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl
-----------------------------------------

From bsmith at petsc.dev  Wed Feb  3 22:37:21 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 3 Feb 2021 22:37:21 -0600
Subject: [petsc-users] Slower performance in multi-node system
In-Reply-To: <ffbc9b4b-9475-4a78-b011-381c355572c1@usp.br>
References: <06d8b0d1-8879-4a7c-134e-d94dc5442ecc@usp.br>
	<CAMYG4GkuQd-DCU25Q=uc2kFJpGzYvEZxUhj6=y9i6ChN6tLpfw@mail.gmail.com>
	<ffbc9b4b-9475-4a78-b011-381c355572c1@usp.br>
Message-ID: <E3485D5D-BF25-438D-9D55-59764C70CA70@petsc.dev>


  https://www.mcs.anl.gov/petsc/documentation/faq.html#computers <https://www.mcs.anl.gov/petsc/documentation/faq.html#computers>

  In particular looking at the results of the parallel run I see 

Average time to get PetscTime(): 3.933e-07
Average time for MPI_Barrier(): 0.00498015
Average time for zero size MPI_Send(): 0.000194207

  So the times for communication are huge. 4.9 milliseconds for a synchronization of twenty processes. A millisecond is an eternity for parallel computing. It is not clear to me that this system is appropriate for tightly couple parallel simulations.

  Barry


> On Feb 3, 2021, at 2:40 PM, Luciano Siqueira <luciano.siqueira at usp.br> wrote:
> 
> Here are the (attached) output of -log_view for both cases. The beginning of the files has some info from the libmesh app.
> 
> Running in 1 node, 32 cores: 01_node_log_view.txt
> 
> Running in 20 nodes, 32 cores each (640 cores in total): 01_node_log_view.txt
> 
> Thanks!
> 
> Luciano.
> 
> Em 03/02/2021 16:43, Matthew Knepley escreveu:
>> On Wed, Feb 3, 2021 at 2:42 PM Luciano Siqueira <luciano.siqueira at usp.br <mailto:luciano.siqueira at usp.br>> wrote:
>> Hello,
>> 
>> I'm evaluating the performance of an application in a distributed 
>> environment and I notice that it's much slower when running in many 
>> nodes/cores when compared to a single node with a fewer cores.
>> 
>> When running the application in 20 nodes, the Main Stage time reported 
>> in PETSc's log is up to 10 times slower than it is when running the same 
>> application in only 1 node, even with fewer cores per node.
>> 
>> The application I'm running is an example code provided by libmesh:
>> 
>> http://libmesh.github.io/examples/introduction_ex4.html <http://libmesh.github.io/examples/introduction_ex4.html>
>> 
>> The application runs inside a Singularity container, with openmpi-4.0.3 
>> and PETSc 3.14.3. The distributed processes are managed by slurm 
>> 17.02.11 and each node is equipped with two Intel CPU Xeon E5-2695v2 Ivy 
>> Bridge (12c @2,4GHz) and 128Gb of RAM, all communications going through 
>> infiniband.
>> 
>> My questions are: Is the slowdown expected? Should the application be 
>> specially tailored to work well in distributed environments?
>> 
>> Also, where (maybe in PETSc documentation/source-code) can I find 
>> information on how PETSc handles MPI communications? Do the KSP solvers 
>> favor one-to-one process communication over broadcast messages or 
>> vice-versa? I suspect inter-process communication must be the cause of 
>> the poor performance when using many nodes, but not as much as I'm seeing.
>> 
>> Thank you in advance!
>> 
>> We can't say anything about the performance without some data. Please send us the output
>> of -log_view for both cases.
>> 
>>   Thanks,
>> 
>>      Matt
>>  
>> Luciano.
>> 
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
> <01_node_log_view.txt><20_node_log_view.txt>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210203/fe69a8a4/attachment.html>

From eijkhout at tacc.utexas.edu  Thu Feb  4 10:07:15 2021
From: eijkhout at tacc.utexas.edu (Victor Eijkhout)
Date: Thu, 4 Feb 2021 16:07:15 +0000
Subject: [petsc-users] Slower performance in multi-node system
In-Reply-To: <E3485D5D-BF25-438D-9D55-59764C70CA70@petsc.dev>
References: <06d8b0d1-8879-4a7c-134e-d94dc5442ecc@usp.br>
	<CAMYG4GkuQd-DCU25Q=uc2kFJpGzYvEZxUhj6=y9i6ChN6tLpfw@mail.gmail.com>
	<ffbc9b4b-9475-4a78-b011-381c355572c1@usp.br>
	<E3485D5D-BF25-438D-9D55-59764C70CA70@petsc.dev>
Message-ID: <C69D1BCF-145F-4B3C-9573-64C0416A2B8B@tacc.utexas.edu>


On , 2021Feb3, at 22:37, Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>> wrote:


  https://www.mcs.anl.gov/petsc/documentation/faq.html#computers

I happened to scroll up a line, and

Any useful books on numerical computing?
Writing Scientific Software: A Guide to Good Style<https://www.mcs.anl.gov/core/books/writing-scientific-software/23206704175AF868E43FE3FB399C2F53>

Is a dead link.

Feel free to link to my 3 textbooks:

https://pages.tacc.utexas.edu/~eijkhout/istc/istc.html

Victor.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210204/d9012e34/attachment.html>

From eijkhout at tacc.utexas.edu  Thu Feb  4 11:05:15 2021
From: eijkhout at tacc.utexas.edu (Victor Eijkhout)
Date: Thu, 4 Feb 2021 17:05:15 +0000
Subject: [petsc-users] Slower performance in multi-node system
In-Reply-To: <E3485D5D-BF25-438D-9D55-59764C70CA70@petsc.dev>
References: <06d8b0d1-8879-4a7c-134e-d94dc5442ecc@usp.br>
	<CAMYG4GkuQd-DCU25Q=uc2kFJpGzYvEZxUhj6=y9i6ChN6tLpfw@mail.gmail.com>
	<ffbc9b4b-9475-4a78-b011-381c355572c1@usp.br>
	<E3485D5D-BF25-438D-9D55-59764C70CA70@petsc.dev>
Message-ID: <DD11C497-A64E-4D10-8C41-A26AE819158E@tacc.utexas.edu>


On , 2021Feb3, at 22:37, Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>> wrote:


  https://www.mcs.anl.gov/petsc/documentation/faq.html#computers

./process.py createfile ; process.py

That script doesn?t work for python3.

Also: second time without dot-slash?

Victor.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210204/19531bed/attachment-0001.html>

From patrick.sanan at gmail.com  Thu Feb  4 12:27:16 2021
From: patrick.sanan at gmail.com (Patrick Sanan)
Date: Thu, 4 Feb 2021 19:27:16 +0100
Subject: [petsc-users] Slower performance in multi-node system
In-Reply-To: <C69D1BCF-145F-4B3C-9573-64C0416A2B8B@tacc.utexas.edu>
References: <06d8b0d1-8879-4a7c-134e-d94dc5442ecc@usp.br>
	<CAMYG4GkuQd-DCU25Q=uc2kFJpGzYvEZxUhj6=y9i6ChN6tLpfw@mail.gmail.com>
	<ffbc9b4b-9475-4a78-b011-381c355572c1@usp.br>
	<E3485D5D-BF25-438D-9D55-59764C70CA70@petsc.dev>
	<C69D1BCF-145F-4B3C-9573-64C0416A2B8B@tacc.utexas.edu>
Message-ID: <73C62AB6-EAA0-4F0D-91B4-1201175B6680@gmail.com>

That page has been ported to Sphinx in master (thanks, Jacob!), so if adding that link, it'd be helpful to do it here (Note you can click on the "edit on Gitlab" in the tab in the bottom right and make an MR, which is handy for little changes which you expect to get right in one attempt)
https://docs.petsc.org/en/master/faq/#any-useful-books-on-numerical-computing <https://docs.petsc.org/en/master/faq/#any-useful-books-on-numerical-computing>

(And Sphinx has a utility to check all external links, so we should be able to clean up this and any other dead links in one pass before the next release)

> Am 04.02.2021 um 17:07 schrieb Victor Eijkhout <eijkhout at tacc.utexas.edu>:
> 
> 
> 
>> On , 2021Feb3, at 22:37, Barry Smith <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
>> 
>> 
>>   https://www.mcs.anl.gov/petsc/documentation/faq.html#computers <https://www.mcs.anl.gov/petsc/documentation/faq.html#computers>
> I happened to scroll up a line, and
> 
> Any useful books on numerical computing? <>Writing Scientific Software: A Guide to Good Style <https://www.mcs.anl.gov/core/books/writing-scientific-software/23206704175AF868E43FE3FB399C2F53>
> 
> Is a dead link. 
> 
> Feel free to link to my 3 textbooks:
> 
> https://pages.tacc.utexas.edu/~eijkhout/istc/istc.html <https://pages.tacc.utexas.edu/~eijkhout/istc/istc.html>
> 
> Victor.
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210204/e00dc16c/attachment.html>

From nicolas.barral at math.u-bordeaux.fr  Mon Feb  8 05:00:53 2021
From: nicolas.barral at math.u-bordeaux.fr (Nicolas Barral)
Date: Mon, 8 Feb 2021 12:00:53 +0100
Subject: [petsc-users] DMPlex tetrahedra facets orientation
Message-ID: <dc2cb858-9e8e-8ace-32c6-0800c10b1115@math.u-bordeaux.fr>

Hi all,

Can I make any assumption on the orientation of triangular facets in a 
tetrahedral plex ? I need the inward facet normals. Do I need to use 
DMPlexGetOrientedFace or can I rely on either the tet vertices ordering, 
or the faces ordering ? Could DMPlexGetRawFaces_Internal be enough ?

Alternatively, is there a function that computes the normals - without 
bringing out the big guns ?

Thanks

-- 
Nicolas

From e0425375 at gmail.com  Mon Feb  8 06:03:59 2021
From: e0425375 at gmail.com (Florian Bruckner)
Date: Mon, 8 Feb 2021 13:03:59 +0100
Subject: [petsc-users] using preconditioner with SLEPc
Message-ID: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>

Dear PETSc / SLEPc Users,

my question is very similar to the one posted here:
https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html

The eigensystem I would like to solve looks like:
B0 v = 1/omega A0 v
B0 and A0 are both hermitian, A0 is positive definite, but only given as a
linear operator (matshell). I am looking for the largest eigenvalues
(=smallest omega).

I also have a sparse approximation P0 of the A0 operator, which i would
like to use as precondtioner, using something like this:

        es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
        st = es.getST()
        ksp = st.getKSP()
        ksp.setOperators(self.A0, self.P0)

Unfortunately PETSc still complains that it cannot create a preconditioner
for a type 'python' matrix although P0.type == 'seqaij' (but A0.type ==
'python').
By the way, should P0 be an approximation of A0 or does it have to include
B0?

Right now I am using the krylov-schur method. Are there any alternatives if
A0 is only given as an operator?

thanks for any advice
best wishes
Florian
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210208/96104589/attachment.html>

From knepley at gmail.com  Mon Feb  8 07:22:42 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 8 Feb 2021 08:22:42 -0500
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
Message-ID: <CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>

On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com> wrote:

> Dear PETSc / SLEPc Users,
>
> my question is very similar to the one posted here:
> https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html
>
> The eigensystem I would like to solve looks like:
> B0 v = 1/omega A0 v
> B0 and A0 are both hermitian, A0 is positive definite, but only given as a
> linear operator (matshell). I am looking for the largest eigenvalues
> (=smallest omega).
>
> I also have a sparse approximation P0 of the A0 operator, which i would
> like to use as precondtioner, using something like this:
>
>         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
>         st = es.getST()
>         ksp = st.getKSP()
>         ksp.setOperators(self.A0, self.P0)
>
> Unfortunately PETSc still complains that it cannot create a preconditioner
> for a type 'python' matrix although P0.type == 'seqaij' (but A0.type ==
> 'python').
> By the way, should P0 be an approximation of A0 or does it have to include
> B0?
>
> Right now I am using the krylov-schur method. Are there any alternatives
> if A0 is only given as an operator?
>

Jose can correct me if I say something wrong.

When I did this, I made a shell operator for the action of A0^{-1} B0 which
has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and
then handed that to EPS. You can see me do it here:


https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123

I had a hard time getting the embedded solver to work the way I wanted, but
maybe that is the better way.

  Thanks,

     Matt


> thanks for any advice
> best wishes
> Florian
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210208/e18f59cf/attachment.html>

From knepley at gmail.com  Mon Feb  8 08:19:06 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 8 Feb 2021 09:19:06 -0500
Subject: [petsc-users] DMPlex tetrahedra facets orientation
In-Reply-To: <dc2cb858-9e8e-8ace-32c6-0800c10b1115@math.u-bordeaux.fr>
References: <dc2cb858-9e8e-8ace-32c6-0800c10b1115@math.u-bordeaux.fr>
Message-ID: <CAMYG4Gms+joCEdzgOU7r_ye9MywEg-EjgRdgcAxwYYRDMHRG3Q@mail.gmail.com>

On Mon, Feb 8, 2021 at 6:01 AM Nicolas Barral <
nicolas.barral at math.u-bordeaux.fr> wrote:

> Hi all,
>
> Can I make any assumption on the orientation of triangular facets in a
> tetrahedral plex ? I need the inward facet normals. Do I need to use
> DMPlexGetOrientedFace or can I rely on either the tet vertices ordering,
> or the faces ordering ? Could DMPlexGetRawFaces_Internal be enough ?
>

You can do it by hand, but you have to account for the face orientation
relative to the cell. That is what
DMPlexGetOrientedFace() does. I think it would be easier to use the
function below.


> Alternatively, is there a function that computes the normals - without
> bringing out the big guns ?
>

This will compute the normals

https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexComputeCellGeometryFVM.html

Should not be too heavy weight.

  THanks,

    Matt

Thanks
>
> --
> Nicolas
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210208/4723f1e2/attachment.html>

From jroman at dsic.upv.es  Mon Feb  8 08:37:34 2021
From: jroman at dsic.upv.es (Jose E. Roman)
Date: Mon, 8 Feb 2021 15:37:34 +0100
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
Message-ID: <DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>

The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is

  (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is

  A0^{-1}*B0*v=theta*v

and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.

Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence.

Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface.

The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own.

As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python.

If you are using the git repo, I could add the relevant code.

Jose


> El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com> escribi?:
> 
> On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com> wrote:
> Dear PETSc / SLEPc Users,
> 
> my question is very similar to the one posted here: 
> https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html
> 
> The eigensystem I would like to solve looks like:
> B0 v = 1/omega A0 v
> B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). 
> 
> I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this:
> 
>         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
>         st = es.getST()
>         ksp = st.getKSP()
>         ksp.setOperators(self.A0, self.P0)
> 
> Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). 
> By the way, should P0 be an approximation of A0 or does it have to include B0?
> 
> Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator?
> 
> Jose can correct me if I say something wrong.
> 
> When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and
> then handed that to EPS. You can see me do it here:
> 
>   https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123
> 
> I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way.
> 
>   Thanks,
> 
>      Matt
>  
> thanks for any advice
> best wishes
> Florian
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/


From knepley at gmail.com  Mon Feb  8 08:48:52 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 8 Feb 2021 09:48:52 -0500
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
Message-ID: <CAMYG4GnTUa8B1Mz7=UtkRiKtczNjjxvdS5q+6exDnb3dRAtYWw@mail.gmail.com>

On Mon, Feb 8, 2021 at 9:37 AM Jose E. Roman <jroman at dsic.upv.es> wrote:

> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues
> omega closest to zero. If the matrices were explicitly available, you would
> do shift-and-invert with target=0, that is
>
>   (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
>
>   A0^{-1}*B0*v=theta*v
>
> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
>
> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of
> EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need?
> EPS_SMALLEST_REAL will give slow convergence.
>

Thanks Jose! I am not understanding some step. I want the smallest
eigenvalues. Should I use EPS_SMALLEST_MAGNITUDE? I appear to get what I
want
using SMALLEST_REAL, but as you say it might be slower than it has to be.

Also, sometime I would like to talk about incorporating the multilevel
eigensolver. I am sure you could make lots of improvements to my initial
attempt. I will send
you a separate email, since I am getting serious about testing it.

  Thanks,

     Matt


> Florian: I would not recommend setting the KSP matrices directly, it may
> produce strange side-effects. We should have an interface function to pass
> this matrix. Currently there is STPrecondSetMatForPC() but it has two
> problems: (1) it is intended for STPRECOND, so cannot be used with
> Krylov-Schur, and (2) it is not currently available in the python interface.
>
> The approach used by Matt is a workaround that does not use ST, so you can
> handle linear solves with a KSP of your own.
>
> As an alternative, since your problem is symmetric, you could try LOBPCG,
> assuming that the leftmost eigenvalues are those that you want (e.g. if all
> eigenvalues are non-negative). In that case you could use
> STPrecondSetMatForPC(), but the remaining issue is calling it from python.
>
> If you are using the git repo, I could add the relevant code.
>
> Jose
>
>
>
> > El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com>
> escribi?:
> >
> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com>
> wrote:
> > Dear PETSc / SLEPc Users,
> >
> > my question is very similar to the one posted here:
> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html
> >
> > The eigensystem I would like to solve looks like:
> > B0 v = 1/omega A0 v
> > B0 and A0 are both hermitian, A0 is positive definite, but only given as
> a linear operator (matshell). I am looking for the largest eigenvalues
> (=smallest omega).
> >
> > I also have a sparse approximation P0 of the A0 operator, which i would
> like to use as precondtioner, using something like this:
> >
> >         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
> >         st = es.getST()
> >         ksp = st.getKSP()
> >         ksp.setOperators(self.A0, self.P0)
> >
> > Unfortunately PETSc still complains that it cannot create a
> preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but
> A0.type == 'python').
> > By the way, should P0 be an approximation of A0 or does it have to
> include B0?
> >
> > Right now I am using the krylov-schur method. Are there any alternatives
> if A0 is only given as an operator?
> >
> > Jose can correct me if I say something wrong.
> >
> > When I did this, I made a shell operator for the action of A0^{-1} B0
> which has a KSPSolve() in it, so you can use your P0 preconditioning
> matrix, and
> > then handed that to EPS. You can see me do it here:
> >
> >
> https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123
> >
> > I had a hard time getting the embedded solver to work the way I wanted,
> but maybe that is the better way.
> >
> >   Thanks,
> >
> >      Matt
> >
> > thanks for any advice
> > best wishes
> > Florian
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > -- Norbert Wiener
> >
> > https://www.cse.buffalo.edu/~knepley/
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210208/a1086cdd/attachment.html>

From dave.mayhem23 at gmail.com  Mon Feb  8 10:40:24 2021
From: dave.mayhem23 at gmail.com (Dave May)
Date: Mon, 8 Feb 2021 17:40:24 +0100
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <CAMYG4GnTUa8B1Mz7=UtkRiKtczNjjxvdS5q+6exDnb3dRAtYWw@mail.gmail.com>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
	<CAMYG4GnTUa8B1Mz7=UtkRiKtczNjjxvdS5q+6exDnb3dRAtYWw@mail.gmail.com>
Message-ID: <CAJ98EDrS7Nzf-fbWiQQmD3uBVYeM-XExm6tEO-oTcCVhV9SHpw@mail.gmail.com>

On Mon 8. Feb 2021 at 15:49, Matthew Knepley <knepley at gmail.com> wrote:

> On Mon, Feb 8, 2021 at 9:37 AM Jose E. Roman <jroman at dsic.upv.es> wrote:
>
>> The problem can be written as A0*v=omega*B0*v and you want the
>> eigenvalues omega closest to zero. If the matrices were explicitly
>> available, you would do shift-and-invert with target=0, that is
>>
>>   (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
>>
>>   A0^{-1}*B0*v=theta*v
>>
>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
>>
>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of
>> EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need?
>> EPS_SMALLEST_REAL will give slow convergence.
>>
>
> Thanks Jose! I am not understanding some step. I want the smallest
> eigenvalues. Should I use EPS_SMALLEST_MAGNITUDE? I appear to get what I
> want
> using SMALLEST_REAL, but as you say it might be slower than it has to be.
>


With shift-and-invert you want to use EPS_LARGEST_MAGNITUDE as Jose says.
The largest magnitude v eigenvalues you obtain (see Jose equation above)
from the transformed system correspond to the smallest magnitude omega
eigenvalues of the original problem.

Cheers
Dave


> Also, sometime I would like to talk about incorporating the multilevel
> eigensolver. I am sure you could make lots of improvements to my initial
> attempt. I will send
> you a separate email, since I am getting serious about testing it.
>
>   Thanks,
>
>      Matt
>
>
>> Florian: I would not recommend setting the KSP matrices directly, it may
>> produce strange side-effects. We should have an interface function to pass
>> this matrix. Currently there is STPrecondSetMatForPC() but it has two
>> problems: (1) it is intended for STPRECOND, so cannot be used with
>> Krylov-Schur, and (2) it is not currently available in the python interface.
>>
>> The approach used by Matt is a workaround that does not use ST, so you
>> can handle linear solves with a KSP of your own.
>>
>> As an alternative, since your problem is symmetric, you could try LOBPCG,
>> assuming that the leftmost eigenvalues are those that you want (e.g. if all
>> eigenvalues are non-negative). In that case you could use
>> STPrecondSetMatForPC(), but the remaining issue is calling it from python.
>>
>> If you are using the git repo, I could add the relevant code.
>>
>> Jose
>>
>>
>>
>> > El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com>
>> escribi?:
>> >
>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com>
>> wrote:
>> > Dear PETSc / SLEPc Users,
>> >
>> > my question is very similar to the one posted here:
>> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html
>> >
>> > The eigensystem I would like to solve looks like:
>> > B0 v = 1/omega A0 v
>> > B0 and A0 are both hermitian, A0 is positive definite, but only given
>> as a linear operator (matshell). I am looking for the largest eigenvalues
>> (=smallest omega).
>> >
>> > I also have a sparse approximation P0 of the A0 operator, which i would
>> like to use as precondtioner, using something like this:
>> >
>> >         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
>> >         st = es.getST()
>> >         ksp = st.getKSP()
>> >         ksp.setOperators(self.A0, self.P0)
>> >
>> > Unfortunately PETSc still complains that it cannot create a
>> preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but
>> A0.type == 'python').
>> > By the way, should P0 be an approximation of A0 or does it have to
>> include B0?
>> >
>> > Right now I am using the krylov-schur method. Are there any
>> alternatives if A0 is only given as an operator?
>> >
>> > Jose can correct me if I say something wrong.
>> >
>> > When I did this, I made a shell operator for the action of A0^{-1} B0
>> which has a KSPSolve() in it, so you can use your P0 preconditioning
>> matrix, and
>> > then handed that to EPS. You can see me do it here:
>> >
>> >
>> https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123
>> >
>> > I had a hard time getting the embedded solver to work the way I wanted,
>> but maybe that is the better way.
>> >
>> >   Thanks,
>> >
>> >      Matt
>> >
>> > thanks for any advice
>> > best wishes
>> > Florian
>> >
>> >
>> > --
>> > What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> > -- Norbert Wiener
>> >
>> > https://www.cse.buffalo.edu/~knepley/
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210208/1d6dffa1/attachment.html>

From dave.mayhem23 at gmail.com  Mon Feb  8 10:41:42 2021
From: dave.mayhem23 at gmail.com (Dave May)
Date: Mon, 8 Feb 2021 17:41:42 +0100
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <CAJ98EDrS7Nzf-fbWiQQmD3uBVYeM-XExm6tEO-oTcCVhV9SHpw@mail.gmail.com>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
	<CAMYG4GnTUa8B1Mz7=UtkRiKtczNjjxvdS5q+6exDnb3dRAtYWw@mail.gmail.com>
	<CAJ98EDrS7Nzf-fbWiQQmD3uBVYeM-XExm6tEO-oTcCVhV9SHpw@mail.gmail.com>
Message-ID: <CAJ98EDoe4MgYKQ9d48fk=WRNCz1xAHF7BTOnScoUGkAXtnjt1Q@mail.gmail.com>

On Mon 8. Feb 2021 at 17:40, Dave May <dave.mayhem23 at gmail.com> wrote:

>
>
> On Mon 8. Feb 2021 at 15:49, Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Mon, Feb 8, 2021 at 9:37 AM Jose E. Roman <jroman at dsic.upv.es> wrote:
>>
>>> The problem can be written as A0*v=omega*B0*v and you want the
>>> eigenvalues omega closest to zero. If the matrices were explicitly
>>> available, you would do shift-and-invert with target=0, that is
>>>
>>>   (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
>>>
>>>   A0^{-1}*B0*v=theta*v
>>>
>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
>>>
>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of
>>> EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need?
>>> EPS_SMALLEST_REAL will give slow convergence.
>>>
>>
>> Thanks Jose! I am not understanding some step. I want the smallest
>> eigenvalues. Should I use EPS_SMALLEST_MAGNITUDE? I appear to get what I
>> want
>> using SMALLEST_REAL, but as you say it might be slower than it has to be.
>>
>
>
> With shift-and-invert you want to use EPS_LARGEST_MAGNITUDE as Jose says.
> The largest magnitude v
>


Sorry ?v? should be ?theta?!

eigenvalues you obtain (see Jose equation above) from the transformed
> system correspond to the smallest magnitude omega eigenvalues of the
> original problem.
>
> Cheers
> Dave
>
>
>> Also, sometime I would like to talk about incorporating the multilevel
>> eigensolver. I am sure you could make lots of improvements to my initial
>> attempt. I will send
>> you a separate email, since I am getting serious about testing it.
>>
>>   Thanks,
>>
>>      Matt
>>
>>
>>> Florian: I would not recommend setting the KSP matrices directly, it may
>>> produce strange side-effects. We should have an interface function to pass
>>> this matrix. Currently there is STPrecondSetMatForPC() but it has two
>>> problems: (1) it is intended for STPRECOND, so cannot be used with
>>> Krylov-Schur, and (2) it is not currently available in the python interface.
>>>
>>> The approach used by Matt is a workaround that does not use ST, so you
>>> can handle linear solves with a KSP of your own.
>>>
>>> As an alternative, since your problem is symmetric, you could try
>>> LOBPCG, assuming that the leftmost eigenvalues are those that you want
>>> (e.g. if all eigenvalues are non-negative). In that case you could use
>>> STPrecondSetMatForPC(), but the remaining issue is calling it from python.
>>>
>>> If you are using the git repo, I could add the relevant code.
>>>
>>> Jose
>>>
>>>
>>>
>>> > El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com>
>>> escribi?:
>>> >
>>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com>
>>> wrote:
>>> > Dear PETSc / SLEPc Users,
>>> >
>>> > my question is very similar to the one posted here:
>>> >
>>> https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html
>>> >
>>> > The eigensystem I would like to solve looks like:
>>> > B0 v = 1/omega A0 v
>>> > B0 and A0 are both hermitian, A0 is positive definite, but only given
>>> as a linear operator (matshell). I am looking for the largest eigenvalues
>>> (=smallest omega).
>>> >
>>> > I also have a sparse approximation P0 of the A0 operator, which i
>>> would like to use as precondtioner, using something like this:
>>> >
>>> >         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
>>> >         st = es.getST()
>>> >         ksp = st.getKSP()
>>> >         ksp.setOperators(self.A0, self.P0)
>>> >
>>> > Unfortunately PETSc still complains that it cannot create a
>>> preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but
>>> A0.type == 'python').
>>> > By the way, should P0 be an approximation of A0 or does it have to
>>> include B0?
>>> >
>>> > Right now I am using the krylov-schur method. Are there any
>>> alternatives if A0 is only given as an operator?
>>> >
>>> > Jose can correct me if I say something wrong.
>>> >
>>> > When I did this, I made a shell operator for the action of A0^{-1} B0
>>> which has a KSPSolve() in it, so you can use your P0 preconditioning
>>> matrix, and
>>> > then handed that to EPS. You can see me do it here:
>>> >
>>> >
>>> https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123
>>> >
>>> > I had a hard time getting the embedded solver to work the way I
>>> wanted, but maybe that is the better way.
>>> >
>>> >   Thanks,
>>> >
>>> >      Matt
>>> >
>>> > thanks for any advice
>>> > best wishes
>>> > Florian
>>> >
>>> >
>>> > --
>>> > What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> > -- Norbert Wiener
>>> >
>>> > https://www.cse.buffalo.edu/~knepley/
>>>
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210208/a6fc70c2/attachment-0001.html>

From knepley at gmail.com  Mon Feb  8 11:44:59 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 8 Feb 2021 12:44:59 -0500
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <CAJ98EDrS7Nzf-fbWiQQmD3uBVYeM-XExm6tEO-oTcCVhV9SHpw@mail.gmail.com>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
	<CAMYG4GnTUa8B1Mz7=UtkRiKtczNjjxvdS5q+6exDnb3dRAtYWw@mail.gmail.com>
	<CAJ98EDrS7Nzf-fbWiQQmD3uBVYeM-XExm6tEO-oTcCVhV9SHpw@mail.gmail.com>
Message-ID: <CAMYG4G=T6UoRbunmaYeo0OZ2+D-8QfaK+U-qQa8KNYoSm8utwg@mail.gmail.com>

On Mon, Feb 8, 2021 at 11:40 AM Dave May <dave.mayhem23 at gmail.com> wrote:

> On Mon 8. Feb 2021 at 15:49, Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Mon, Feb 8, 2021 at 9:37 AM Jose E. Roman <jroman at dsic.upv.es> wrote:
>>
>>> The problem can be written as A0*v=omega*B0*v and you want the
>>> eigenvalues omega closest to zero. If the matrices were explicitly
>>> available, you would do shift-and-invert with target=0, that is
>>>
>>>   (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
>>>
>>>   A0^{-1}*B0*v=theta*v
>>>
>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
>>>
>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of
>>> EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need?
>>> EPS_SMALLEST_REAL will give slow convergence.
>>>
>>
>> Thanks Jose! I am not understanding some step. I want the smallest
>> eigenvalues. Should I use EPS_SMALLEST_MAGNITUDE? I appear to get what I
>> want
>> using SMALLEST_REAL, but as you say it might be slower than it has to be.
>>
>
>
> With shift-and-invert you want to use EPS_LARGEST_MAGNITUDE as Jose says.
> The largest magnitude v eigenvalues you obtain (see Jose equation above)
> from the transformed system correspond to the smallest magnitude omega
> eigenvalues of the original problem.
>

Okay. In my system for BAMG, however, I do not have 1/\omega, but just
\lambda, so I think it should be EPS_SMALLEST_MAGNITUDE now. I can check
that.

  Thanks,

     Matt


> Cheers
> Dave
>
>
>> Also, sometime I would like to talk about incorporating the multilevel
>> eigensolver. I am sure you could make lots of improvements to my initial
>> attempt. I will send
>> you a separate email, since I am getting serious about testing it.
>>
>>   Thanks,
>>
>>      Matt
>>
>>
>>> Florian: I would not recommend setting the KSP matrices directly, it may
>>> produce strange side-effects. We should have an interface function to pass
>>> this matrix. Currently there is STPrecondSetMatForPC() but it has two
>>> problems: (1) it is intended for STPRECOND, so cannot be used with
>>> Krylov-Schur, and (2) it is not currently available in the python interface.
>>>
>>> The approach used by Matt is a workaround that does not use ST, so you
>>> can handle linear solves with a KSP of your own.
>>>
>>> As an alternative, since your problem is symmetric, you could try
>>> LOBPCG, assuming that the leftmost eigenvalues are those that you want
>>> (e.g. if all eigenvalues are non-negative). In that case you could use
>>> STPrecondSetMatForPC(), but the remaining issue is calling it from python.
>>>
>>> If you are using the git repo, I could add the relevant code.
>>>
>>> Jose
>>>
>>>
>>>
>>> > El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com>
>>> escribi?:
>>> >
>>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com>
>>> wrote:
>>> > Dear PETSc / SLEPc Users,
>>> >
>>> > my question is very similar to the one posted here:
>>> >
>>> https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html
>>> >
>>> > The eigensystem I would like to solve looks like:
>>> > B0 v = 1/omega A0 v
>>> > B0 and A0 are both hermitian, A0 is positive definite, but only given
>>> as a linear operator (matshell). I am looking for the largest eigenvalues
>>> (=smallest omega).
>>> >
>>> > I also have a sparse approximation P0 of the A0 operator, which i
>>> would like to use as precondtioner, using something like this:
>>> >
>>> >         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
>>> >         st = es.getST()
>>> >         ksp = st.getKSP()
>>> >         ksp.setOperators(self.A0, self.P0)
>>> >
>>> > Unfortunately PETSc still complains that it cannot create a
>>> preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but
>>> A0.type == 'python').
>>> > By the way, should P0 be an approximation of A0 or does it have to
>>> include B0?
>>> >
>>> > Right now I am using the krylov-schur method. Are there any
>>> alternatives if A0 is only given as an operator?
>>> >
>>> > Jose can correct me if I say something wrong.
>>> >
>>> > When I did this, I made a shell operator for the action of A0^{-1} B0
>>> which has a KSPSolve() in it, so you can use your P0 preconditioning
>>> matrix, and
>>> > then handed that to EPS. You can see me do it here:
>>> >
>>> >
>>> https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123
>>> >
>>> > I had a hard time getting the embedded solver to work the way I
>>> wanted, but maybe that is the better way.
>>> >
>>> >   Thanks,
>>> >
>>> >      Matt
>>> >
>>> > thanks for any advice
>>> > best wishes
>>> > Florian
>>> >
>>> >
>>> > --
>>> > What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> > -- Norbert Wiener
>>> >
>>> > https://www.cse.buffalo.edu/~knepley/
>>>
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210208/20b142a2/attachment.html>

From y.juntao at hotmail.com  Mon Feb  8 22:50:02 2021
From: y.juntao at hotmail.com (Karl Yang)
Date: Tue, 9 Feb 2021 12:50:02 +0800
Subject: [petsc-users] Help needed on DMDA dm_boundary_none and
 matsetvaluesstencil
Message-ID: <SI2PR03MB5449C707C4A73839043DCCAA938E9@SI2PR03MB5449.apcprd03.prod.outlook.com>

Hi, all

I've encountered some issues with DM_BOUDNARY_NONE and MatSetValuesStencil. I had a code with DM_BOUNDARY_PERIODIC which was working fine. And I simply change the boundary, and find it not working any more. I wonder is there any difference in terms of indexing and stencil for DM_BOUNDARY_NONE.
The following is a simplified code to demonstrate what I was doing. It is basically assembling for finite elements.
But the MatSetValuesStencil seems not adding values into the matrix as I expected and some entries disappeared after MatSetValuesStencil for the second time. I've attached the output at the two different matview location. Some entries in the matrix disappeared after the second values add. And the order of matrix is wired to me, the matrix output is not in a ascending order. Appreciate if anyone would help.

/////////////////////////demo code ///////////////////////////////////
DM dm;
Mat A;
MatStencil s_u[4];

DMDACreate2d(PETSC_COMM_SELF, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DMDA_STENCIL_BOX, 5, 5, PETSC_DECIDE, PETSC_DECIDE, 3, 1, NULL, NULL, &dm);
DMSetMatType(dm, MATAIJ);
DMSetFromOptions(dm);
DMSetUp(dm);
DMDASetUniformCoordinates(dm, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0);

DMSetMatrixPreallocateOnly(dm, PETSC_TRUE);
DMCreateMatrix(dm, &A);

s_u[0].i = 0; s_u[0].j = 0; s_u[0].c = 1;
s_u[1].i = 0; s_u[1].j = 0+1; s_u[1].c = 1;
s_u[2].i = 0+1; s_u[2].j = 0+1; s_u[2].c = 1;
s_u[3].i = 0+1; s_u[3].j = 0; s_u[3].c = 1;

double Ke[16];
for (int n=0;n<16;++n){Ke[n]=1;};
MatSetValuesStencil(A,4,s_u,4,s_u,Ke,ADD_VALUES);

// MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
// MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
// MatView(A, PETSC_VIEWER_STDOUT_WORLD); //first matview

s_u[0].i = 1; s_u[0].j = 0; s_u[0].c = 1;
s_u[1].i = 1; s_u[1].j = 0+1; s_u[1].c = 1;
s_u[2].i = 1+1; s_u[2].j = 0+1; s_u[2].c = 1;
s_u[3].i = 1+1; s_u[3].j = 0; s_u[3].c = 1;

MatSetValuesStencil(A,4,s_u,4,s_u,Ke,ADD_VALUES);
MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
MatView(A, PETSC_VIEWER_STDOUT_WORLD); //second matview

////////////////first matview ////////////////////////
row 0:
row 1: (1, 1.) (16, 1.) (19, 1.) (4, 1.)
row 2:
row 3:
row 4: (1, 1.) (16, 1.) (19, 1.) (4, 1.)
row 5:
row 6:
row 7:
row 8:
row 9:
row 10:
row 11:
row 12:
row 13:
row 14:
row 15:
row 16: (1, 1.) (16, 1.) (19, 1.) (4, 1.)
row 17:
row 18:
row 19: (1, 1.) (16, 1.) (19, 1.) (4, 1.)
row 20:
row 21:
row 22:
row 23:
row 24:
row 25:
row 26:
row 27:
row 28:
row 29:
row 30:
row 31:
row 32:
row 33:
row 34:
row 35:
row 36:
row 37:
row 38:
row 39:
row 40:
row 41:
row 42:
row 43:
row 44:
row 45:
row 46:
row 47:
row 48:
row 49:
row 50:
row 51:
row 52:
row 53:
row 54:
row 55:
row 56:
row 57:
row 58:
row 59:
row 60:
row 61:
row 62:
row 63:
row 64:
row 65:
row 66:
row 67:
row 68:
row 69:
row 70:
row 71:
row 72:
row 73:
row 74:

///////////////second matview/////////////////////
row 0:
row 1: (1, 1.) (16, 1.) (19, 1.) (4, 1.)
row 2:
row 3:
row 4: (4, 1.) (19, 1.) (22, 1.) (7, 1.)
row 5:
row 6:
row 7: (4, 1.) (19, 1.) (22, 1.) (7, 1.)
row 8:
row 9:
row 10:
row 11:
row 12:
row 13:
row 14:
row 15:
row 16: (1, 1.) (16, 1.) (19, 1.) (4, 1.)
row 17:
row 18:
row 19: (4, 1.) (19, 1.) (22, 1.) (7, 1.)
row 20:
row 21:
row 22: (4, 1.) (19, 1.) (22, 1.) (7, 1.)
row 23:
row 24:
row 25:
row 26:
row 27:
row 28:
row 29:
row 30:
row 31:
row 32:
row 33:
row 34:
row 35:
row 36:
row 37:
row 38:
row 39:
row 40:
row 41:
row 42:
row 43:
row 44:
row 45:
row 46:
row 47:
row 48:
row 49:
row 50:
row 51:
row 52:
row 53:
row 54:
row 55:
row 56:
row 57:
row 58:
row 59:
row 60:
row 61:
row 62:
row 63:
row 64:
row 65:
row 66:
row 67:
row 68:
row 69:
row 70:
row 71:
row 72:
row 73:
row 74:

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210209/13345f63/attachment.html>

From knepley at gmail.com  Tue Feb  9 07:41:13 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 9 Feb 2021 08:41:13 -0500
Subject: [petsc-users] Help needed on DMDA dm_boundary_none and
 matsetvaluesstencil
In-Reply-To: <SI2PR03MB5449C707C4A73839043DCCAA938E9@SI2PR03MB5449.apcprd03.prod.outlook.com>
References: <SI2PR03MB5449C707C4A73839043DCCAA938E9@SI2PR03MB5449.apcprd03.prod.outlook.com>
Message-ID: <CAMYG4GnOV_jozjsRF-t=afNCdX++ekQoWtuVy8dB+dfCg6EV5g@mail.gmail.com>

On Mon, Feb 8, 2021 at 11:50 PM Karl Yang <y.juntao at hotmail.com> wrote:

> Hi, all
>
> I've encountered some issues with DM_BOUDNARY_NONE and
> MatSetValuesStencil. I had a code with DM_BOUNDARY_PERIODIC which was
> working fine. And I simply change the boundary, and find it not working any
> more. I wonder is there any difference in terms of indexing and stencil for
> DM_BOUNDARY_NONE.
>

DM_BOUNDARY_PERIODIC puts another layer of ghost points around the local
boundary. Values in these are then transferred to the correct global
location when DMLocalToGlobal() is run. Also, DMGlobalToLocal() inserts
values form the correct global locations into the local vector.
DM_BOUNDARY_NONE
does not do any of that.

  Thanks,

     Matt


> The following is a simplified code to demonstrate what I was doing. It is
> basically assembling for finite elements.
> But the MatSetValuesStencil seems not adding values into the matrix as I
> expected and some entries disappeared after MatSetValuesStencil for the
> second time. I've attached the output at the two different matview
> location. Some entries in the matrix disappeared after the second values
> add. And the order of matrix is wired to me, the matrix output is not in a
> ascending order. Appreciate if anyone would help.
>
> /////////////////////////demo code ///////////////////////////////////
>
>             DM  dm;
>             Mat A;
>             MatStencil s_u[4];
>
>             DMDACreate2d(PETSC_COMM_SELF, DM_BOUNDARY_NONE,
> DM_BOUNDARY_NONE, DMDA_STENCIL_BOX, 5, 5, PETSC_DECIDE, PETSC_DECIDE, 3, 1,
> NULL, NULL, &dm);
>             DMSetMatType(dm, MATAIJ);
>             DMSetFromOptions(dm);
>             DMSetUp(dm);
>             DMDASetUniformCoordinates(dm, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0);
>
>             DMSetMatrixPreallocateOnly(dm, PETSC_TRUE);
>             DMCreateMatrix(dm, &A);
>
>             s_u[0].i = 0;   s_u[0].j = 0;   s_u[0].c = 1;
>             s_u[1].i = 0;   s_u[1].j = 0+1; s_u[1].c = 1;
>             s_u[2].i = 0+1; s_u[2].j = 0+1; s_u[2].c = 1;
>             s_u[3].i = 0+1; s_u[3].j = 0;   s_u[3].c = 1;
>
>             double Ke[16];
>             for (int n=0;n<16;++n){Ke[n]=1;};
>             MatSetValuesStencil(A,4,s_u,4,s_u,Ke,ADD_VALUES);
>
>             // MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>             // MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
>             // MatView(A, PETSC_VIEWER_STDOUT_WORLD);  //first matview
>
>             s_u[0].i = 1;   s_u[0].j = 0;   s_u[0].c = 1;
>             s_u[1].i = 1;   s_u[1].j = 0+1; s_u[1].c = 1;
>             s_u[2].i = 1+1; s_u[2].j = 0+1; s_u[2].c = 1;
>             s_u[3].i = 1+1; s_u[3].j = 0;   s_u[3].c = 1;
>
>             MatSetValuesStencil(A,4,s_u,4,s_u,Ke,ADD_VALUES);
>
>             MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>             MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
>             MatView(A, PETSC_VIEWER_STDOUT_WORLD);  //second matview
>
>
> ////////////////first matview ////////////////////////
> row 0:
> row 1: (1, 1.)  (16, 1.)  (19, 1.)  (4, 1.)
> row 2:
> row 3:
> row 4: *(1, 1.) ** (16, 1.) * (19, 1.)  (4, 1.)
> row 5:
> row 6:
> row 7:
> row 8:
> row 9:
> row 10:
> row 11:
> row 12:
> row 13:
> row 14:
> row 15:
> row 16: (1, 1.)  (16, 1.)  (19, 1.)  (4, 1.)
> row 17:
> row 18:
> row 19: *(1, 1.)  (16, 1.)*  (19, 1.)  (4, 1.)
> row 20:
> row 21:
> row 22:
> row 23:
> row 24:
> row 25:
> row 26:
> row 27:
> row 28:
> row 29:
> row 30:
> row 31:
> row 32:
> row 33:
> row 34:
> row 35:
> row 36:
> row 37:
> row 38:
> row 39:
> row 40:
> row 41:
> row 42:
> row 43:
> row 44:
> row 45:
> row 46:
> row 47:
> row 48:
> row 49:
> row 50:
> row 51:
> row 52:
> row 53:
> row 54:
> row 55:
> row 56:
> row 57:
> row 58:
> row 59:
> row 60:
> row 61:
> row 62:
> row 63:
> row 64:
> row 65:
> row 66:
> row 67:
> row 68:
> row 69:
> row 70:
> row 71:
> row 72:
> row 73:
> row 74:
>
> ///////////////second matview/////////////////////
> row 0:
> row 1: (1, 1.)  (16, 1.)  (19, 1.)  (4, 1.)
> row 2:
> row 3:
> row 4: (4, 1.)  (19, 1.)  (22, 1.)  (7, 1.)
> row 5:
> row 6:
> row 7: (4, 1.)  (19, 1.)  (22, 1.)  (7, 1.)
> row 8:
> row 9:
> row 10:
> row 11:
> row 12:
> row 13:
> row 14:
> row 15:
> row 16: (1, 1.)  (16, 1.)  (19, 1.)  (4, 1.)
> row 17:
> row 18:
> row 19: (4, 1.)  (19, 1.)  (22, 1.)  (7, 1.)
> row 20:
> row 21:
> row 22: (4, 1.)  (19, 1.)  (22, 1.)  (7, 1.)
> row 23:
> row 24:
> row 25:
> row 26:
> row 27:
> row 28:
> row 29:
> row 30:
> row 31:
> row 32:
> row 33:
> row 34:
> row 35:
> row 36:
> row 37:
> row 38:
> row 39:
> row 40:
> row 41:
> row 42:
> row 43:
> row 44:
> row 45:
> row 46:
> row 47:
> row 48:
> row 49:
> row 50:
> row 51:
> row 52:
> row 53:
> row 54:
> row 55:
> row 56:
> row 57:
> row 58:
> row 59:
> row 60:
> row 61:
> row 62:
> row 63:
> row 64:
> row 65:
> row 66:
> row 67:
> row 68:
> row 69:
> row 70:
> row 71:
> row 72:
> row 73:
> row 74:
>
> [image: Sent from Mailspring]


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210209/30630c83/attachment-0001.html>

From y.juntao at hotmail.com  Tue Feb  9 08:03:08 2021
From: y.juntao at hotmail.com (Karl Yang)
Date: Tue, 9 Feb 2021 22:03:08 +0800
Subject: [petsc-users] Help needed on DMDA dm_boundary_none and
 matsetvaluesstencil
In-Reply-To: <CAMYG4GnOV_jozjsRF-t=afNCdX++ekQoWtuVy8dB+dfCg6EV5g@mail.gmail.com>
References: <CAMYG4GnOV_jozjsRF-t=afNCdX++ekQoWtuVy8dB+dfCg6EV5g@mail.gmail.com>
Message-ID: <SI2PR03MB544996969924A700C96979B2938E9@SI2PR03MB5449.apcprd03.prod.outlook.com>

Hi, Matt

Thanks for your reply. I went around by direct creating matrix with local to global mapping. But now I think DM_BOUNDARY_GHOSTED should be just fine given the information.
Regards
Juntao
On Feb 9 2021, at 9:41 pm, Matthew Knepley <knepley at gmail.com> wrote:
> On Mon, Feb 8, 2021 at 11:50 PM Karl Yang <y.juntao at hotmail.com (https://link.getmailspring.com/link/E8FF16A6-B353-40C3-9F92-7DD517718AE1 at getmailspring.com/0?redirect=mailto%3Ay.juntao%40hotmail.com&recipient=cGV0c2MtdXNlcnNAbWNzLmFubC5nb3Y%3D)> wrote:
>
> > Hi, all
> >
> > I've encountered some issues with DM_BOUDNARY_NONE and MatSetValuesStencil. I had a code with DM_BOUNDARY_PERIODIC which was working fine. And I simply change the boundary, and find it not working any more. I wonder is there any difference in terms of indexing and stencil for DM_BOUNDARY_NONE.
>
> DM_BOUNDARY_PERIODIC puts another layer of ghost points around the local boundary. Values in these are then transferred to the correct global
> location when DMLocalToGlobal() is run. Also, DMGlobalToLocal() inserts values form the correct global locations into the local vector. DM_BOUNDARY_NONE
> does not do any of that.
>
> Thanks,
>
> Matt
>
> > The following is a simplified code to demonstrate what I was doing. It is basically assembling for finite elements.
> > But the MatSetValuesStencil seems not adding values into the matrix as I expected and some entries disappeared after MatSetValuesStencil for the second time. I've attached the output at the two different matview location. Some entries in the matrix disappeared after the second values add. And the order of matrix is wired to me, the matrix output is not in a ascending order. Appreciate if anyone would help.
> >
> > /////////////////////////demo code ///////////////////////////////////
> > DM dm;
> > Mat A;
> > MatStencil s_u[4];
> >
> > DMDACreate2d(PETSC_COMM_SELF, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DMDA_STENCIL_BOX, 5, 5, PETSC_DECIDE, PETSC_DECIDE, 3, 1, NULL, NULL, &dm);
> > DMSetMatType(dm, MATAIJ);
> > DMSetFromOptions(dm);
> > DMSetUp(dm);
> > DMDASetUniformCoordinates(dm, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0);
> >
> > DMSetMatrixPreallocateOnly(dm, PETSC_TRUE);
> > DMCreateMatrix(dm, &A);
> >
> > s_u[0].i = 0; s_u[0].j = 0; s_u[0].c = 1;
> > s_u[1].i = 0; s_u[1].j = 0+1; s_u[1].c = 1;
> > s_u[2].i = 0+1; s_u[2].j = 0+1; s_u[2].c = 1;
> > s_u[3].i = 0+1; s_u[3].j = 0; s_u[3].c = 1;
> >
> > double Ke[16];
> > for (int n=0;n<16;++n){Ke[n]=1;};
> > MatSetValuesStencil(A,4,s_u,4,s_u,Ke,ADD_VALUES);
> >
> > // MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
> > // MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
> > // MatView(A, PETSC_VIEWER_STDOUT_WORLD); //first matview
> >
> > s_u[0].i = 1; s_u[0].j = 0; s_u[0].c = 1;
> > s_u[1].i = 1; s_u[1].j = 0+1; s_u[1].c = 1;
> > s_u[2].i = 1+1; s_u[2].j = 0+1; s_u[2].c = 1;
> > s_u[3].i = 1+1; s_u[3].j = 0; s_u[3].c = 1;
> >
> > MatSetValuesStencil(A,4,s_u,4,s_u,Ke,ADD_VALUES);
> > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
> > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
> > MatView(A, PETSC_VIEWER_STDOUT_WORLD); //second matview
> >
> >
> > ////////////////first matview ////////////////////////
> > row 0:
> > row 1: (1, 1.) (16, 1.) (19, 1.) (4, 1.)
> > row 2:
> > row 3:
> > row 4: (1, 1.) (16, 1.) (19, 1.) (4, 1.)
> > row 5:
> > row 6:
> > row 7:
> > row 8:
> > row 9:
> > row 10:
> > row 11:
> > row 12:
> > row 13:
> > row 14:
> > row 15:
> > row 16: (1, 1.) (16, 1.) (19, 1.) (4, 1.)
> > row 17:
> > row 18:
> > row 19: (1, 1.) (16, 1.) (19, 1.) (4, 1.)
> > row 20:
> > row 21:
> > row 22:
> > row 23:
> > row 24:
> > row 25:
> > row 26:
> > row 27:
> > row 28:
> > row 29:
> > row 30:
> > row 31:
> > row 32:
> > row 33:
> > row 34:
> > row 35:
> > row 36:
> > row 37:
> > row 38:
> > row 39:
> > row 40:
> > row 41:
> > row 42:
> > row 43:
> > row 44:
> > row 45:
> > row 46:
> > row 47:
> > row 48:
> > row 49:
> > row 50:
> > row 51:
> > row 52:
> > row 53:
> > row 54:
> > row 55:
> > row 56:
> > row 57:
> > row 58:
> > row 59:
> > row 60:
> > row 61:
> > row 62:
> > row 63:
> > row 64:
> > row 65:
> > row 66:
> > row 67:
> > row 68:
> > row 69:
> > row 70:
> > row 71:
> > row 72:
> > row 73:
> > row 74:
> >
> > ///////////////second matview/////////////////////
> > row 0:
> > row 1: (1, 1.) (16, 1.) (19, 1.) (4, 1.)
> > row 2:
> > row 3:
> > row 4: (4, 1.) (19, 1.) (22, 1.) (7, 1.)
> > row 5:
> > row 6:
> > row 7: (4, 1.) (19, 1.) (22, 1.) (7, 1.)
> > row 8:
> > row 9:
> > row 10:
> > row 11:
> > row 12:
> > row 13:
> > row 14:
> > row 15:
> > row 16: (1, 1.) (16, 1.) (19, 1.) (4, 1.)
> > row 17:
> > row 18:
> > row 19: (4, 1.) (19, 1.) (22, 1.) (7, 1.)
> > row 20:
> > row 21:
> > row 22: (4, 1.) (19, 1.) (22, 1.) (7, 1.)
> > row 23:
> > row 24:
> > row 25:
> > row 26:
> > row 27:
> > row 28:
> > row 29:
> > row 30:
> > row 31:
> > row 32:
> > row 33:
> > row 34:
> > row 35:
> > row 36:
> > row 37:
> > row 38:
> > row 39:
> > row 40:
> > row 41:
> > row 42:
> > row 43:
> > row 44:
> > row 45:
> > row 46:
> > row 47:
> > row 48:
> > row 49:
> > row 50:
> > row 51:
> > row 52:
> > row 53:
> > row 54:
> > row 55:
> > row 56:
> > row 57:
> > row 58:
> > row 59:
> > row 60:
> > row 61:
> > row 62:
> > row 63:
> > row 64:
> > row 65:
> > row 66:
> > row 67:
> > row 68:
> > row 69:
> > row 70:
> > row 71:
> > row 72:
> > row 73:
> > row 74:
> >
>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
>
> https://www.cse.buffalo.edu/~knepley/ (https://link.getmailspring.com/link/E8FF16A6-B353-40C3-9F92-7DD517718AE1 at getmailspring.com/1?redirect=http%3A%2F%2Fwww.cse.buffalo.edu%2F~knepley%2F&recipient=cGV0c2MtdXNlcnNAbWNzLmFubC5nb3Y%3D)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210209/fda6f9c6/attachment.html>

From edoardo.alinovi at gmail.com  Wed Feb 10 03:07:03 2021
From: edoardo.alinovi at gmail.com (Edoardo alinovi)
Date: Wed, 10 Feb 2021 10:07:03 +0100
Subject: [petsc-users] Using parmetis from petsc
Message-ID: <CADmAu6JWgbg-BFSS-4tWdmns3XNoRjrgfzkQqpi=oEiYJbWfPQ@mail.gmail.com>

Hello PETSc friends,

I am working on a code to partition a mesh in parallel and I am looking
at parmetis. As far as I know petsc is interfaced with metis and parmetis
and I have seen people using it within dmplex. Now, I am not using dmplex,
but  I have petsc compiled along with my code for the linear system part.
I am wondering if there is a way to load up a mesh file in the parmetis
format and use petsc to get the elements partitioning only in output. Is
that possible?

Thank you for the help,

Edoardo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210210/e92c9494/attachment.html>

From knepley at gmail.com  Wed Feb 10 06:12:18 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 10 Feb 2021 07:12:18 -0500
Subject: [petsc-users] Using parmetis from petsc
In-Reply-To: <CADmAu6JWgbg-BFSS-4tWdmns3XNoRjrgfzkQqpi=oEiYJbWfPQ@mail.gmail.com>
References: <CADmAu6JWgbg-BFSS-4tWdmns3XNoRjrgfzkQqpi=oEiYJbWfPQ@mail.gmail.com>
Message-ID: <CAMYG4GkZrPeENULxxMKc7LND6Chy2Bq=zM7DCDc4favxbPTPhw@mail.gmail.com>

On Wed, Feb 10, 2021 at 4:07 AM Edoardo alinovi <edoardo.alinovi at gmail.com>
wrote:

> Hello PETSc friends,
>
> I am working on a code to partition a mesh in parallel and I am looking
> at parmetis. As far as I know petsc is interfaced with metis and parmetis
> and I have seen people using it within dmplex. Now, I am not using dmplex,
> but  I have petsc compiled along with my code for the linear system part.
> I am wondering if there is a way to load up a mesh file in the parmetis
> format and use petsc to get the elements partitioning only in output. Is
> that possible?
>

ParMetis does not really have a mesh format. It partitions distributed
graphs. Most people want to partition cells in their mesh, and
then ParMetis would want the graph for cell connectivity. This is not
usually what people store, so typically there is a conversion process
here. If you want to use PETSc for partitioning, and not use DMPlex, the
easiest way to do it is to put your cell connectivity in a Mat,
storing the adjacency graph for cells. Then use MatPartitioning.

 Thanks,

     Matt


> Thank you for the help,
>
> Edoardo
>
-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210210/de480852/attachment.html>

From edoardo.alinovi at gmail.com  Wed Feb 10 09:00:49 2021
From: edoardo.alinovi at gmail.com (Edoardo alinovi)
Date: Wed, 10 Feb 2021 16:00:49 +0100
Subject: [petsc-users] Using parmetis from petsc
In-Reply-To: <CAMYG4GkZrPeENULxxMKc7LND6Chy2Bq=zM7DCDc4favxbPTPhw@mail.gmail.com>
References: <CADmAu6JWgbg-BFSS-4tWdmns3XNoRjrgfzkQqpi=oEiYJbWfPQ@mail.gmail.com>
	<CAMYG4GkZrPeENULxxMKc7LND6Chy2Bq=zM7DCDc4favxbPTPhw@mail.gmail.com>
Message-ID: <CADmAu6KXb5hwYvfWPVu6Gjin8UxK-L+9neorqFHCGds0WKQ_nw@mail.gmail.com>

Thanks Matthew,

Probably in my case is a good idea to use parmetis directly as i just need
the cell distribution in pre-procrssing. Thanks for the clarification.

On an affine side, it's a while i am interrogating my self about a thing.
Let's say i have my cell distribution accross processors from parmetis.
Since petsc needs to have the unknows from 1 to N0 hosted by rank 0,
N0+1:N2 hosted by rank 2 and so on, what i am doing is relabelling local
cells in order to meet this requirement . Is that right? I am wondering if
such a way of things is leading to a suboptimal matrix bandwith. What do
you think about this?

Many thanks,

Edoardo

On Wed, 10 Feb 2021, 13:12 Matthew Knepley, <knepley at gmail.com> wrote:

> On Wed, Feb 10, 2021 at 4:07 AM Edoardo alinovi <edoardo.alinovi at gmail.com>
> wrote:
>
>> Hello PETSc friends,
>>
>> I am working on a code to partition a mesh in parallel and I am looking
>> at parmetis. As far as I know petsc is interfaced with metis and parmetis
>> and I have seen people using it within dmplex. Now, I am not using dmplex,
>> but  I have petsc compiled along with my code for the linear system part.
>> I am wondering if there is a way to load up a mesh file in the parmetis
>> format and use petsc to get the elements partitioning only in output. Is
>> that possible?
>>
>
> ParMetis does not really have a mesh format. It partitions distributed
> graphs. Most people want to partition cells in their mesh, and
> then ParMetis would want the graph for cell connectivity. This is not
> usually what people store, so typically there is a conversion process
> here. If you want to use PETSc for partitioning, and not use DMPlex, the
> easiest way to do it is to put your cell connectivity in a Mat,
> storing the adjacency graph for cells. Then use MatPartitioning.
>
>  Thanks,
>
>      Matt
>
>
>> Thank you for the help,
>>
>> Edoardo
>>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210210/376d80cb/attachment.html>

From knepley at gmail.com  Wed Feb 10 09:06:02 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 10 Feb 2021 10:06:02 -0500
Subject: [petsc-users] Using parmetis from petsc
In-Reply-To: <CADmAu6KXb5hwYvfWPVu6Gjin8UxK-L+9neorqFHCGds0WKQ_nw@mail.gmail.com>
References: <CADmAu6JWgbg-BFSS-4tWdmns3XNoRjrgfzkQqpi=oEiYJbWfPQ@mail.gmail.com>
	<CAMYG4GkZrPeENULxxMKc7LND6Chy2Bq=zM7DCDc4favxbPTPhw@mail.gmail.com>
	<CADmAu6KXb5hwYvfWPVu6Gjin8UxK-L+9neorqFHCGds0WKQ_nw@mail.gmail.com>
Message-ID: <CAMYG4GnFuHpukxEgN9Ow+mekYcbbmkMXZZbXcxDRMXEyzJxC5w@mail.gmail.com>

On Wed, Feb 10, 2021 at 10:00 AM Edoardo alinovi <edoardo.alinovi at gmail.com>
wrote:

> Thanks Matthew,
>
> Probably in my case is a good idea to use parmetis directly as i just need
> the cell distribution in pre-procrssing. Thanks for the clarification.
>
> On an affine side, it's a while i am interrogating my self about a thing.
> Let's say i have my cell distribution accross processors from parmetis.
> Since petsc needs to have the unknows from 1 to N0 hosted by rank 0,
> N0+1:N2 hosted by rank 2 and so on, what i am doing is relabelling local
> cells in order to meet this requirement . Is that right? I am wondering if
> such a way of things is leading to a suboptimal matrix bandwith. What do
> you think about this?
>

I do not understand. If parmetis gives you a partition, you would have to
move the data to match the new partition.

  Thanks,

    Matt


> Many thanks,
>
> Edoardo
>
> On Wed, 10 Feb 2021, 13:12 Matthew Knepley, <knepley at gmail.com> wrote:
>
>> On Wed, Feb 10, 2021 at 4:07 AM Edoardo alinovi <
>> edoardo.alinovi at gmail.com> wrote:
>>
>>> Hello PETSc friends,
>>>
>>> I am working on a code to partition a mesh in parallel and I am looking
>>> at parmetis. As far as I know petsc is interfaced with metis and parmetis
>>> and I have seen people using it within dmplex. Now, I am not using dmplex,
>>> but  I have petsc compiled along with my code for the linear system part.
>>> I am wondering if there is a way to load up a mesh file in the parmetis
>>> format and use petsc to get the elements partitioning only in output. Is
>>> that possible?
>>>
>>
>> ParMetis does not really have a mesh format. It partitions distributed
>> graphs. Most people want to partition cells in their mesh, and
>> then ParMetis would want the graph for cell connectivity. This is not
>> usually what people store, so typically there is a conversion process
>> here. If you want to use PETSc for partitioning, and not use DMPlex, the
>> easiest way to do it is to put your cell connectivity in a Mat,
>> storing the adjacency graph for cells. Then use MatPartitioning.
>>
>>  Thanks,
>>
>>      Matt
>>
>>
>>> Thank you for the help,
>>>
>>> Edoardo
>>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210210/53c939ff/attachment-0001.html>

From matteo.semplice at uninsubria.it  Wed Feb 10 10:46:57 2021
From: matteo.semplice at uninsubria.it (Matteo Semplice)
Date: Wed, 10 Feb 2021 17:46:57 +0100
Subject: [petsc-users] shell preconditioner for Schur complement
Message-ID: <ee3bcf4f-79c0-62b9-23e2-06857ca2d64e@uninsubria.it>

Dear PETSc users,
 ??? we are trying to program a preconditioner for the Schur complement 
of a Stokes system, but it seems that the r.h.s. for the Schur 
complement system differs from what we expect by a scale factor, which 
we don't understand.

Our setup has a system matrix A divided in 2x2 blocks for velocity and 
pressure variables. We have programmed our preconditioner in a routine 
PrecondSchur and in the main program we do

PC pc;
KSPGetPC(kspA,&pc);
PCSetFromOptions(pc);
KSPSetOperators(kspA, A, A);
KSPSetInitialGuessNonzero(kspA,PETSC_FALSE);
KSPSetFromOptions(kspA);
KSP *subksp;
PetscInt nfield;
PCSetUp(pc);
PCFieldSplitGetSubKSP(pc, &nfield, &subksp);
PC pcSchur;
KSPGetPC(subksp[1],&pcSchur);
PCSetType(pcSchur,PCSHELL);
PCShellSetApply(pcSchur,PrecondSchur);
KSPSetFromOptions(subksp[1]);

and eventually

KSPSolve(A,b,solution);

We run the code with options

 ?-ksp_type fgmres \
 ?-pc_type fieldsplit -pc_fieldsplit_type schur \
 ?-pc_fieldsplit_schur_fact_type full \

and, from reading section 2.3.5 of the PETSc manual, we'd expect that 
the first r.h.s. passed to PrecondSchur be exactly
 ??? b_1-A_10*inv(A_00)*b_0

Instead (from a monitor function attached to the subksp[1] solver), the 
first r.h.s. appears to be scalar multiple of the above vector; we are 
guessing that we should take into account this multiplicative factor in 
our preconditioner routine, but we cannot understand where it comes from 
and how its value is determined.

Could you explain us what is going on in the PC_SCHUR exactly, or point 
us to some working code example?

Thanks in advance!

 ??? Matteo


From knepley at gmail.com  Wed Feb 10 11:05:28 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 10 Feb 2021 12:05:28 -0500
Subject: [petsc-users] shell preconditioner for Schur complement
In-Reply-To: <ee3bcf4f-79c0-62b9-23e2-06857ca2d64e@uninsubria.it>
References: <ee3bcf4f-79c0-62b9-23e2-06857ca2d64e@uninsubria.it>
Message-ID: <CAMYG4G=5OPaVtxvU+7h--sO1TOL3hjjxOFNtxTk0AXhCjGujag@mail.gmail.com>

On Wed, Feb 10, 2021 at 11:51 AM Matteo Semplice <
matteo.semplice at uninsubria.it> wrote:

> Dear PETSc users,
>      we are trying to program a preconditioner for the Schur complement
> of a Stokes system, but it seems that the r.h.s. for the Schur
> complement system differs from what we expect by a scale factor, which
> we don't understand.
>
> Our setup has a system matrix A divided in 2x2 blocks for velocity and
> pressure variables. We have programmed our preconditioner in a routine
> PrecondSchur and in the main program we do
>
> PC pc;
> KSPGetPC(kspA,&pc);
> PCSetFromOptions(pc);
> KSPSetOperators(kspA, A, A);
> KSPSetInitialGuessNonzero(kspA,PETSC_FALSE);
> KSPSetFromOptions(kspA);
> KSP *subksp;
> PetscInt nfield;
> PCSetUp(pc);
> PCFieldSplitGetSubKSP(pc, &nfield, &subksp);
> PC pcSchur;
> KSPGetPC(subksp[1],&pcSchur);
> PCSetType(pcSchur,PCSHELL);
> PCShellSetApply(pcSchur,PrecondSchur);
> KSPSetFromOptions(subksp[1]);
>
> and eventually
>
> KSPSolve(A,b,solution);
>
> We run the code with options
>
>   -ksp_type fgmres \
>   -pc_type fieldsplit -pc_fieldsplit_type schur \
>   -pc_fieldsplit_schur_fact_type full \
>
> and, from reading section 2.3.5 of the PETSc manual, we'd expect that
> the first r.h.s. passed to PrecondSchur be exactly
>      b_1-A_10*inv(A_00)*b_0
>
> Instead (from a monitor function attached to the subksp[1] solver), the
> first r.h.s. appears to be scalar multiple of the above vector; we are
> guessing that we should take into account this multiplicative factor in
> our preconditioner routine, but we cannot understand where it comes from
> and how its value is determined.
>
> Could you explain us what is going on in the PC_SCHUR exactly, or point
> us to some working code example?
>

1) It is hard to understand solver questions without the output of -ksp_view

2) The RHS will depend on the kind of factorization you are using for the
system


https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetSchurFactType.html#PCFieldSplitSetSchurFactType

    I can see which one in the view output

  Thanks,

    Matt


> Thanks in advance!
>
>      Matteo
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210210/e3623db2/attachment.html>

From elena.travaglia at edu.unito.it  Wed Feb 10 15:05:23 2021
From: elena.travaglia at edu.unito.it (Elena Travaglia)
Date: Wed, 10 Feb 2021 22:05:23 +0100
Subject: [petsc-users] shell preconditioner for Schur complement
In-Reply-To: <CAMYG4G=5OPaVtxvU+7h--sO1TOL3hjjxOFNtxTk0AXhCjGujag@mail.gmail.com>
References: <ee3bcf4f-79c0-62b9-23e2-06857ca2d64e@uninsubria.it>
	<CAMYG4G=5OPaVtxvU+7h--sO1TOL3hjjxOFNtxTk0AXhCjGujag@mail.gmail.com>
Message-ID: <CA+-JvF6mR0fBeqHFhEB=An2=ndL2J0+iyNKcHhpOQ7pTG74FJA@mail.gmail.com>

Thanks for the link.

We have set a Schur factorization of type FULL, and we passed it when we
run the code with
 -pc_fieldsplit_schur_fact_type full

Here there is the output of -ksp_view

KSP Object: 1 MPI processes
  type: fgmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization
with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=1, initial guess is zero
  tolerances:  relative=1e-08, absolute=1e-50, divergence=10000.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: fieldsplit
    FieldSplit with Schur preconditioner, factorization FULL
    Preconditioner for the Schur complement formed from A11
    Split info:
    Split number 0 Defined by IS
    Split number 1 Defined by IS
    KSP solver for A00 block
      KSP Object: (fieldsplit_0_) 1 MPI processes
        type: gmres
          restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
          happy breakdown tolerance 1e-30
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using PRECONDITIONED norm type for convergence test
      PC Object: (fieldsplit_0_) 1 MPI processes
        type: ilu
          out-of-place factorization
          0 levels of fill
          tolerance for zero pivot 2.22045e-14
          matrix ordering: natural
          factor fill ratio given 1., needed 1.
            Factored matrix follows:
              Mat Object: 1 MPI processes
                type: seqaij
                rows=44, cols=44
                package used to perform factorization: petsc
                total: nonzeros=482, allocated nonzeros=482
                total number of mallocs used during MatSetValues calls=0
                  using I-node routines: found 13 nodes, limit used is 5
        linear system matrix = precond matrix:
        Mat Object: (fieldsplit_0_) 1 MPI processes
          type: seqaij
          rows=44, cols=44
          total: nonzeros=482, allocated nonzeros=482
          total number of mallocs used during MatSetValues calls=0
            using I-node routines: found 13 nodes, limit used is 5
    KSP solver for S = A11 - A10 inv(A00) A01
      KSP Object: (fieldsplit_1_) 1 MPI processes
        type: gmres
          restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
          happy breakdown tolerance 1e-30
        maximum iterations=1, initial guess is zero
        tolerances:  relative=1e-09, absolute=1e-50, divergence=10000.
        left preconditioning
        using PRECONDITIONED norm type for convergence test
      PC Object: (fieldsplit_1_) 1 MPI processes
        type: shell
          no name
        linear system matrix followed by preconditioner matrix:
        Mat Object: (fieldsplit_1_) 1 MPI processes
          type: schurcomplement
          rows=20, cols=20
            Schur complement A11 - A10 inv(A00) A01
            A11
              Mat Object: (fieldsplit_1_) 1 MPI processes
                type: seqaij
                rows=20, cols=20
                total: nonzeros=112, allocated nonzeros=112
                total number of mallocs used during MatSetValues calls=0
                  using I-node routines: found 10 nodes, limit used is 5
            A10
              Mat Object: 1 MPI processes
                type: seqaij
                rows=20, cols=44
                total: nonzeros=160, allocated nonzeros=160
                total number of mallocs used during MatSetValues calls=0
                  using I-node routines: found 10 nodes, limit used is 5
            KSP of A00
              KSP Object: (fieldsplit_0_) 1 MPI processes
                type: gmres
                  restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
                  happy breakdown tolerance 1e-30
                maximum iterations=10000, initial guess is zero
                tolerances:  relative=1e-05, absolute=1e-50,
divergence=10000.
                left preconditioning
                using PRECONDITIONED norm type for convergence test
              PC Object: (fieldsplit_0_) 1 MPI processes
                type: ilu
                  out-of-place factorization
                  0 levels of fill
                  tolerance for zero pivot 2.22045e-14
                  matrix ordering: natural
                  factor fill ratio given 1., needed 1.
                    Factored matrix follows:
                      Mat Object: 1 MPI processes
                        type: seqaij
                        rows=44, cols=44
                        package used to perform factorization: petsc
                        total: nonzeros=482, allocated nonzeros=482
                        total number of mallocs used during MatSetValues
calls=0
                          using I-node routines: found 13 nodes, limit used
is 5
                linear system matrix = precond matrix:
                Mat Object: (fieldsplit_0_) 1 MPI processes
                  type: seqaij
                  rows=44, cols=44
                  total: nonzeros=482, allocated nonzeros=482
                  total number of mallocs used during MatSetValues calls=0
                    using I-node routines: found 13 nodes, limit used is 5
            A01
              Mat Object: 1 MPI processes
                type: seqaij
                rows=44, cols=20
                total: nonzeros=156, allocated nonzeros=156
                total number of mallocs used during MatSetValues calls=0
                  using I-node routines: found 12 nodes, limit used is 5
        Mat Object: (fieldsplit_1_) 1 MPI processes
          type: seqaij
          rows=20, cols=20
          total: nonzeros=112, allocated nonzeros=112
          total number of mallocs used during MatSetValues calls=0
            using I-node routines: found 10 nodes, limit used is 5
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaij
    rows=64, cols=64
    total: nonzeros=910, allocated nonzeros=2432
    total number of mallocs used during MatSetValues calls=128
      using I-node routines: found 23 nodes, limit used is 5


We would like to understand why the first r.h.s, passed to our function for
the Schur preconditioner, is not
b_1-A_10*inv(A_00)*b_0,
even if we used the full factorization ( without dropping any terms ).

Thank you,
Elena


Il giorno mer 10 feb 2021 alle ore 18:05 Matthew Knepley <knepley at gmail.com>
ha scritto:

> On Wed, Feb 10, 2021 at 11:51 AM Matteo Semplice <
> matteo.semplice at uninsubria.it> wrote:
>
>> Dear PETSc users,
>>      we are trying to program a preconditioner for the Schur complement
>> of a Stokes system, but it seems that the r.h.s. for the Schur
>> complement system differs from what we expect by a scale factor, which
>> we don't understand.
>>
>> Our setup has a system matrix A divided in 2x2 blocks for velocity and
>> pressure variables. We have programmed our preconditioner in a routine
>> PrecondSchur and in the main program we do
>>
>> PC pc;
>> KSPGetPC(kspA,&pc);
>> PCSetFromOptions(pc);
>> KSPSetOperators(kspA, A, A);
>> KSPSetInitialGuessNonzero(kspA,PETSC_FALSE);
>> KSPSetFromOptions(kspA);
>> KSP *subksp;
>> PetscInt nfield;
>> PCSetUp(pc);
>> PCFieldSplitGetSubKSP(pc, &nfield, &subksp);
>> PC pcSchur;
>> KSPGetPC(subksp[1],&pcSchur);
>> PCSetType(pcSchur,PCSHELL);
>> PCShellSetApply(pcSchur,PrecondSchur);
>> KSPSetFromOptions(subksp[1]);
>>
>> and eventually
>>
>> KSPSolve(A,b,solution);
>>
>> We run the code with options
>>
>>   -ksp_type fgmres \
>>   -pc_type fieldsplit -pc_fieldsplit_type schur \
>>   -pc_fieldsplit_schur_fact_type full \
>>
>> and, from reading section 2.3.5 of the PETSc manual, we'd expect that
>> the first r.h.s. passed to PrecondSchur be exactly
>>      b_1-A_10*inv(A_00)*b_0
>>
>> Instead (from a monitor function attached to the subksp[1] solver), the
>> first r.h.s. appears to be scalar multiple of the above vector; we are
>> guessing that we should take into account this multiplicative factor in
>> our preconditioner routine, but we cannot understand where it comes from
>> and how its value is determined.
>>
>> Could you explain us what is going on in the PC_SCHUR exactly, or point
>> us to some working code example?
>>
>
> 1) It is hard to understand solver questions without the output of
> -ksp_view
>
> 2) The RHS will depend on the kind of factorization you are using for the
> system
>
>
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetSchurFactType.html#PCFieldSplitSetSchurFactType
>
>     I can see which one in the view output
>
>   Thanks,
>
>     Matt
>
>
>> Thanks in advance!
>>
>>      Matteo
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>

-- 
------------------------


Indirizzo istituzionale di posta elettronica 
degli studenti e dei laureati dell'Universit? degli Studi di TorinoOfficial?
University of Turin?email address?for students and graduates?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210210/705208ce/attachment.html>

From knepley at gmail.com  Wed Feb 10 15:23:05 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 10 Feb 2021 16:23:05 -0500
Subject: [petsc-users] shell preconditioner for Schur complement
In-Reply-To: <CA+-JvF6mR0fBeqHFhEB=An2=ndL2J0+iyNKcHhpOQ7pTG74FJA@mail.gmail.com>
References: <ee3bcf4f-79c0-62b9-23e2-06857ca2d64e@uninsubria.it>
	<CAMYG4G=5OPaVtxvU+7h--sO1TOL3hjjxOFNtxTk0AXhCjGujag@mail.gmail.com>
	<CA+-JvF6mR0fBeqHFhEB=An2=ndL2J0+iyNKcHhpOQ7pTG74FJA@mail.gmail.com>
Message-ID: <CAMYG4GmREgywkooOEarHGZSarvnoctXEi7onx+Axq-g+7xqK3w@mail.gmail.com>

On Wed, Feb 10, 2021 at 4:05 PM Elena Travaglia <
elena.travaglia at edu.unito.it> wrote:

> Thanks for the link.
>
> We have set a Schur factorization of type FULL, and we passed it when we
> run the code with
>  -pc_fieldsplit_schur_fact_type full
>
> Here there is the output of -ksp_view
>
> KSP Object: 1 MPI processes
>   type: fgmres
>     restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>     happy breakdown tolerance 1e-30
>   maximum iterations=1, initial guess is zero
>   tolerances:  relative=1e-08, absolute=1e-50, divergence=10000.
>   right preconditioning
>   using UNPRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
>   type: fieldsplit
>     FieldSplit with Schur preconditioner, factorization FULL
>     Preconditioner for the Schur complement formed from A11
>     Split info:
>     Split number 0 Defined by IS
>     Split number 1 Defined by IS
>     KSP solver for A00 block
>       KSP Object: (fieldsplit_0_) 1 MPI processes
>         type: gmres
>           restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>           happy breakdown tolerance 1e-30
>         maximum iterations=10000, initial guess is zero
>         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>         left preconditioning
>         using PRECONDITIONED norm type for convergence test
>       PC Object: (fieldsplit_0_) 1 MPI processes
>         type: ilu
>           out-of-place factorization
>           0 levels of fill
>           tolerance for zero pivot 2.22045e-14
>           matrix ordering: natural
>           factor fill ratio given 1., needed 1.
>             Factored matrix follows:
>               Mat Object: 1 MPI processes
>                 type: seqaij
>                 rows=44, cols=44
>                 package used to perform factorization: petsc
>                 total: nonzeros=482, allocated nonzeros=482
>                 total number of mallocs used during MatSetValues calls=0
>                   using I-node routines: found 13 nodes, limit used is 5
>         linear system matrix = precond matrix:
>         Mat Object: (fieldsplit_0_) 1 MPI processes
>           type: seqaij
>           rows=44, cols=44
>           total: nonzeros=482, allocated nonzeros=482
>           total number of mallocs used during MatSetValues calls=0
>             using I-node routines: found 13 nodes, limit used is 5
>     KSP solver for S = A11 - A10 inv(A00) A01
>       KSP Object: (fieldsplit_1_) 1 MPI processes
>         type: gmres
>           restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>           happy breakdown tolerance 1e-30
>         maximum iterations=1, initial guess is zero
>         tolerances:  relative=1e-09, absolute=1e-50, divergence=10000.
>         left preconditioning
>         using PRECONDITIONED norm type for convergence test
>       PC Object: (fieldsplit_1_) 1 MPI processes
>         type: shell
>           no name
>         linear system matrix followed by preconditioner matrix:
>         Mat Object: (fieldsplit_1_) 1 MPI processes
>           type: schurcomplement
>           rows=20, cols=20
>             Schur complement A11 - A10 inv(A00) A01
>             A11
>               Mat Object: (fieldsplit_1_) 1 MPI processes
>                 type: seqaij
>                 rows=20, cols=20
>                 total: nonzeros=112, allocated nonzeros=112
>                 total number of mallocs used during MatSetValues calls=0
>                   using I-node routines: found 10 nodes, limit used is 5
>             A10
>               Mat Object: 1 MPI processes
>                 type: seqaij
>                 rows=20, cols=44
>                 total: nonzeros=160, allocated nonzeros=160
>                 total number of mallocs used during MatSetValues calls=0
>                   using I-node routines: found 10 nodes, limit used is 5
>             KSP of A00
>               KSP Object: (fieldsplit_0_) 1 MPI processes
>                 type: gmres
>                   restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>                   happy breakdown tolerance 1e-30
>                 maximum iterations=10000, initial guess is zero
>                 tolerances:  relative=1e-05, absolute=1e-50,
> divergence=10000.
>                 left preconditioning
>                 using PRECONDITIONED norm type for convergence test
>               PC Object: (fieldsplit_0_) 1 MPI processes
>                 type: ilu
>                   out-of-place factorization
>                   0 levels of fill
>                   tolerance for zero pivot 2.22045e-14
>                   matrix ordering: natural
>                   factor fill ratio given 1., needed 1.
>                     Factored matrix follows:
>                       Mat Object: 1 MPI processes
>                         type: seqaij
>                         rows=44, cols=44
>                         package used to perform factorization: petsc
>                         total: nonzeros=482, allocated nonzeros=482
>                         total number of mallocs used during MatSetValues
> calls=0
>                           using I-node routines: found 13 nodes, limit
> used is 5
>                 linear system matrix = precond matrix:
>                 Mat Object: (fieldsplit_0_) 1 MPI processes
>                   type: seqaij
>                   rows=44, cols=44
>                   total: nonzeros=482, allocated nonzeros=482
>                   total number of mallocs used during MatSetValues calls=0
>                     using I-node routines: found 13 nodes, limit used is 5
>             A01
>               Mat Object: 1 MPI processes
>                 type: seqaij
>                 rows=44, cols=20
>                 total: nonzeros=156, allocated nonzeros=156
>                 total number of mallocs used during MatSetValues calls=0
>                   using I-node routines: found 12 nodes, limit used is 5
>         Mat Object: (fieldsplit_1_) 1 MPI processes
>           type: seqaij
>           rows=20, cols=20
>           total: nonzeros=112, allocated nonzeros=112
>           total number of mallocs used during MatSetValues calls=0
>             using I-node routines: found 10 nodes, limit used is 5
>   linear system matrix = precond matrix:
>   Mat Object: 1 MPI processes
>     type: seqaij
>     rows=64, cols=64
>     total: nonzeros=910, allocated nonzeros=2432
>     total number of mallocs used during MatSetValues calls=128
>       using I-node routines: found 23 nodes, limit used is 5
>
>
> We would like to understand why the first r.h.s, passed to our function
> for the Schur preconditioner, is not
> b_1-A_10*inv(A_00)*b_0,
> even if we used the full factorization ( without dropping any terms ).
>

Here is the code:

https://gitlab.com/petsc/petsc/-/blob/master/src/ksp/pc/impls/fieldsplit/fieldsplit.c#L1182

I think you are saying that ilinkD->x is not what you expect on line 1196.
It should be easy to print out the value
at any of the intermediate stages.

  Thanks,

     Matt


> Thank you,
> Elena
>
>
>
>
> Il giorno mer 10 feb 2021 alle ore 18:05 Matthew Knepley <
> knepley at gmail.com> ha scritto:
>
>> On Wed, Feb 10, 2021 at 11:51 AM Matteo Semplice <
>> matteo.semplice at uninsubria.it> wrote:
>>
>>> Dear PETSc users,
>>>      we are trying to program a preconditioner for the Schur complement
>>> of a Stokes system, but it seems that the r.h.s. for the Schur
>>> complement system differs from what we expect by a scale factor, which
>>> we don't understand.
>>>
>>> Our setup has a system matrix A divided in 2x2 blocks for velocity and
>>> pressure variables. We have programmed our preconditioner in a routine
>>> PrecondSchur and in the main program we do
>>>
>>> PC pc;
>>> KSPGetPC(kspA,&pc);
>>> PCSetFromOptions(pc);
>>> KSPSetOperators(kspA, A, A);
>>> KSPSetInitialGuessNonzero(kspA,PETSC_FALSE);
>>> KSPSetFromOptions(kspA);
>>> KSP *subksp;
>>> PetscInt nfield;
>>> PCSetUp(pc);
>>> PCFieldSplitGetSubKSP(pc, &nfield, &subksp);
>>> PC pcSchur;
>>> KSPGetPC(subksp[1],&pcSchur);
>>> PCSetType(pcSchur,PCSHELL);
>>> PCShellSetApply(pcSchur,PrecondSchur);
>>> KSPSetFromOptions(subksp[1]);
>>>
>>> and eventually
>>>
>>> KSPSolve(A,b,solution);
>>>
>>> We run the code with options
>>>
>>>   -ksp_type fgmres \
>>>   -pc_type fieldsplit -pc_fieldsplit_type schur \
>>>   -pc_fieldsplit_schur_fact_type full \
>>>
>>> and, from reading section 2.3.5 of the PETSc manual, we'd expect that
>>> the first r.h.s. passed to PrecondSchur be exactly
>>>      b_1-A_10*inv(A_00)*b_0
>>>
>>> Instead (from a monitor function attached to the subksp[1] solver), the
>>> first r.h.s. appears to be scalar multiple of the above vector; we are
>>> guessing that we should take into account this multiplicative factor in
>>> our preconditioner routine, but we cannot understand where it comes from
>>> and how its value is determined.
>>>
>>> Could you explain us what is going on in the PC_SCHUR exactly, or point
>>> us to some working code example?
>>>
>>
>> 1) It is hard to understand solver questions without the output of
>> -ksp_view
>>
>> 2) The RHS will depend on the kind of factorization you are using for the
>> system
>>
>>
>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetSchurFactType.html#PCFieldSplitSetSchurFactType
>>
>>     I can see which one in the view output
>>
>>   Thanks,
>>
>>     Matt
>>
>>
>>> Thanks in advance!
>>>
>>>      Matteo
>>>
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>
> ------------------------
>
> Indirizzo istituzionale di posta elettronica degli studenti e dei laureati
> dell'Universit? degli Studi di Torino
> Official University of Turin email address for students and graduates
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210210/981277fe/attachment-0001.html>

From bsmith at petsc.dev  Wed Feb 10 23:41:41 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 10 Feb 2021 23:41:41 -0600
Subject: [petsc-users] shell preconditioner for Schur complement
In-Reply-To: <ee3bcf4f-79c0-62b9-23e2-06857ca2d64e@uninsubria.it>
References: <ee3bcf4f-79c0-62b9-23e2-06857ca2d64e@uninsubria.it>
Message-ID: <227EB1C9-E4FC-49D6-8376-52B6CC850F2B@petsc.dev>


  Best to just look at the code to see exactly what it is doing: src/ksp/pc/impls/fieldsplit/fieldsplit.c  function PCApply_FieldSplit_Schur()

  There is no particular "scaling" applied to the vectors. It might be easiest to track through the computational process with the debugger (you can call VecView(v,0) in the debugger anytime to see the current vector) to see why the "scaling" seems to change.

  Barry


> On Feb 10, 2021, at 10:46 AM, Matteo Semplice <matteo.semplice at uninsubria.it> wrote:
> 
> Dear PETSc users,
>     we are trying to program a preconditioner for the Schur complement of a Stokes system, but it seems that the r.h.s. for the Schur complement system differs from what we expect by a scale factor, which we don't understand.
> 
> Our setup has a system matrix A divided in 2x2 blocks for velocity and pressure variables. We have programmed our preconditioner in a routine PrecondSchur and in the main program we do
> 
> PC pc;
> KSPGetPC(kspA,&pc);
> PCSetFromOptions(pc);
> KSPSetOperators(kspA, A, A);
> KSPSetInitialGuessNonzero(kspA,PETSC_FALSE);
> KSPSetFromOptions(kspA);
> KSP *subksp;
> PetscInt nfield;
> PCSetUp(pc);
> PCFieldSplitGetSubKSP(pc, &nfield, &subksp);
> PC pcSchur;
> KSPGetPC(subksp[1],&pcSchur);
> PCSetType(pcSchur,PCSHELL);
> PCShellSetApply(pcSchur,PrecondSchur);
> KSPSetFromOptions(subksp[1]);
> 
> and eventually
> 
> KSPSolve(A,b,solution);
> 
> We run the code with options
> 
>  -ksp_type fgmres \
>  -pc_type fieldsplit -pc_fieldsplit_type schur \
>  -pc_fieldsplit_schur_fact_type full \
> 
> and, from reading section 2.3.5 of the PETSc manual, we'd expect that the first r.h.s. passed to PrecondSchur be exactly
>     b_1-A_10*inv(A_00)*b_0
> 
> Instead (from a monitor function attached to the subksp[1] solver), the first r.h.s. appears to be scalar multiple of the above vector; we are guessing that we should take into account this multiplicative factor in our preconditioner routine, but we cannot understand where it comes from and how its value is determined.
> 
> Could you explain us what is going on in the PC_SCHUR exactly, or point us to some working code example?
> 
> Thanks in advance!
> 
>     Matteo
> 


From matteo.semplice at uninsubria.it  Thu Feb 11 09:52:31 2021
From: matteo.semplice at uninsubria.it (Matteo Semplice)
Date: Thu, 11 Feb 2021 16:52:31 +0100
Subject: [petsc-users] shell preconditioner for Schur complement
In-Reply-To: <227EB1C9-E4FC-49D6-8376-52B6CC850F2B@petsc.dev>
References: <ee3bcf4f-79c0-62b9-23e2-06857ca2d64e@uninsubria.it>
	<227EB1C9-E4FC-49D6-8376-52B6CC850F2B@petsc.dev>
Message-ID: <4af4cca9-e3f3-d0be-6375-2f2f0a3aef4b@uninsubria.it>


Il 11/02/21 06:41, Barry Smith ha scritto:
>    Best to just look at the code to see exactly what it is doing: src/ksp/pc/impls/fieldsplit/fieldsplit.c  function PCApply_FieldSplit_Schur()
>
>    There is no particular "scaling" applied to the vectors. It might be easiest to track through the computational process with the debugger (you can call VecView(v,0) in the debugger anytime to see the current vector) to see why the "scaling" seems to change.
>
>    Barry

Found!

It came from the initial rescaling b->b/norm(b) in the fgmres for the 
entire matrix.

We are now set and we can concentrate on our routine.

Thanks a lot Matthew and Barry!

Matteo & Elena

-- 
Prof. Matteo Semplice
Universit? degli Studi dell?Insubria
Dipartimento di Scienza e Alta Tecnologia ? DiSAT
Professore Associato
Via Valleggio, 11 ? 22100 Como (CO) ? Italia
tel.: +39 031 2386316


From knepley at gmail.com  Thu Feb 11 15:12:22 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 11 Feb 2021 16:12:22 -0500
Subject: [petsc-users] shell preconditioner for Schur complement
In-Reply-To: <4af4cca9-e3f3-d0be-6375-2f2f0a3aef4b@uninsubria.it>
References: <ee3bcf4f-79c0-62b9-23e2-06857ca2d64e@uninsubria.it>
	<227EB1C9-E4FC-49D6-8376-52B6CC850F2B@petsc.dev>
	<4af4cca9-e3f3-d0be-6375-2f2f0a3aef4b@uninsubria.it>
Message-ID: <CAMYG4GnokK1mye7wLnvQVyWGO9wtm7JJrGUoNFHKZqeswnw4QA@mail.gmail.com>

On Thu, Feb 11, 2021 at 10:53 AM Matteo Semplice <
matteo.semplice at uninsubria.it> wrote:

>
> Il 11/02/21 06:41, Barry Smith ha scritto:
> >    Best to just look at the code to see exactly what it is doing:
> src/ksp/pc/impls/fieldsplit/fieldsplit.c  function
> PCApply_FieldSplit_Schur()
> >
> >    There is no particular "scaling" applied to the vectors. It might be
> easiest to track through the computational process with the debugger (you
> can call VecView(v,0) in the debugger anytime to see the current vector) to
> see why the "scaling" seems to change.
> >
> >    Barry
>
> Found!
>
> It came from the initial rescaling b->b/norm(b) in the fgmres for the
> entire matrix.
>

I see. I did not document it because it is internal to the solver, but the
subsolvers can see it :)
Glad you got it worked out.

 Thanks,

    Matt


> We are now set and we can concentrate on our routine.
>
> Thanks a lot Matthew and Barry!
>
> Matteo & Elena
>
> --
> Prof. Matteo Semplice
> Universit? degli Studi dell?Insubria
> Dipartimento di Scienza e Alta Tecnologia ? DiSAT
> Professore Associato
> Via Valleggio, 11 ? 22100 Como (CO) ? Italia
> tel.: +39 031 2386316
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210211/3bbb054a/attachment.html>

From e0425375 at gmail.com  Fri Feb 12 02:32:26 2021
From: e0425375 at gmail.com (Florian Bruckner)
Date: Fri, 12 Feb 2021 09:32:26 +0100
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
Message-ID: <CAEc0zDoJarRyBZKOCP_FjsqgVTEH44cyGe2RQJF8URfJPrgK6Q@mail.gmail.com>

Dear Jose, Dear Matt,

I needed some time to think about your answers.
If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0,
which is normally handled by the ST object, which creates a KSP solver and
a corresponding preconditioner.
What I would need is an interface to provide not only the system Matrix A0
(which is an operator), but also a preconditioning matrix (sparse
approximation of the operator).
Unfortunately this interface is not available, right?

Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses
a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and
solve a standard eigenvalue problem with this modified operator. Did I
understand this correctly?

I have two further points, which I did not mention yet: the matrix B0 is
Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using
Firedrake to set up the PETSc system matrices A0, i*B0 (which is real).
Then I convert them into ScipyLinearOperators and use
scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues.
Minv=A0^-1 is also solving within scipy using a preconditioned gmres.
Advantage of this setup is that the imaginary B0 can be handled efficiently
and also the post-processing of the eigenvectors (which requires complex
arithmetics) is simplified.

Nevertheless I think that the mixing of PETSc and Scipy looks too
complicated and is not very flexible.
If I would use Matt's approach, could I then simply switch between multiple
standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use
of matshell?
Is there a solution for the imaginary B0, or do I have to use the
non-hermitian methods? Is this a large performance drawback?

thanks again,
and best wishes
Florian

On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman <jroman at dsic.upv.es> wrote:

> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues
> omega closest to zero. If the matrices were explicitly available, you would
> do shift-and-invert with target=0, that is
>
>   (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
>
>   A0^{-1}*B0*v=theta*v
>
> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
>
> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of
> EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need?
> EPS_SMALLEST_REAL will give slow convergence.
>
> Florian: I would not recommend setting the KSP matrices directly, it may
> produce strange side-effects. We should have an interface function to pass
> this matrix. Currently there is STPrecondSetMatForPC() but it has two
> problems: (1) it is intended for STPRECOND, so cannot be used with
> Krylov-Schur, and (2) it is not currently available in the python interface.
>
> The approach used by Matt is a workaround that does not use ST, so you can
> handle linear solves with a KSP of your own.
>
> As an alternative, since your problem is symmetric, you could try LOBPCG,
> assuming that the leftmost eigenvalues are those that you want (e.g. if all
> eigenvalues are non-negative). In that case you could use
> STPrecondSetMatForPC(), but the remaining issue is calling it from python.
>
> If you are using the git repo, I could add the relevant code.
>
> Jose
>
>
>
> > El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com>
> escribi?:
> >
> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com>
> wrote:
> > Dear PETSc / SLEPc Users,
> >
> > my question is very similar to the one posted here:
> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html
> >
> > The eigensystem I would like to solve looks like:
> > B0 v = 1/omega A0 v
> > B0 and A0 are both hermitian, A0 is positive definite, but only given as
> a linear operator (matshell). I am looking for the largest eigenvalues
> (=smallest omega).
> >
> > I also have a sparse approximation P0 of the A0 operator, which i would
> like to use as precondtioner, using something like this:
> >
> >         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
> >         st = es.getST()
> >         ksp = st.getKSP()
> >         ksp.setOperators(self.A0, self.P0)
> >
> > Unfortunately PETSc still complains that it cannot create a
> preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but
> A0.type == 'python').
> > By the way, should P0 be an approximation of A0 or does it have to
> include B0?
> >
> > Right now I am using the krylov-schur method. Are there any alternatives
> if A0 is only given as an operator?
> >
> > Jose can correct me if I say something wrong.
> >
> > When I did this, I made a shell operator for the action of A0^{-1} B0
> which has a KSPSolve() in it, so you can use your P0 preconditioning
> matrix, and
> > then handed that to EPS. You can see me do it here:
> >
> >
> https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123
> >
> > I had a hard time getting the embedded solver to work the way I wanted,
> but maybe that is the better way.
> >
> >   Thanks,
> >
> >      Matt
> >
> > thanks for any advice
> > best wishes
> > Florian
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > -- Norbert Wiener
> >
> > https://www.cse.buffalo.edu/~knepley/
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210212/3041f3cf/attachment.html>

From jroman at dsic.upv.es  Fri Feb 12 03:12:18 2021
From: jroman at dsic.upv.es (Jose E. Roman)
Date: Fri, 12 Feb 2021 10:12:18 +0100
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <CAEc0zDoJarRyBZKOCP_FjsqgVTEH44cyGe2RQJF8URfJPrgK6Q@mail.gmail.com>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
	<CAEc0zDoJarRyBZKOCP_FjsqgVTEH44cyGe2RQJF8URfJPrgK6Q@mail.gmail.com>
Message-ID: <848D4FF0-7ABD-4D0A-93FF-71318A3C966D@dsic.upv.es>


> El 12 feb 2021, a las 9:32, Florian Bruckner <e0425375 at gmail.com> escribi?:
> 
> Dear Jose, Dear Matt, 
> 
> I needed some time to think about your answers. 
> If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner.
> What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator).
> Unfortunately this interface is not available, right?

Yes, when using shift-and-invert with target=0 the solver internally uses A0^{-1}*B0. It also uses the B0-inner product to preserve symmetry.

> 
> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly?

Yes, the difference here is that you have to solve the problem as non-symmetric. This is not going to be slower, maybe just a bit less accurate. And the computed eigenvectors will not be B0-orthogonal.

> 
> I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. 
> 
> Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. 
> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell?
> Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback?

LOBPCG can be used only in Hermitian problems, see Table 2.4 in the users manual.

The problem A0*v=omega*(i*B0)*v can be solved as A0*v=(i*omega)*B0*v, no problem with that. Just solve it as non-Hermitian. As I said above, the computational cost should not be too different. But solving as non-Hermitian restricts the available solvers.

Jose

> 
> thanks again,
> and best wishes
> Florian
> 
> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman <jroman at dsic.upv.es> wrote:
> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is
> 
>   (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
> 
>   A0^{-1}*B0*v=theta*v
> 
> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
> 
> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence.
> 
> Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface.
> 
> The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own.
> 
> As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python.
> 
> If you are using the git repo, I could add the relevant code.
> 
> Jose
> 
> 
> 
> > El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com> escribi?:
> > 
> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com> wrote:
> > Dear PETSc / SLEPc Users,
> > 
> > my question is very similar to the one posted here: 
> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html
> > 
> > The eigensystem I would like to solve looks like:
> > B0 v = 1/omega A0 v
> > B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). 
> > 
> > I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this:
> > 
> >         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
> >         st = es.getST()
> >         ksp = st.getKSP()
> >         ksp.setOperators(self.A0, self.P0)
> > 
> > Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). 
> > By the way, should P0 be an approximation of A0 or does it have to include B0?
> > 
> > Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator?
> > 
> > Jose can correct me if I say something wrong.
> > 
> > When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and
> > then handed that to EPS. You can see me do it here:
> > 
> >   https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123
> > 
> > I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way.
> > 
> >   Thanks,
> > 
> >      Matt
> >  
> > thanks for any advice
> > best wishes
> > Florian
> > 
> > 
> > -- 
> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > -- Norbert Wiener
> > 
> > https://www.cse.buffalo.edu/~knepley/
> 


From bsmith at petsc.dev  Fri Feb 12 21:19:50 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Fri, 12 Feb 2021 21:19:50 -0600
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <CAEc0zDoJarRyBZKOCP_FjsqgVTEH44cyGe2RQJF8URfJPrgK6Q@mail.gmail.com>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
	<CAEc0zDoJarRyBZKOCP_FjsqgVTEH44cyGe2RQJF8URfJPrgK6Q@mail.gmail.com>
Message-ID: <7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev>


> On Feb 12, 2021, at 2:32 AM, Florian Bruckner <e0425375 at gmail.com> wrote:
> 
> Dear Jose, Dear Matt, 
> 
> I needed some time to think about your answers. 
> If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner.
> What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator).
> Unfortunately this interface is not available, right?

   If SLEPc does not provide this directly it is still intended to be trivial to provide the "preconditioner matrix" (that is matrix from which the preconditioner is built). Just get the KSP from the ST object and use KSPSetOperators() to provide the "preconditioner matrix" . 

  Barry

> 
> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly?
> 
> I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. 
> 
> Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. 
> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell?
> Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback?
> 
> thanks again,
> and best wishes
> Florian
> 
> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman <jroman at dsic.upv.es <mailto:jroman at dsic.upv.es>> wrote:
> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is
> 
>   (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
> 
>   A0^{-1}*B0*v=theta*v
> 
> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
> 
> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence.
> 
> Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface.
> 
> The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own.
> 
> As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python.
> 
> If you are using the git repo, I could add the relevant code.
> 
> Jose
> 
> 
> 
> > El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com <mailto:knepley at gmail.com>> escribi?:
> > 
> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com <mailto:e0425375 at gmail.com>> wrote:
> > Dear PETSc / SLEPc Users,
> > 
> > my question is very similar to the one posted here: 
> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html <https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html>
> > 
> > The eigensystem I would like to solve looks like:
> > B0 v = 1/omega A0 v
> > B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). 
> > 
> > I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this:
> > 
> >         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
> >         st = es.getST()
> >         ksp = st.getKSP()
> >         ksp.setOperators(self.A0, self.P0)
> > 
> > Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). 
> > By the way, should P0 be an approximation of A0 or does it have to include B0?
> > 
> > Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator?
> > 
> > Jose can correct me if I say something wrong.
> > 
> > When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and
> > then handed that to EPS. You can see me do it here:
> > 
> >   https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 <https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123>
> > 
> > I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way.
> > 
> >   Thanks,
> > 
> >      Matt
> >  
> > thanks for any advice
> > best wishes
> > Florian
> > 
> > 
> > -- 
> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > -- Norbert Wiener
> > 
> > https://www.cse.buffalo.edu/~knepley/ <https://www.cse.buffalo.edu/~knepley/>
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210212/ab597d70/attachment.html>

From e0425375 at gmail.com  Sat Feb 13 00:25:58 2021
From: e0425375 at gmail.com (Florian Bruckner)
Date: Sat, 13 Feb 2021 07:25:58 +0100
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
	<CAEc0zDoJarRyBZKOCP_FjsqgVTEH44cyGe2RQJF8URfJPrgK6Q@mail.gmail.com>
	<7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev>
Message-ID: <CAEc0zDrxX2LWbH8kgSV_mhdSaUyf6CVw0-LQhO=_YNUY62qFRA@mail.gmail.com>

Dear Jose, Dear Barry,
thanks again for your reply. One final question about the B0 orthogonality.
Do you mean that eigenvectors are not B0 orthogonal, but they are i*B0
orthogonal? or is there an issue with Matt's approach?
For my problem I can show that eigenvalues fulfill an orthogonality
relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This
should be independent of the solving method, right?

Regarding Barry's advice this is what I first tried:
es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
st = es.getST()
ksp = st.getKSP()
ksp.setOperators(self.A0, self.P0)

But it seems that the provided P0 is not used. Furthermore the interface is
maybe a bit confusing if ST performs some transformation. In this case P0
needs to approximate A0^{-1}*B0 and not A0, right?
Nevertheless I think it would be the best solution if one could provide P0
(approx A0) and SLEPc derives the preconditioner from this. Would this be
hard to implement?

best wishes
Florian


On Sat, Feb 13, 2021 at 4:19 AM Barry Smith <bsmith at petsc.dev> wrote:

>
>
> On Feb 12, 2021, at 2:32 AM, Florian Bruckner <e0425375 at gmail.com> wrote:
>
> Dear Jose, Dear Matt,
>
> I needed some time to think about your answers.
> If I understand correctly, the eigenmode solver internally uses
> A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP
> solver and a corresponding preconditioner.
> What I would need is an interface to provide not only the system Matrix A0
> (which is an operator), but also a preconditioning matrix (sparse
> approximation of the operator).
> Unfortunately this interface is not available, right?
>
>
>    If SLEPc does not provide this directly it is still intended to be
> trivial to provide the "preconditioner matrix" (that is matrix from which
> the preconditioner is built). Just get the KSP from the ST object and use
> KSPSetOperators() to provide the "preconditioner matrix" .
>
>   Barry
>
>
> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses
> a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and
> solve a standard eigenvalue problem with this modified operator. Did I
> understand this correctly?
>
> I have two further points, which I did not mention yet: the matrix B0 is
> Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using
> Firedrake to set up the PETSc system matrices A0, i*B0 (which is real).
> Then I convert them into ScipyLinearOperators and use
> scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues.
> Minv=A0^-1 is also solving within scipy using a preconditioned gmres.
> Advantage of this setup is that the imaginary B0 can be handled efficiently
> and also the post-processing of the eigenvectors (which requires complex
> arithmetics) is simplified.
>
> Nevertheless I think that the mixing of PETSc and Scipy looks too
> complicated and is not very flexible.
> If I would use Matt's approach, could I then simply switch between
> multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to
> the use of matshell?
> Is there a solution for the imaginary B0, or do I have to use the
> non-hermitian methods? Is this a large performance drawback?
>
> thanks again,
> and best wishes
> Florian
>
> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman <jroman at dsic.upv.es> wrote:
>
>> The problem can be written as A0*v=omega*B0*v and you want the
>> eigenvalues omega closest to zero. If the matrices were explicitly
>> available, you would do shift-and-invert with target=0, that is
>>
>>   (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
>>
>>   A0^{-1}*B0*v=theta*v
>>
>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
>>
>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of
>> EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need?
>> EPS_SMALLEST_REAL will give slow convergence.
>>
>> Florian: I would not recommend setting the KSP matrices directly, it may
>> produce strange side-effects. We should have an interface function to pass
>> this matrix. Currently there is STPrecondSetMatForPC() but it has two
>> problems: (1) it is intended for STPRECOND, so cannot be used with
>> Krylov-Schur, and (2) it is not currently available in the python interface.
>>
>> The approach used by Matt is a workaround that does not use ST, so you
>> can handle linear solves with a KSP of your own.
>>
>> As an alternative, since your problem is symmetric, you could try LOBPCG,
>> assuming that the leftmost eigenvalues are those that you want (e.g. if all
>> eigenvalues are non-negative). In that case you could use
>> STPrecondSetMatForPC(), but the remaining issue is calling it from python.
>>
>> If you are using the git repo, I could add the relevant code.
>>
>> Jose
>>
>>
>>
>> > El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com>
>> escribi?:
>> >
>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com>
>> wrote:
>> > Dear PETSc / SLEPc Users,
>> >
>> > my question is very similar to the one posted here:
>> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html
>> >
>> > The eigensystem I would like to solve looks like:
>> > B0 v = 1/omega A0 v
>> > B0 and A0 are both hermitian, A0 is positive definite, but only given
>> as a linear operator (matshell). I am looking for the largest eigenvalues
>> (=smallest omega).
>> >
>> > I also have a sparse approximation P0 of the A0 operator, which i would
>> like to use as precondtioner, using something like this:
>> >
>> >         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
>> >         st = es.getST()
>> >         ksp = st.getKSP()
>> >         ksp.setOperators(self.A0, self.P0)
>> >
>> > Unfortunately PETSc still complains that it cannot create a
>> preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but
>> A0.type == 'python').
>> > By the way, should P0 be an approximation of A0 or does it have to
>> include B0?
>> >
>> > Right now I am using the krylov-schur method. Are there any
>> alternatives if A0 is only given as an operator?
>> >
>> > Jose can correct me if I say something wrong.
>> >
>> > When I did this, I made a shell operator for the action of A0^{-1} B0
>> which has a KSPSolve() in it, so you can use your P0 preconditioning
>> matrix, and
>> > then handed that to EPS. You can see me do it here:
>> >
>> >
>> https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123
>> >
>> > I had a hard time getting the embedded solver to work the way I wanted,
>> but maybe that is the better way.
>> >
>> >   Thanks,
>> >
>> >      Matt
>> >
>> > thanks for any advice
>> > best wishes
>> > Florian
>> >
>> >
>> > --
>> > What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> > -- Norbert Wiener
>> >
>> > https://www.cse.buffalo.edu/~knepley/
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210213/97efad6e/attachment-0001.html>

From pierre at joliv.et  Sat Feb 13 02:47:40 2021
From: pierre at joliv.et (Pierre Jolivet)
Date: Sat, 13 Feb 2021 09:47:40 +0100
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <CAEc0zDrxX2LWbH8kgSV_mhdSaUyf6CVw0-LQhO=_YNUY62qFRA@mail.gmail.com>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
	<CAEc0zDoJarRyBZKOCP_FjsqgVTEH44cyGe2RQJF8URfJPrgK6Q@mail.gmail.com>
	<7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev>
	<CAEc0zDrxX2LWbH8kgSV_mhdSaUyf6CVw0-LQhO=_YNUY62qFRA@mail.gmail.com>
Message-ID: <B783A843-DC14-4959-8D49-59C0BDDD954E@joliv.et>


> On 13 Feb 2021, at 7:25 AM, Florian Bruckner <e0425375 at gmail.com> wrote:
> 
> Dear Jose, Dear Barry,
> thanks again for your reply. One final question about the B0 orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but they are i*B0 orthogonal? or is there an issue with Matt's approach?
> For my problem I can show that eigenvalues fulfill an orthogonality relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This should be independent of the solving method, right?
> 
> Regarding Barry's advice this is what I first tried:
> es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
> st = es.getST()
> ksp = st.getKSP()
> ksp.setOperators(self.A0, self.P0)
> 
> But it seems that the provided P0 is not used. Furthermore the interface is maybe a bit confusing if ST performs some transformation. In this case P0 needs to approximate A0^{-1}*B0 and not A0, right?

No, you need to approximate (A0-sigma B0)^-1. If you have a null shift, which looks like it is the case, you end up with A0^-1.

> Nevertheless I think it would be the best solution if one could provide P0 (approx A0) and SLEPc derives the preconditioner from this. Would this be hard to implement?

This is what Barry?s suggestion is implementing. Don?t know why it doesn?t work with your Python operator though.

Thanks,
Pierre

> best wishes
> Florian
> 
> 
> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
> 
> 
>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner <e0425375 at gmail.com <mailto:e0425375 at gmail.com>> wrote:
>> 
>> Dear Jose, Dear Matt, 
>> 
>> I needed some time to think about your answers. 
>> If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner.
>> What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator).
>> Unfortunately this interface is not available, right?
> 
>    If SLEPc does not provide this directly it is still intended to be trivial to provide the "preconditioner matrix" (that is matrix from which the preconditioner is built). Just get the KSP from the ST object and use KSPSetOperators() to provide the "preconditioner matrix" . 
> 
>   Barry
> 
>> 
>> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly?
>> 
>> I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. 
>> 
>> Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. 
>> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell?
>> Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback?
>> 
>> thanks again,
>> and best wishes
>> Florian
>> 
>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman <jroman at dsic.upv.es <mailto:jroman at dsic.upv.es>> wrote:
>> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is
>> 
>>   (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
>> 
>>   A0^{-1}*B0*v=theta*v
>> 
>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
>> 
>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence.
>> 
>> Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface.
>> 
>> The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own.
>> 
>> As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python.
>> 
>> If you are using the git repo, I could add the relevant code.
>> 
>> Jose
>> 
>> 
>> 
>> > El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com <mailto:knepley at gmail.com>> escribi?:
>> > 
>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com <mailto:e0425375 at gmail.com>> wrote:
>> > Dear PETSc / SLEPc Users,
>> > 
>> > my question is very similar to the one posted here: 
>> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html <https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html>
>> > 
>> > The eigensystem I would like to solve looks like:
>> > B0 v = 1/omega A0 v
>> > B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). 
>> > 
>> > I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this:
>> > 
>> >         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
>> >         st = es.getST()
>> >         ksp = st.getKSP()
>> >         ksp.setOperators(self.A0, self.P0)
>> > 
>> > Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). 
>> > By the way, should P0 be an approximation of A0 or does it have to include B0?
>> > 
>> > Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator?
>> > 
>> > Jose can correct me if I say something wrong.
>> > 
>> > When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and
>> > then handed that to EPS. You can see me do it here:
>> > 
>> >   https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 <https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123>
>> > 
>> > I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way.
>> > 
>> >   Thanks,
>> > 
>> >      Matt
>> >  
>> > thanks for any advice
>> > best wishes
>> > Florian
>> > 
>> > 
>> > -- 
>> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> > -- Norbert Wiener
>> > 
>> > https://www.cse.buffalo.edu/~knepley/ <https://www.cse.buffalo.edu/~knepley/>
>> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210213/93cbce6c/attachment.html>

From bsmith at petsc.dev  Sat Feb 13 12:15:14 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Sat, 13 Feb 2021 12:15:14 -0600
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <B783A843-DC14-4959-8D49-59C0BDDD954E@joliv.et>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
	<CAEc0zDoJarRyBZKOCP_FjsqgVTEH44cyGe2RQJF8URfJPrgK6Q@mail.gmail.com>
	<7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev>
	<CAEc0zDrxX2LWbH8kgSV_mhdSaUyf6CVw0-LQhO=_YNUY62qFRA@mail.gmail.com>
	<B783A843-DC14-4959-8D49-59C0BDDD954E@joliv.et>
Message-ID: <119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev>


> On Feb 13, 2021, at 2:47 AM, Pierre Jolivet <pierre at joliv.et> wrote:
> 
> 
> 
>> On 13 Feb 2021, at 7:25 AM, Florian Bruckner <e0425375 at gmail.com <mailto:e0425375 at gmail.com>> wrote:
>> 
>> Dear Jose, Dear Barry,
>> thanks again for your reply. One final question about the B0 orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but they are i*B0 orthogonal? or is there an issue with Matt's approach?
>> For my problem I can show that eigenvalues fulfill an orthogonality relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This should be independent of the solving method, right?
>> 
>> Regarding Barry's advice this is what I first tried:
>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
>> st = es.getST()
>> ksp = st.getKSP()
>> ksp.setOperators(self.A0, self.P0)
>> 
>> But it seems that the provided P0 is not used. Furthermore the interface is maybe a bit confusing if ST performs some transformation. In this case P0 needs to approximate A0^{-1}*B0 and not A0, right?
> 
> No, you need to approximate (A0-sigma B0)^-1. If you have a null shift, which looks like it is the case, you end up with A0^-1.

  Just trying to provide more clarity with the terms.

If ST transforms the operator in the KSP to (A0-sigma B0) and you are providing the "sparse matrix from which the preconditioner is to be built" then you need to provide something that approximates (A0-sigma B0). Since the PC will use your matrix to construct a preconditioner that approximates the inverse of  (A0-sigma B0), you don't need to directly provide something that approximates (A0-sigma B0)^-1

 Yes, I would think SLEPc could provide an interface where it manages "the matrix from which to construct the preconditioner" and transforms that matrix just like the true matrix. To do it by hand you simply need to know what A0 and B0 are and which sigma ST has selected and then you can construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and modB0 are your "sparser approximations".

  Barry


> 
>> Nevertheless I think it would be the best solution if one could provide P0 (approx A0) and SLEPc derives the preconditioner from this. Would this be hard to implement?
> 
> This is what Barry?s suggestion is implementing. Don?t know why it doesn?t work with your Python operator though.
> 
> Thanks,
> Pierre
> 
>> best wishes
>> Florian
>> 
>> 
>> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
>> 
>> 
>>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner <e0425375 at gmail.com <mailto:e0425375 at gmail.com>> wrote:
>>> 
>>> Dear Jose, Dear Matt, 
>>> 
>>> I needed some time to think about your answers. 
>>> If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner.
>>> What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator).
>>> Unfortunately this interface is not available, right?
>> 
>>    If SLEPc does not provide this directly it is still intended to be trivial to provide the "preconditioner matrix" (that is matrix from which the preconditioner is built). Just get the KSP from the ST object and use KSPSetOperators() to provide the "preconditioner matrix" . 
>> 
>>   Barry
>> 
>>> 
>>> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly?
>>> 
>>> I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. 
>>> 
>>> Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. 
>>> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell?
>>> Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback?
>>> 
>>> thanks again,
>>> and best wishes
>>> Florian
>>> 
>>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman <jroman at dsic.upv.es <mailto:jroman at dsic.upv.es>> wrote:
>>> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is
>>> 
>>>   (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
>>> 
>>>   A0^{-1}*B0*v=theta*v
>>> 
>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
>>> 
>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence.
>>> 
>>> Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface.
>>> 
>>> The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own.
>>> 
>>> As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python.
>>> 
>>> If you are using the git repo, I could add the relevant code.
>>> 
>>> Jose
>>> 
>>> 
>>> 
>>> > El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com <mailto:knepley at gmail.com>> escribi?:
>>> > 
>>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com <mailto:e0425375 at gmail.com>> wrote:
>>> > Dear PETSc / SLEPc Users,
>>> > 
>>> > my question is very similar to the one posted here: 
>>> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html <https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html>
>>> > 
>>> > The eigensystem I would like to solve looks like:
>>> > B0 v = 1/omega A0 v
>>> > B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). 
>>> > 
>>> > I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this:
>>> > 
>>> >         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
>>> >         st = es.getST()
>>> >         ksp = st.getKSP()
>>> >         ksp.setOperators(self.A0, self.P0)
>>> > 
>>> > Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). 
>>> > By the way, should P0 be an approximation of A0 or does it have to include B0?
>>> > 
>>> > Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator?
>>> > 
>>> > Jose can correct me if I say something wrong.
>>> > 
>>> > When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and
>>> > then handed that to EPS. You can see me do it here:
>>> > 
>>> >   https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 <https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123>
>>> > 
>>> > I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way.
>>> > 
>>> >   Thanks,
>>> > 
>>> >      Matt
>>> >  
>>> > thanks for any advice
>>> > best wishes
>>> > Florian
>>> > 
>>> > 
>>> > -- 
>>> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> > -- Norbert Wiener
>>> > 
>>> > https://www.cse.buffalo.edu/~knepley/ <https://www.cse.buffalo.edu/~knepley/>
>>> 
>> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210213/febc3938/attachment.html>

From e0425375 at gmail.com  Sat Feb 13 18:43:47 2021
From: e0425375 at gmail.com (Florian Bruckner)
Date: Sun, 14 Feb 2021 01:43:47 +0100
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
	<CAEc0zDoJarRyBZKOCP_FjsqgVTEH44cyGe2RQJF8URfJPrgK6Q@mail.gmail.com>
	<7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev>
	<CAEc0zDrxX2LWbH8kgSV_mhdSaUyf6CVw0-LQhO=_YNUY62qFRA@mail.gmail.com>
	<B783A843-DC14-4959-8D49-59C0BDDD954E@joliv.et>
	<119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev>
Message-ID: <CAEc0zDpCR212_cx93aF0iahFEAQEYo3C19F=FSA928z3q7hpLw@mail.gmail.com>

Dear Barry,
thank you for your clarification. What I wanted to say is that even if I
could reset the KSP operators directly I would require to know which
transformation ST applies in order to provide the preconditioning matrix
for the correct operator.
The more general solution would be that SLEPc provides the interface to
pass the preconditioning matrix for A0 and ST applies the same
transformations as for the operator.

If you write "SLEPc could provide an interface", do you mean someone should
implement it, or should it already be possible and I am not using it
correctly?
I wrote a small standalone example based on ex9.py from slepc4py, where i
tried to use an operator.

best wishes
Florian

On Sat, Feb 13, 2021 at 7:15 PM Barry Smith <bsmith at petsc.dev> wrote:

>
>
> On Feb 13, 2021, at 2:47 AM, Pierre Jolivet <pierre at joliv.et> wrote:
>
>
>
> On 13 Feb 2021, at 7:25 AM, Florian Bruckner <e0425375 at gmail.com> wrote:
>
> Dear Jose, Dear Barry,
> thanks again for your reply. One final question about the B0
> orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but
> they are i*B0 orthogonal? or is there an issue with Matt's approach?
> For my problem I can show that eigenvalues fulfill an orthogonality
> relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This
> should be independent of the solving method, right?
>
> Regarding Barry's advice this is what I first tried:
> es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
> st = es.getST()
> ksp = st.getKSP()
> ksp.setOperators(self.A0, self.P0)
>
> But it seems that the provided P0 is not used. Furthermore the interface
> is maybe a bit confusing if ST performs some transformation. In this case
> P0 needs to approximate A0^{-1}*B0 and not A0, right?
>
>
> No, you need to approximate (A0-sigma B0)^-1. If you have a null shift,
> which looks like it is the case, you end up with A0^-1.
>
>
>   Just trying to provide more clarity with the terms.
>
> If ST transforms the operator in the KSP to (A0-sigma B0) and you are
> providing the "sparse matrix from which the preconditioner is to be built"
> then you need to provide something that approximates (A0-sigma B0). Since
> the PC will use your matrix to construct a preconditioner that approximates
> the inverse of  (A0-sigma B0), you don't need to directly provide something
> that approximates (A0-sigma B0)^-1
>
>  Yes, I would think SLEPc could provide an interface where it manages "the
> matrix from which to construct the preconditioner" and transforms that
> matrix just like the true matrix. To do it by hand you simply need to know
> what A0 and B0 are and which sigma ST has selected and then you can
> construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and
> modB0 are your "sparser approximations".
>
>   Barry
>
>
>
> Nevertheless I think it would be the best solution if one could provide P0
> (approx A0) and SLEPc derives the preconditioner from this. Would this be
> hard to implement?
>
>
> This is what Barry?s suggestion is implementing. Don?t know why it doesn?t
> work with your Python operator though.
>
> Thanks,
> Pierre
>
> best wishes
> Florian
>
>
> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>>
>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner <e0425375 at gmail.com> wrote:
>>
>> Dear Jose, Dear Matt,
>>
>> I needed some time to think about your answers.
>> If I understand correctly, the eigenmode solver internally uses
>> A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP
>> solver and a corresponding preconditioner.
>> What I would need is an interface to provide not only the system Matrix
>> A0 (which is an operator), but also a preconditioning matrix (sparse
>> approximation of the operator).
>> Unfortunately this interface is not available, right?
>>
>>
>>    If SLEPc does not provide this directly it is still intended to be
>> trivial to provide the "preconditioner matrix" (that is matrix from which
>> the preconditioner is built). Just get the KSP from the ST object and use
>> KSPSetOperators() to provide the "preconditioner matrix" .
>>
>>   Barry
>>
>>
>> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator
>> uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0
>> and solve a standard eigenvalue problem with this modified operator. Did I
>> understand this correctly?
>>
>> I have two further points, which I did not mention yet: the matrix B0 is
>> Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using
>> Firedrake to set up the PETSc system matrices A0, i*B0 (which is real).
>> Then I convert them into ScipyLinearOperators and use
>> scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues.
>> Minv=A0^-1 is also solving within scipy using a preconditioned gmres.
>> Advantage of this setup is that the imaginary B0 can be handled efficiently
>> and also the post-processing of the eigenvectors (which requires complex
>> arithmetics) is simplified.
>>
>> Nevertheless I think that the mixing of PETSc and Scipy looks too
>> complicated and is not very flexible.
>> If I would use Matt's approach, could I then simply switch between
>> multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to
>> the use of matshell?
>> Is there a solution for the imaginary B0, or do I have to use the
>> non-hermitian methods? Is this a large performance drawback?
>>
>> thanks again,
>> and best wishes
>> Florian
>>
>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman <jroman at dsic.upv.es> wrote:
>>
>>> The problem can be written as A0*v=omega*B0*v and you want the
>>> eigenvalues omega closest to zero. If the matrices were explicitly
>>> available, you would do shift-and-invert with target=0, that is
>>>
>>>   (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
>>>
>>>   A0^{-1}*B0*v=theta*v
>>>
>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
>>>
>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of
>>> EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need?
>>> EPS_SMALLEST_REAL will give slow convergence.
>>>
>>> Florian: I would not recommend setting the KSP matrices directly, it may
>>> produce strange side-effects. We should have an interface function to pass
>>> this matrix. Currently there is STPrecondSetMatForPC() but it has two
>>> problems: (1) it is intended for STPRECOND, so cannot be used with
>>> Krylov-Schur, and (2) it is not currently available in the python interface.
>>>
>>> The approach used by Matt is a workaround that does not use ST, so you
>>> can handle linear solves with a KSP of your own.
>>>
>>> As an alternative, since your problem is symmetric, you could try
>>> LOBPCG, assuming that the leftmost eigenvalues are those that you want
>>> (e.g. if all eigenvalues are non-negative). In that case you could use
>>> STPrecondSetMatForPC(), but the remaining issue is calling it from python.
>>>
>>> If you are using the git repo, I could add the relevant code.
>>>
>>> Jose
>>>
>>>
>>>
>>> > El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com>
>>> escribi?:
>>> >
>>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com>
>>> wrote:
>>> > Dear PETSc / SLEPc Users,
>>> >
>>> > my question is very similar to the one posted here:
>>> >
>>> https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html
>>> >
>>> > The eigensystem I would like to solve looks like:
>>> > B0 v = 1/omega A0 v
>>> > B0 and A0 are both hermitian, A0 is positive definite, but only given
>>> as a linear operator (matshell). I am looking for the largest eigenvalues
>>> (=smallest omega).
>>> >
>>> > I also have a sparse approximation P0 of the A0 operator, which i
>>> would like to use as precondtioner, using something like this:
>>> >
>>> >         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
>>> >         st = es.getST()
>>> >         ksp = st.getKSP()
>>> >         ksp.setOperators(self.A0, self.P0)
>>> >
>>> > Unfortunately PETSc still complains that it cannot create a
>>> preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but
>>> A0.type == 'python').
>>> > By the way, should P0 be an approximation of A0 or does it have to
>>> include B0?
>>> >
>>> > Right now I am using the krylov-schur method. Are there any
>>> alternatives if A0 is only given as an operator?
>>> >
>>> > Jose can correct me if I say something wrong.
>>> >
>>> > When I did this, I made a shell operator for the action of A0^{-1} B0
>>> which has a KSPSolve() in it, so you can use your P0 preconditioning
>>> matrix, and
>>> > then handed that to EPS. You can see me do it here:
>>> >
>>> >
>>> https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123
>>> >
>>> > I had a hard time getting the embedded solver to work the way I
>>> wanted, but maybe that is the better way.
>>> >
>>> >   Thanks,
>>> >
>>> >      Matt
>>> >
>>> > thanks for any advice
>>> > best wishes
>>> > Florian
>>> >
>>> >
>>> > --
>>> > What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> > -- Norbert Wiener
>>> >
>>> > https://www.cse.buffalo.edu/~knepley/
>>>
>>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210214/a9715c96/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.py
Type: text/x-python
Size: 2509 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210214/a9715c96/attachment-0001.py>

From bsmith at petsc.dev  Sun Feb 14 14:41:39 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Sun, 14 Feb 2021 14:41:39 -0600
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <CAEc0zDpCR212_cx93aF0iahFEAQEYo3C19F=FSA928z3q7hpLw@mail.gmail.com>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
	<CAEc0zDoJarRyBZKOCP_FjsqgVTEH44cyGe2RQJF8URfJPrgK6Q@mail.gmail.com>
	<7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev>
	<CAEc0zDrxX2LWbH8kgSV_mhdSaUyf6CVw0-LQhO=_YNUY62qFRA@mail.gmail.com>
	<B783A843-DC14-4959-8D49-59C0BDDD954E@joliv.et>
	<119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev>
	<CAEc0zDpCR212_cx93aF0iahFEAQEYo3C19F=FSA928z3q7hpLw@mail.gmail.com>
Message-ID: <6EF7889D-DC17-46FC-82A5-9409C41E231D@petsc.dev>


  Florian,

   I'm sorry I don't know the answers; I can only speculate. There is a STGetShift(). 

   All I was saying is theoretically there could/should be such support in SLEPc.

  Barry


> On Feb 13, 2021, at 6:43 PM, Florian Bruckner <e0425375 at gmail.com> wrote:
> 
> Dear Barry,
> thank you for your clarification. What I wanted to say is that even if I could reset the KSP operators directly I would require to know which transformation ST applies in order to provide the preconditioning matrix for the correct operator.
> The more general solution would be that SLEPc provides the interface to pass the preconditioning matrix for A0 and ST applies the same transformations as for the operator.
> 
> If you write "SLEPc could provide an interface", do you mean someone should implement it, or should it already be possible and I am not using it correctly?
> I wrote a small standalone example based on ex9.py from slepc4py, where i tried to use an operator.
> 
> best wishes
> Florian
> 
> On Sat, Feb 13, 2021 at 7:15 PM Barry Smith <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
> 
> 
>> On Feb 13, 2021, at 2:47 AM, Pierre Jolivet <pierre at joliv.et <mailto:pierre at joliv.et>> wrote:
>> 
>> 
>> 
>>> On 13 Feb 2021, at 7:25 AM, Florian Bruckner <e0425375 at gmail.com <mailto:e0425375 at gmail.com>> wrote:
>>> 
>>> Dear Jose, Dear Barry,
>>> thanks again for your reply. One final question about the B0 orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but they are i*B0 orthogonal? or is there an issue with Matt's approach?
>>> For my problem I can show that eigenvalues fulfill an orthogonality relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This should be independent of the solving method, right?
>>> 
>>> Regarding Barry's advice this is what I first tried:
>>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
>>> st = es.getST()
>>> ksp = st.getKSP()
>>> ksp.setOperators(self.A0, self.P0)
>>> 
>>> But it seems that the provided P0 is not used. Furthermore the interface is maybe a bit confusing if ST performs some transformation. In this case P0 needs to approximate A0^{-1}*B0 and not A0, right?
>> 
>> No, you need to approximate (A0-sigma B0)^-1. If you have a null shift, which looks like it is the case, you end up with A0^-1.
> 
>   Just trying to provide more clarity with the terms.
> 
> If ST transforms the operator in the KSP to (A0-sigma B0) and you are providing the "sparse matrix from which the preconditioner is to be built" then you need to provide something that approximates (A0-sigma B0). Since the PC will use your matrix to construct a preconditioner that approximates the inverse of  (A0-sigma B0), you don't need to directly provide something that approximates (A0-sigma B0)^-1
> 
>  Yes, I would think SLEPc could provide an interface where it manages "the matrix from which to construct the preconditioner" and transforms that matrix just like the true matrix. To do it by hand you simply need to know what A0 and B0 are and which sigma ST has selected and then you can construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and modB0 are your "sparser approximations".
> 
>   Barry
> 
> 
>> 
>>> Nevertheless I think it would be the best solution if one could provide P0 (approx A0) and SLEPc derives the preconditioner from this. Would this be hard to implement?
>> 
>> This is what Barry?s suggestion is implementing. Don?t know why it doesn?t work with your Python operator though.
>> 
>> Thanks,
>> Pierre
>> 
>>> best wishes
>>> Florian
>>> 
>>> 
>>> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
>>> 
>>> 
>>>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner <e0425375 at gmail.com <mailto:e0425375 at gmail.com>> wrote:
>>>> 
>>>> Dear Jose, Dear Matt, 
>>>> 
>>>> I needed some time to think about your answers. 
>>>> If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner.
>>>> What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator).
>>>> Unfortunately this interface is not available, right?
>>> 
>>>    If SLEPc does not provide this directly it is still intended to be trivial to provide the "preconditioner matrix" (that is matrix from which the preconditioner is built). Just get the KSP from the ST object and use KSPSetOperators() to provide the "preconditioner matrix" . 
>>> 
>>>   Barry
>>> 
>>>> 
>>>> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly?
>>>> 
>>>> I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. 
>>>> 
>>>> Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. 
>>>> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell?
>>>> Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback?
>>>> 
>>>> thanks again,
>>>> and best wishes
>>>> Florian
>>>> 
>>>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman <jroman at dsic.upv.es <mailto:jroman at dsic.upv.es>> wrote:
>>>> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is
>>>> 
>>>>   (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
>>>> 
>>>>   A0^{-1}*B0*v=theta*v
>>>> 
>>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
>>>> 
>>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence.
>>>> 
>>>> Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface.
>>>> 
>>>> The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own.
>>>> 
>>>> As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python.
>>>> 
>>>> If you are using the git repo, I could add the relevant code.
>>>> 
>>>> Jose
>>>> 
>>>> 
>>>> 
>>>> > El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com <mailto:knepley at gmail.com>> escribi?:
>>>> > 
>>>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com <mailto:e0425375 at gmail.com>> wrote:
>>>> > Dear PETSc / SLEPc Users,
>>>> > 
>>>> > my question is very similar to the one posted here: 
>>>> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html <https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html>
>>>> > 
>>>> > The eigensystem I would like to solve looks like:
>>>> > B0 v = 1/omega A0 v
>>>> > B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). 
>>>> > 
>>>> > I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this:
>>>> > 
>>>> >         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
>>>> >         st = es.getST()
>>>> >         ksp = st.getKSP()
>>>> >         ksp.setOperators(self.A0, self.P0)
>>>> > 
>>>> > Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). 
>>>> > By the way, should P0 be an approximation of A0 or does it have to include B0?
>>>> > 
>>>> > Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator?
>>>> > 
>>>> > Jose can correct me if I say something wrong.
>>>> > 
>>>> > When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and
>>>> > then handed that to EPS. You can see me do it here:
>>>> > 
>>>> >   https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 <https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123>
>>>> > 
>>>> > I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way.
>>>> > 
>>>> >   Thanks,
>>>> > 
>>>> >      Matt
>>>> >  
>>>> > thanks for any advice
>>>> > best wishes
>>>> > Florian
>>>> > 
>>>> > 
>>>> > -- 
>>>> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>> > -- Norbert Wiener
>>>> > 
>>>> > https://www.cse.buffalo.edu/~knepley/ <https://www.cse.buffalo.edu/~knepley/>
>>>> 
>>> 
>> 
> 
> <test.py>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210214/10ae2143/attachment.html>

From jroman at dsic.upv.es  Mon Feb 15 06:27:00 2021
From: jroman at dsic.upv.es (Jose E. Roman)
Date: Mon, 15 Feb 2021 13:27:00 +0100
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <6EF7889D-DC17-46FC-82A5-9409C41E231D@petsc.dev>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
	<CAEc0zDoJarRyBZKOCP_FjsqgVTEH44cyGe2RQJF8URfJPrgK6Q@mail.gmail.com>
	<7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev>
	<CAEc0zDrxX2LWbH8kgSV_mhdSaUyf6CVw0-LQhO=_YNUY62qFRA@mail.gmail.com>
	<B783A843-DC14-4959-8D49-59C0BDDD954E@joliv.et>
	<119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev>
	<CAEc0zDpCR212_cx93aF0iahFEAQEYo3C19F=FSA928z3q7hpLw@mail.gmail.com>
	<6EF7889D-DC17-46FC-82A5-9409C41E231D@petsc.dev>
Message-ID: <46C744D7-4376-46B3-B5C4-211A4C8C2291@dsic.upv.es>

I will think about the viability of adding an interface function to pass the preconditioner matrix.

Regarding the question about the B-orthogonality of computed vectors, in the symmetric solver the B-orthogonality is enforced during the computation, so you have guarantee that the computed vectors satisfy it. But if solved as non-symetric, the computed vectors may depart from B-orthogonality, unless the tolerance is very small.

Jose


> El 14 feb 2021, a las 21:41, Barry Smith <bsmith at petsc.dev> escribi?:
> 
> 
>   Florian,
> 
>    I'm sorry I don't know the answers; I can only speculate. There is a STGetShift(). 
> 
>    All I was saying is theoretically there could/should be such support in SLEPc.
> 
>   Barry
> 
> 
>> On Feb 13, 2021, at 6:43 PM, Florian Bruckner <e0425375 at gmail.com> wrote:
>> 
>> Dear Barry,
>> thank you for your clarification. What I wanted to say is that even if I could reset the KSP operators directly I would require to know which transformation ST applies in order to provide the preconditioning matrix for the correct operator.
>> The more general solution would be that SLEPc provides the interface to pass the preconditioning matrix for A0 and ST applies the same transformations as for the operator.
>> 
>> If you write "SLEPc could provide an interface", do you mean someone should implement it, or should it already be possible and I am not using it correctly?
>> I wrote a small standalone example based on ex9.py from slepc4py, where i tried to use an operator.
>> 
>> best wishes
>> Florian
>> 
>> On Sat, Feb 13, 2021 at 7:15 PM Barry Smith <bsmith at petsc.dev> wrote:
>> 
>> 
>>> On Feb 13, 2021, at 2:47 AM, Pierre Jolivet <pierre at joliv.et> wrote:
>>> 
>>> 
>>> 
>>>> On 13 Feb 2021, at 7:25 AM, Florian Bruckner <e0425375 at gmail.com> wrote:
>>>> 
>>>> Dear Jose, Dear Barry,
>>>> thanks again for your reply. One final question about the B0 orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but they are i*B0 orthogonal? or is there an issue with Matt's approach?
>>>> For my problem I can show that eigenvalues fulfill an orthogonality relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This should be independent of the solving method, right?
>>>> 
>>>> Regarding Barry's advice this is what I first tried:
>>>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
>>>> st = es.getST()
>>>> ksp = st.getKSP()
>>>> ksp.setOperators(self.A0, self.P0)
>>>> 
>>>> But it seems that the provided P0 is not used. Furthermore the interface is maybe a bit confusing if ST performs some transformation. In this case P0 needs to approximate A0^{-1}*B0 and not A0, right?
>>> 
>>> No, you need to approximate (A0-sigma B0)^-1. If you have a null shift, which looks like it is the case, you end up with A0^-1.
>> 
>>   Just trying to provide more clarity with the terms.
>> 
>> If ST transforms the operator in the KSP to (A0-sigma B0) and you are providing the "sparse matrix from which the preconditioner is to be built" then you need to provide something that approximates (A0-sigma B0). Since the PC will use your matrix to construct a preconditioner that approximates the inverse of  (A0-sigma B0), you don't need to directly provide something that approximates (A0-sigma B0)^-1
>> 
>>  Yes, I would think SLEPc could provide an interface where it manages "the matrix from which to construct the preconditioner" and transforms that matrix just like the true matrix. To do it by hand you simply need to know what A0 and B0 are and which sigma ST has selected and then you can construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and modB0 are your "sparser approximations".
>> 
>>   Barry
>> 
>> 
>>> 
>>>> Nevertheless I think it would be the best solution if one could provide P0 (approx A0) and SLEPc derives the preconditioner from this. Would this be hard to implement?
>>> 
>>> This is what Barry?s suggestion is implementing. Don?t know why it doesn?t work with your Python operator though.
>>> 
>>> Thanks,
>>> Pierre
>>> 
>>>> best wishes
>>>> Florian
>>>> 
>>>> 
>>>> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith <bsmith at petsc.dev> wrote:
>>>> 
>>>> 
>>>>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner <e0425375 at gmail.com> wrote:
>>>>> 
>>>>> Dear Jose, Dear Matt, 
>>>>> 
>>>>> I needed some time to think about your answers. 
>>>>> If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner.
>>>>> What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator).
>>>>> Unfortunately this interface is not available, right?
>>>> 
>>>>    If SLEPc does not provide this directly it is still intended to be trivial to provide the "preconditioner matrix" (that is matrix from which the preconditioner is built). Just get the KSP from the ST object and use KSPSetOperators() to provide the "preconditioner matrix" . 
>>>> 
>>>>   Barry
>>>> 
>>>>> 
>>>>> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly?
>>>>> 
>>>>> I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. 
>>>>> 
>>>>> Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. 
>>>>> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell?
>>>>> Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback?
>>>>> 
>>>>> thanks again,
>>>>> and best wishes
>>>>> Florian
>>>>> 
>>>>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman <jroman at dsic.upv.es> wrote:
>>>>> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is
>>>>> 
>>>>>   (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
>>>>> 
>>>>>   A0^{-1}*B0*v=theta*v
>>>>> 
>>>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
>>>>> 
>>>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence.
>>>>> 
>>>>> Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface.
>>>>> 
>>>>> The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own.
>>>>> 
>>>>> As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python.
>>>>> 
>>>>> If you are using the git repo, I could add the relevant code.
>>>>> 
>>>>> Jose
>>>>> 
>>>>> 
>>>>> 
>>>>> > El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com> escribi?:
>>>>> > 
>>>>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com> wrote:
>>>>> > Dear PETSc / SLEPc Users,
>>>>> > 
>>>>> > my question is very similar to the one posted here: 
>>>>> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html
>>>>> > 
>>>>> > The eigensystem I would like to solve looks like:
>>>>> > B0 v = 1/omega A0 v
>>>>> > B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). 
>>>>> > 
>>>>> > I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this:
>>>>> > 
>>>>> >         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
>>>>> >         st = es.getST()
>>>>> >         ksp = st.getKSP()
>>>>> >         ksp.setOperators(self.A0, self.P0)
>>>>> > 
>>>>> > Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). 
>>>>> > By the way, should P0 be an approximation of A0 or does it have to include B0?
>>>>> > 
>>>>> > Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator?
>>>>> > 
>>>>> > Jose can correct me if I say something wrong.
>>>>> > 
>>>>> > When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and
>>>>> > then handed that to EPS. You can see me do it here:
>>>>> > 
>>>>> >   https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123
>>>>> > 
>>>>> > I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way.
>>>>> > 
>>>>> >   Thanks,
>>>>> > 
>>>>> >      Matt
>>>>> >  
>>>>> > thanks for any advice
>>>>> > best wishes
>>>>> > Florian
>>>>> > 
>>>>> > 
>>>>> > -- 
>>>>> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>> > -- Norbert Wiener
>>>>> > 
>>>>> > https://www.cse.buffalo.edu/~knepley/
>>>>> 
>>>> 
>>> 
>> 
>> <test.py>
> 


From knepley at gmail.com  Mon Feb 15 07:53:49 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 15 Feb 2021 08:53:49 -0500
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <46C744D7-4376-46B3-B5C4-211A4C8C2291@dsic.upv.es>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
	<CAEc0zDoJarRyBZKOCP_FjsqgVTEH44cyGe2RQJF8URfJPrgK6Q@mail.gmail.com>
	<7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev>
	<CAEc0zDrxX2LWbH8kgSV_mhdSaUyf6CVw0-LQhO=_YNUY62qFRA@mail.gmail.com>
	<B783A843-DC14-4959-8D49-59C0BDDD954E@joliv.et>
	<119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev>
	<CAEc0zDpCR212_cx93aF0iahFEAQEYo3C19F=FSA928z3q7hpLw@mail.gmail.com>
	<6EF7889D-DC17-46FC-82A5-9409C41E231D@petsc.dev>
	<46C744D7-4376-46B3-B5C4-211A4C8C2291@dsic.upv.es>
Message-ID: <CAMYG4G=nN79akOKMXYS-YXesTj1pZq-br6C_4GA29Vm0zFAwGQ@mail.gmail.com>

On Mon, Feb 15, 2021 at 7:27 AM Jose E. Roman <jroman at dsic.upv.es> wrote:

> I will think about the viability of adding an interface function to pass
> the preconditioner matrix.
>
> Regarding the question about the B-orthogonality of computed vectors, in
> the symmetric solver the B-orthogonality is enforced during the
> computation, so you have guarantee that the computed vectors satisfy it.
> But if solved as non-symetric, the computed vectors may depart from
> B-orthogonality, unless the tolerance is very small.
>

Yes, the vectors I generate are not B-orthogonal.

Jose, do you think there is a way to reformulate what I am doing to use the
symmetric solver, even if we only have the action of B?

  Thanks,

     Matt


> Jose
>
>
> > El 14 feb 2021, a las 21:41, Barry Smith <bsmith at petsc.dev> escribi?:
> >
> >
> >   Florian,
> >
> >    I'm sorry I don't know the answers; I can only speculate. There is a
> STGetShift().
> >
> >    All I was saying is theoretically there could/should be such support
> in SLEPc.
> >
> >   Barry
> >
> >
> >> On Feb 13, 2021, at 6:43 PM, Florian Bruckner <e0425375 at gmail.com>
> wrote:
> >>
> >> Dear Barry,
> >> thank you for your clarification. What I wanted to say is that even if
> I could reset the KSP operators directly I would require to know which
> transformation ST applies in order to provide the preconditioning matrix
> for the correct operator.
> >> The more general solution would be that SLEPc provides the interface to
> pass the preconditioning matrix for A0 and ST applies the same
> transformations as for the operator.
> >>
> >> If you write "SLEPc could provide an interface", do you mean someone
> should implement it, or should it already be possible and I am not using it
> correctly?
> >> I wrote a small standalone example based on ex9.py from slepc4py, where
> i tried to use an operator.
> >>
> >> best wishes
> >> Florian
> >>
> >> On Sat, Feb 13, 2021 at 7:15 PM Barry Smith <bsmith at petsc.dev> wrote:
> >>
> >>
> >>> On Feb 13, 2021, at 2:47 AM, Pierre Jolivet <pierre at joliv.et> wrote:
> >>>
> >>>
> >>>
> >>>> On 13 Feb 2021, at 7:25 AM, Florian Bruckner <e0425375 at gmail.com>
> wrote:
> >>>>
> >>>> Dear Jose, Dear Barry,
> >>>> thanks again for your reply. One final question about the B0
> orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but
> they are i*B0 orthogonal? or is there an issue with Matt's approach?
> >>>> For my problem I can show that eigenvalues fulfill an orthogonality
> relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This
> should be independent of the solving method, right?
> >>>>
> >>>> Regarding Barry's advice this is what I first tried:
> >>>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
> >>>> st = es.getST()
> >>>> ksp = st.getKSP()
> >>>> ksp.setOperators(self.A0, self.P0)
> >>>>
> >>>> But it seems that the provided P0 is not used. Furthermore the
> interface is maybe a bit confusing if ST performs some transformation. In
> this case P0 needs to approximate A0^{-1}*B0 and not A0, right?
> >>>
> >>> No, you need to approximate (A0-sigma B0)^-1. If you have a null
> shift, which looks like it is the case, you end up with A0^-1.
> >>
> >>   Just trying to provide more clarity with the terms.
> >>
> >> If ST transforms the operator in the KSP to (A0-sigma B0) and you are
> providing the "sparse matrix from which the preconditioner is to be built"
> then you need to provide something that approximates (A0-sigma B0). Since
> the PC will use your matrix to construct a preconditioner that approximates
> the inverse of  (A0-sigma B0), you don't need to directly provide something
> that approximates (A0-sigma B0)^-1
> >>
> >>  Yes, I would think SLEPc could provide an interface where it manages
> "the matrix from which to construct the preconditioner" and transforms that
> matrix just like the true matrix. To do it by hand you simply need to know
> what A0 and B0 are and which sigma ST has selected and then you can
> construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and
> modB0 are your "sparser approximations".
> >>
> >>   Barry
> >>
> >>
> >>>
> >>>> Nevertheless I think it would be the best solution if one could
> provide P0 (approx A0) and SLEPc derives the preconditioner from this.
> Would this be hard to implement?
> >>>
> >>> This is what Barry?s suggestion is implementing. Don?t know why it
> doesn?t work with your Python operator though.
> >>>
> >>> Thanks,
> >>> Pierre
> >>>
> >>>> best wishes
> >>>> Florian
> >>>>
> >>>>
> >>>> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith <bsmith at petsc.dev> wrote:
> >>>>
> >>>>
> >>>>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner <e0425375 at gmail.com>
> wrote:
> >>>>>
> >>>>> Dear Jose, Dear Matt,
> >>>>>
> >>>>> I needed some time to think about your answers.
> >>>>> If I understand correctly, the eigenmode solver internally uses
> A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP
> solver and a corresponding preconditioner.
> >>>>> What I would need is an interface to provide not only the system
> Matrix A0 (which is an operator), but also a preconditioning matrix (sparse
> approximation of the operator).
> >>>>> Unfortunately this interface is not available, right?
> >>>>
> >>>>    If SLEPc does not provide this directly it is still intended to be
> trivial to provide the "preconditioner matrix" (that is matrix from which
> the preconditioner is built). Just get the KSP from the ST object and use
> KSPSetOperators() to provide the "preconditioner matrix" .
> >>>>
> >>>>   Barry
> >>>>
> >>>>>
> >>>>> Matt directly creates A0^{-1}*B0 as a matshell operator. The
> operator uses a KSP with a proper PC internally. SLEPc would directly get
> A0^{-1}*B0 and solve a standard eigenvalue problem with this modified
> operator. Did I understand this correctly?
> >>>>>
> >>>>> I have two further points, which I did not mention yet: the matrix
> B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am
> using Firedrake to set up the PETSc system matrices A0, i*B0 (which is
> real). Then I convert them into ScipyLinearOperators and use
> scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues.
> Minv=A0^-1 is also solving within scipy using a preconditioned gmres.
> Advantage of this setup is that the imaginary B0 can be handled efficiently
> and also the post-processing of the eigenvectors (which requires complex
> arithmetics) is simplified.
> >>>>>
> >>>>> Nevertheless I think that the mixing of PETSc and Scipy looks too
> complicated and is not very flexible.
> >>>>> If I would use Matt's approach, could I then simply switch between
> multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to
> the use of matshell?
> >>>>> Is there a solution for the imaginary B0, or do I have to use the
> non-hermitian methods? Is this a large performance drawback?
> >>>>>
> >>>>> thanks again,
> >>>>> and best wishes
> >>>>> Florian
> >>>>>
> >>>>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman <jroman at dsic.upv.es>
> wrote:
> >>>>> The problem can be written as A0*v=omega*B0*v and you want the
> eigenvalues omega closest to zero. If the matrices were explicitly
> available, you would do shift-and-invert with target=0, that is
> >>>>>
> >>>>>   (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
> >>>>>
> >>>>>   A0^{-1}*B0*v=theta*v
> >>>>>
> >>>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
> >>>>>
> >>>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of
> EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need?
> EPS_SMALLEST_REAL will give slow convergence.
> >>>>>
> >>>>> Florian: I would not recommend setting the KSP matrices directly, it
> may produce strange side-effects. We should have an interface function to
> pass this matrix. Currently there is STPrecondSetMatForPC() but it has two
> problems: (1) it is intended for STPRECOND, so cannot be used with
> Krylov-Schur, and (2) it is not currently available in the python interface.
> >>>>>
> >>>>> The approach used by Matt is a workaround that does not use ST, so
> you can handle linear solves with a KSP of your own.
> >>>>>
> >>>>> As an alternative, since your problem is symmetric, you could try
> LOBPCG, assuming that the leftmost eigenvalues are those that you want
> (e.g. if all eigenvalues are non-negative). In that case you could use
> STPrecondSetMatForPC(), but the remaining issue is calling it from python.
> >>>>>
> >>>>> If you are using the git repo, I could add the relevant code.
> >>>>>
> >>>>> Jose
> >>>>>
> >>>>>
> >>>>>
> >>>>> > El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com>
> escribi?:
> >>>>> >
> >>>>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <
> e0425375 at gmail.com> wrote:
> >>>>> > Dear PETSc / SLEPc Users,
> >>>>> >
> >>>>> > my question is very similar to the one posted here:
> >>>>> >
> https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html
> >>>>> >
> >>>>> > The eigensystem I would like to solve looks like:
> >>>>> > B0 v = 1/omega A0 v
> >>>>> > B0 and A0 are both hermitian, A0 is positive definite, but only
> given as a linear operator (matshell). I am looking for the largest
> eigenvalues (=smallest omega).
> >>>>> >
> >>>>> > I also have a sparse approximation P0 of the A0 operator, which i
> would like to use as precondtioner, using something like this:
> >>>>> >
> >>>>> >         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
> >>>>> >         st = es.getST()
> >>>>> >         ksp = st.getKSP()
> >>>>> >         ksp.setOperators(self.A0, self.P0)
> >>>>> >
> >>>>> > Unfortunately PETSc still complains that it cannot create a
> preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but
> A0.type == 'python').
> >>>>> > By the way, should P0 be an approximation of A0 or does it have to
> include B0?
> >>>>> >
> >>>>> > Right now I am using the krylov-schur method. Are there any
> alternatives if A0 is only given as an operator?
> >>>>> >
> >>>>> > Jose can correct me if I say something wrong.
> >>>>> >
> >>>>> > When I did this, I made a shell operator for the action of A0^{-1}
> B0 which has a KSPSolve() in it, so you can use your P0 preconditioning
> matrix, and
> >>>>> > then handed that to EPS. You can see me do it here:
> >>>>> >
> >>>>> >
> https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123
> >>>>> >
> >>>>> > I had a hard time getting the embedded solver to work the way I
> wanted, but maybe that is the better way.
> >>>>> >
> >>>>> >   Thanks,
> >>>>> >
> >>>>> >      Matt
> >>>>> >
> >>>>> > thanks for any advice
> >>>>> > best wishes
> >>>>> > Florian
> >>>>> >
> >>>>> >
> >>>>> > --
> >>>>> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> >>>>> > -- Norbert Wiener
> >>>>> >
> >>>>> > https://www.cse.buffalo.edu/~knepley/
> >>>>>
> >>>>
> >>>
> >>
> >> <test.py>
> >
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210215/03201146/attachment.html>

From jroman at dsic.upv.es  Mon Feb 15 11:44:15 2021
From: jroman at dsic.upv.es (Jose E. Roman)
Date: Mon, 15 Feb 2021 18:44:15 +0100
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <CAMYG4G=nN79akOKMXYS-YXesTj1pZq-br6C_4GA29Vm0zFAwGQ@mail.gmail.com>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
	<CAEc0zDoJarRyBZKOCP_FjsqgVTEH44cyGe2RQJF8URfJPrgK6Q@mail.gmail.com>
	<7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev>
	<CAEc0zDrxX2LWbH8kgSV_mhdSaUyf6CVw0-LQhO=_YNUY62qFRA@mail.gmail.com>
	<B783A843-DC14-4959-8D49-59C0BDDD954E@joliv.et>
	<119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev>
	<CAEc0zDpCR212_cx93aF0iahFEAQEYo3C19F=FSA928z3q7hpLw@mail.gmail.com>
	<6EF7889D-DC17-46FC-82A5-9409C41E231D@petsc.dev>
	<46C744D7-4376-46B3-B5C4-211A4C8C2291@dsic.upv.es>
	<CAMYG4G=nN79akOKMXYS-YXesTj1pZq-br6C_4GA29Vm0zFAwGQ@mail.gmail.com>
Message-ID: <80BCEEDC-4C1E-4512-AAF5-7B6E718C7D1D@dsic.upv.es>


> El 15 feb 2021, a las 14:53, Matthew Knepley <knepley at gmail.com> escribi?:
> 
> On Mon, Feb 15, 2021 at 7:27 AM Jose E. Roman <jroman at dsic.upv.es> wrote:
> I will think about the viability of adding an interface function to pass the preconditioner matrix.
> 
> Regarding the question about the B-orthogonality of computed vectors, in the symmetric solver the B-orthogonality is enforced during the computation, so you have guarantee that the computed vectors satisfy it. But if solved as non-symetric, the computed vectors may depart from B-orthogonality, unless the tolerance is very small.
> 
> Yes, the vectors I generate are not B-orthogonal.
> 
> Jose, do you think there is a way to reformulate what I am doing to use the symmetric solver, even if we only have the action of B?

Yes, you can do the following:

  ierr = EPSSetOperators(eps,S,NULL);CHKERRQ(ierr);   // S is your shell matrix A^{-1}*B
  ierr = EPSSetProblemType(eps,EPS_HEP);CHKERRQ(ierr);  // symmetric problem though S is not symmetric
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
  ierr = EPSSetUp(eps);CHKERRQ(ierr);   // note explicitly calling setup here
  ierr = EPSGetBV(eps,&bv);CHKERRQ(ierr);
  ierr = BVSetMatrix(bv,B,PETSC_FALSE);CHKERRQ(ierr);    // replace solver's inner product
  ierr = EPSSolve(eps);CHKERRQ(ierr);

I have tried this with test1.c and it works. The computed eigenvectors should be B-orthogonal in this case.

Jose


> 
>   Thanks,
> 
>      Matt
>  
> Jose
> 
> 
> > El 14 feb 2021, a las 21:41, Barry Smith <bsmith at petsc.dev> escribi?:
> > 
> > 
> >   Florian,
> > 
> >    I'm sorry I don't know the answers; I can only speculate. There is a STGetShift(). 
> > 
> >    All I was saying is theoretically there could/should be such support in SLEPc.
> > 
> >   Barry
> > 
> > 
> >> On Feb 13, 2021, at 6:43 PM, Florian Bruckner <e0425375 at gmail.com> wrote:
> >> 
> >> Dear Barry,
> >> thank you for your clarification. What I wanted to say is that even if I could reset the KSP operators directly I would require to know which transformation ST applies in order to provide the preconditioning matrix for the correct operator.
> >> The more general solution would be that SLEPc provides the interface to pass the preconditioning matrix for A0 and ST applies the same transformations as for the operator.
> >> 
> >> If you write "SLEPc could provide an interface", do you mean someone should implement it, or should it already be possible and I am not using it correctly?
> >> I wrote a small standalone example based on ex9.py from slepc4py, where i tried to use an operator.
> >> 
> >> best wishes
> >> Florian
> >> 
> >> On Sat, Feb 13, 2021 at 7:15 PM Barry Smith <bsmith at petsc.dev> wrote:
> >> 
> >> 
> >>> On Feb 13, 2021, at 2:47 AM, Pierre Jolivet <pierre at joliv.et> wrote:
> >>> 
> >>> 
> >>> 
> >>>> On 13 Feb 2021, at 7:25 AM, Florian Bruckner <e0425375 at gmail.com> wrote:
> >>>> 
> >>>> Dear Jose, Dear Barry,
> >>>> thanks again for your reply. One final question about the B0 orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but they are i*B0 orthogonal? or is there an issue with Matt's approach?
> >>>> For my problem I can show that eigenvalues fulfill an orthogonality relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This should be independent of the solving method, right?
> >>>> 
> >>>> Regarding Barry's advice this is what I first tried:
> >>>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
> >>>> st = es.getST()
> >>>> ksp = st.getKSP()
> >>>> ksp.setOperators(self.A0, self.P0)
> >>>> 
> >>>> But it seems that the provided P0 is not used. Furthermore the interface is maybe a bit confusing if ST performs some transformation. In this case P0 needs to approximate A0^{-1}*B0 and not A0, right?
> >>> 
> >>> No, you need to approximate (A0-sigma B0)^-1. If you have a null shift, which looks like it is the case, you end up with A0^-1.
> >> 
> >>   Just trying to provide more clarity with the terms.
> >> 
> >> If ST transforms the operator in the KSP to (A0-sigma B0) and you are providing the "sparse matrix from which the preconditioner is to be built" then you need to provide something that approximates (A0-sigma B0). Since the PC will use your matrix to construct a preconditioner that approximates the inverse of  (A0-sigma B0), you don't need to directly provide something that approximates (A0-sigma B0)^-1
> >> 
> >>  Yes, I would think SLEPc could provide an interface where it manages "the matrix from which to construct the preconditioner" and transforms that matrix just like the true matrix. To do it by hand you simply need to know what A0 and B0 are and which sigma ST has selected and then you can construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and modB0 are your "sparser approximations".
> >> 
> >>   Barry
> >> 
> >> 
> >>> 
> >>>> Nevertheless I think it would be the best solution if one could provide P0 (approx A0) and SLEPc derives the preconditioner from this. Would this be hard to implement?
> >>> 
> >>> This is what Barry?s suggestion is implementing. Don?t know why it doesn?t work with your Python operator though.
> >>> 
> >>> Thanks,
> >>> Pierre
> >>> 
> >>>> best wishes
> >>>> Florian
> >>>> 
> >>>> 
> >>>> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith <bsmith at petsc.dev> wrote:
> >>>> 
> >>>> 
> >>>>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner <e0425375 at gmail.com> wrote:
> >>>>> 
> >>>>> Dear Jose, Dear Matt, 
> >>>>> 
> >>>>> I needed some time to think about your answers. 
> >>>>> If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner.
> >>>>> What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator).
> >>>>> Unfortunately this interface is not available, right?
> >>>> 
> >>>>    If SLEPc does not provide this directly it is still intended to be trivial to provide the "preconditioner matrix" (that is matrix from which the preconditioner is built). Just get the KSP from the ST object and use KSPSetOperators() to provide the "preconditioner matrix" . 
> >>>> 
> >>>>   Barry
> >>>> 
> >>>>> 
> >>>>> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly?
> >>>>> 
> >>>>> I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. 
> >>>>> 
> >>>>> Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. 
> >>>>> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell?
> >>>>> Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback?
> >>>>> 
> >>>>> thanks again,
> >>>>> and best wishes
> >>>>> Florian
> >>>>> 
> >>>>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman <jroman at dsic.upv.es> wrote:
> >>>>> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is
> >>>>> 
> >>>>>   (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
> >>>>> 
> >>>>>   A0^{-1}*B0*v=theta*v
> >>>>> 
> >>>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
> >>>>> 
> >>>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence.
> >>>>> 
> >>>>> Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface.
> >>>>> 
> >>>>> The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own.
> >>>>> 
> >>>>> As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python.
> >>>>> 
> >>>>> If you are using the git repo, I could add the relevant code.
> >>>>> 
> >>>>> Jose
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> > El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com> escribi?:
> >>>>> > 
> >>>>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com> wrote:
> >>>>> > Dear PETSc / SLEPc Users,
> >>>>> > 
> >>>>> > my question is very similar to the one posted here: 
> >>>>> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html
> >>>>> > 
> >>>>> > The eigensystem I would like to solve looks like:
> >>>>> > B0 v = 1/omega A0 v
> >>>>> > B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). 
> >>>>> > 
> >>>>> > I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this:
> >>>>> > 
> >>>>> >         es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
> >>>>> >         st = es.getST()
> >>>>> >         ksp = st.getKSP()
> >>>>> >         ksp.setOperators(self.A0, self.P0)
> >>>>> > 
> >>>>> > Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). 
> >>>>> > By the way, should P0 be an approximation of A0 or does it have to include B0?
> >>>>> > 
> >>>>> > Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator?
> >>>>> > 
> >>>>> > Jose can correct me if I say something wrong.
> >>>>> > 
> >>>>> > When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and
> >>>>> > then handed that to EPS. You can see me do it here:
> >>>>> > 
> >>>>> >   https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123
> >>>>> > 
> >>>>> > I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way.
> >>>>> > 
> >>>>> >   Thanks,
> >>>>> > 
> >>>>> >      Matt
> >>>>> >  
> >>>>> > thanks for any advice
> >>>>> > best wishes
> >>>>> > Florian
> >>>>> > 
> >>>>> > 
> >>>>> > -- 
> >>>>> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> >>>>> > -- Norbert Wiener
> >>>>> > 
> >>>>> > https://www.cse.buffalo.edu/~knepley/
> >>>>> 
> >>>> 
> >>> 
> >> 
> >> <test.py>
> > 
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/


From swarnava89 at gmail.com  Mon Feb 15 19:47:29 2021
From: swarnava89 at gmail.com (Swarnava Ghosh)
Date: Mon, 15 Feb 2021 20:47:29 -0500
Subject: [petsc-users] makefile for building application with petsc
Message-ID: <CAC9YzR6mUzAUV7tiDUWCfcyMEgCJsoMFfbTUJrwr=BxzQdDLVg@mail.gmail.com>

Dear Petsc developers and users,

I am having some issue with building my code with the following makefile. I
was earlier able to build this with the same makefile on a different
machine. Would you please help me out on this issue?

Contents of makefile:
==============================================
all:sparc

CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/
-llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread

SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc
./src/energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc
./src/poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc
./src/forces.cc ./src/relaxatoms.cc ./src/multipole.cc
./src/electrostatics.cc ./src/tools.cc

SOURCEH = ./inc/sddft.h ./inc/isddft.h

OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o
./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o
./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o
./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o

LIBBASE = ./lib/sparc

CLEANFILES = ./lib/sparc

include ${PETSC_DIR}/lib/petsc/conf/variables
include ${PETSC_DIR}/lib/petsc/conf/rules

sparc: ${OBJSC} chkopts
${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB}
${RM} $(SOURCECPP:%.cc=%.o)

===========================================
Error:
/home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g
-I/home/swarnava/petsc/include
-I/home/swarnava/petsc/linux-gnu-intel/include    `pwd`/src/main.cc
/home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic error:
cannot open source file "sddft.h"
  #include "sddft.h"
                    ^
====================================================

It's not able to see the header file though I have -I ./inc in  CPPFLAGS.
The directory containing makefile has the directory "inc" with the headers
and "src" with the .cc files.

Thank you,
Swarnava
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210215/4d2ff924/attachment.html>

From jacob.fai at gmail.com  Mon Feb 15 19:57:31 2021
From: jacob.fai at gmail.com (Jacob Faibussowitsch)
Date: Mon, 15 Feb 2021 20:57:31 -0500
Subject: [petsc-users] makefile for building application with petsc
In-Reply-To: <CAC9YzR6mUzAUV7tiDUWCfcyMEgCJsoMFfbTUJrwr=BxzQdDLVg@mail.gmail.com>
References: <CAC9YzR6mUzAUV7tiDUWCfcyMEgCJsoMFfbTUJrwr=BxzQdDLVg@mail.gmail.com>
Message-ID: <AF69D127-9195-4E56-81C1-402B966E9176@gmail.com>

Hello,

If possible, can you include a copy of your configure.log which you used to configure petsc? It will give useful information about your machine and compilers. What system did this work with successfully? Also please attach the makefile directly rather than including its contents as text, it is much easier to read.

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)
Cell: (312) 694-3391

> On Feb 15, 2021, at 20:47, Swarnava Ghosh <swarnava89 at gmail.com> wrote:
> 
> Dear Petsc developers and users,
> 
> I am having some issue with building my code with the following makefile. I was earlier able to build this with the same makefile on a different machine. Would you please help me out on this issue?
> 
> Contents of makefile:
> ==============================================
> all:sparc
> 
> CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
> 
> SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc ./src/energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/forces.cc ./src/relaxatoms.cc ./src/multipole.cc ./src/electrostatics.cc ./src/tools.cc
> 
> SOURCEH = ./inc/sddft.h ./inc/isddft.h
> 
> OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o
> 
> LIBBASE = ./lib/sparc
> 
> CLEANFILES = ./lib/sparc 
> 
> include ${PETSC_DIR}/lib/petsc/conf/variables
> include ${PETSC_DIR}/lib/petsc/conf/rules
> 
> sparc: ${OBJSC} chkopts
> ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB}
> ${RM} $(SOURCECPP:%.cc=%.o)
> 
> ===========================================
> Error:
> /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g     -I/home/swarnava/petsc/include -I/home/swarnava/petsc/linux-gnu-intel/include    `pwd`/src/main.cc
> /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic error: cannot open source file "sddft.h"
>   #include "sddft.h"
>                     ^
> ====================================================
> 
> It's not able to see the header file though I have -I ./inc in  CPPFLAGS. The directory containing makefile has the directory "inc" with the headers and "src" with the .cc files.
> 
> Thank you,
> Swarnava
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210215/71eda31a/attachment.html>

From swarnava89 at gmail.com  Mon Feb 15 20:13:21 2021
From: swarnava89 at gmail.com (Swarnava Ghosh)
Date: Mon, 15 Feb 2021 21:13:21 -0500
Subject: [petsc-users] makefile for building application with petsc
In-Reply-To: <AF69D127-9195-4E56-81C1-402B966E9176@gmail.com>
References: <CAC9YzR6mUzAUV7tiDUWCfcyMEgCJsoMFfbTUJrwr=BxzQdDLVg@mail.gmail.com>
	<AF69D127-9195-4E56-81C1-402B966E9176@gmail.com>
Message-ID: <CAC9YzR67OucyX-80D_yCc3MmG5bmaKsd+0XKF29_ZJS72D1rAQ@mail.gmail.com>

Hi Jacob,

Attached is the configure.log and the makefile. It worked on a computing
cluster earlier. The petsc and other necessary modules were built by system
administrator on the cluster. I am trying to build the same code on a
workstation.

Sincerely,
Swarnava

On Mon, Feb 15, 2021 at 8:57 PM Jacob Faibussowitsch <jacob.fai at gmail.com>
wrote:

> Hello,
>
> If possible, can you include a copy of your configure.log which you used
> to configure petsc? It will give useful information about your machine and
> compilers. What system did this work with successfully? Also please attach
> the makefile directly rather than including its contents as text, it is
> much easier to read.
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: (312) 694-3391
>
> On Feb 15, 2021, at 20:47, Swarnava Ghosh <swarnava89 at gmail.com> wrote:
>
> Dear Petsc developers and users,
>
> I am having some issue with building my code with the following makefile.
> I was earlier able to build this with the same makefile on a different
> machine. Would you please help me out on this issue?
>
> Contents of makefile:
> ==============================================
> all:sparc
>
> CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/
> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
>
> SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc ./src/
> energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/
> poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/forces.cc
> ./src/relaxatoms.cc ./src/multipole.cc ./src/electrostatics.cc ./src/
> tools.cc
>
> SOURCEH = ./inc/sddft.h ./inc/isddft.h
>
> OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o
> ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o
> ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o
> ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o
>
> LIBBASE = ./lib/sparc
>
> CLEANFILES = ./lib/sparc
>
> include ${PETSC_DIR}/lib/petsc/conf/variables
> include ${PETSC_DIR}/lib/petsc/conf/rules
>
> sparc: ${OBJSC} chkopts
> ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB}
> ${RM} $(SOURCECPP:%.cc=%.o)
>
> ===========================================
> Error:
> /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g
> -I/home/swarnava/petsc/include
> -I/home/swarnava/petsc/linux-gnu-intel/include    `pwd`/src/main.cc
> /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic error:
> cannot open source file "sddft.h"
>   #include "sddft.h"
>                     ^
> ====================================================
>
> It's not able to see the header file though I have -I ./inc in  CPPFLAGS.
> The directory containing makefile has the directory "inc" with the headers
> and "src" with the .cc files.
>
> Thank you,
> Swarnava
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210215/14406996/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: makefile
Type: application/octet-stream
Size: 986 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210215/14406996/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.log
Type: text/x-log
Size: 3693065 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210215/14406996/attachment-0001.bin>

From knepley at gmail.com  Mon Feb 15 21:13:43 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 15 Feb 2021 22:13:43 -0500
Subject: [petsc-users] makefile for building application with petsc
In-Reply-To: <CAC9YzR6mUzAUV7tiDUWCfcyMEgCJsoMFfbTUJrwr=BxzQdDLVg@mail.gmail.com>
References: <CAC9YzR6mUzAUV7tiDUWCfcyMEgCJsoMFfbTUJrwr=BxzQdDLVg@mail.gmail.com>
Message-ID: <CAMYG4Gk_ckdqYQ9SEzy+b9-Q3S5=uotKcTrxwtv5_E3iRrKLxA@mail.gmail.com>

On Mon, Feb 15, 2021 at 8:47 PM Swarnava Ghosh <swarnava89 at gmail.com> wrote:

> Dear Petsc developers and users,
>
> I am having some issue with building my code with the following makefile.
> I was earlier able to build this with the same makefile on a different
> machine. Would you please help me out on this issue?
>
> Contents of makefile:
> ==============================================
> all:sparc
>
> CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/
> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
>
> SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc
> ./src/energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc
> ./src/poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc
> ./src/forces.cc ./src/relaxatoms.cc ./src/multipole.cc
> ./src/electrostatics.cc ./src/tools.cc
>
> SOURCEH = ./inc/sddft.h ./inc/isddft.h
>
> OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o
> ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o
> ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o
> ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o
>
> LIBBASE = ./lib/sparc
>
> CLEANFILES = ./lib/sparc
>
> include ${PETSC_DIR}/lib/petsc/conf/variables
> include ${PETSC_DIR}/lib/petsc/conf/rules
>
> sparc: ${OBJSC} chkopts
> ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB}
> ${RM} $(SOURCECPP:%.cc=%.o)
>
> ===========================================
> Error:
> /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g
> -I/home/swarnava/petsc/include
> -I/home/swarnava/petsc/linux-gnu-intel/include    `pwd`/src/main.cc
> /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic error:
> cannot open source file "sddft.h"
>   #include "sddft.h"
>                     ^
> ====================================================
>
> It's not able to see the header file though I have -I ./inc in  CPPFLAGS.
> The directory containing makefile has the directory "inc" with the headers
> and "src" with the .cc files.
>

Some things have been reorganized with make. Can you try using CCPPFLAGS
instead?

  Thanks,

     Matt


> Thank you,
> Swarnava
>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210215/c493a9f5/attachment.html>

From swarnava89 at gmail.com  Mon Feb 15 21:39:27 2021
From: swarnava89 at gmail.com (Swarnava Ghosh)
Date: Mon, 15 Feb 2021 22:39:27 -0500
Subject: [petsc-users] makefile for building application with petsc
In-Reply-To: <CAMYG4Gk_ckdqYQ9SEzy+b9-Q3S5=uotKcTrxwtv5_E3iRrKLxA@mail.gmail.com>
References: <CAC9YzR6mUzAUV7tiDUWCfcyMEgCJsoMFfbTUJrwr=BxzQdDLVg@mail.gmail.com>
	<CAMYG4Gk_ckdqYQ9SEzy+b9-Q3S5=uotKcTrxwtv5_E3iRrKLxA@mail.gmail.com>
Message-ID: <CAC9YzR7CgFJgjL4Wp-yca6mtpqmnotLxhr_VM0ew0vBm0=Astw@mail.gmail.com>

Hi Matthew,

Tried CCPPFLAGS. It did not work.

Sincerely,
Swarnava

On Mon, Feb 15, 2021 at 10:13 PM Matthew Knepley <knepley at gmail.com> wrote:

> On Mon, Feb 15, 2021 at 8:47 PM Swarnava Ghosh <swarnava89 at gmail.com>
> wrote:
>
>> Dear Petsc developers and users,
>>
>> I am having some issue with building my code with the following makefile.
>> I was earlier able to build this with the same makefile on a different
>> machine. Would you please help me out on this issue?
>>
>> Contents of makefile:
>> ==============================================
>> all:sparc
>>
>> CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/
>> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
>>
>> SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc
>> ./src/energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc
>> ./src/poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc
>> ./src/forces.cc ./src/relaxatoms.cc ./src/multipole.cc
>> ./src/electrostatics.cc ./src/tools.cc
>>
>> SOURCEH = ./inc/sddft.h ./inc/isddft.h
>>
>> OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o
>> ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o
>> ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o
>> ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o
>>
>> LIBBASE = ./lib/sparc
>>
>> CLEANFILES = ./lib/sparc
>>
>> include ${PETSC_DIR}/lib/petsc/conf/variables
>> include ${PETSC_DIR}/lib/petsc/conf/rules
>>
>> sparc: ${OBJSC} chkopts
>> ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB}
>> ${RM} $(SOURCECPP:%.cc=%.o)
>>
>> ===========================================
>> Error:
>> /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g
>> -I/home/swarnava/petsc/include
>> -I/home/swarnava/petsc/linux-gnu-intel/include    `pwd`/src/main.cc
>> /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic error:
>> cannot open source file "sddft.h"
>>   #include "sddft.h"
>>                     ^
>> ====================================================
>>
>> It's not able to see the header file though I have -I ./inc in  CPPFLAGS.
>> The directory containing makefile has the directory "inc" with the headers
>> and "src" with the .cc files.
>>
>
> Some things have been reorganized with make. Can you try using CCPPFLAGS
> instead?
>
>   Thanks,
>
>      Matt
>
>
>> Thank you,
>> Swarnava
>>
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210215/65e154f9/attachment.html>

From bsmith at petsc.dev  Mon Feb 15 21:50:33 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 15 Feb 2021 21:50:33 -0600
Subject: [petsc-users] makefile for building application with petsc
In-Reply-To: <CAC9YzR6mUzAUV7tiDUWCfcyMEgCJsoMFfbTUJrwr=BxzQdDLVg@mail.gmail.com>
References: <CAC9YzR6mUzAUV7tiDUWCfcyMEgCJsoMFfbTUJrwr=BxzQdDLVg@mail.gmail.com>
Message-ID: <BB87E0DC-5F29-4893-87BA-9FA9A1F9E19B@petsc.dev>

Swarnava,

sddft.h is not a PETSc include file, nor is it used by PETSc so I think the issue is not directly to PETSc it is related to where sddft is on the machine and how it is found by your makefile.

  Barry


> On Feb 15, 2021, at 7:47 PM, Swarnava Ghosh <swarnava89 at gmail.com> wrote:
> 
> Dear Petsc developers and users,
> 
> I am having some issue with building my code with the following makefile. I was earlier able to build this with the same makefile on a different machine. Would you please help me out on this issue?
> 
> Contents of makefile:
> ==============================================
> all:sparc
> 
> CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
> 
> SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc ./src/energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/forces.cc ./src/relaxatoms.cc ./src/multipole.cc ./src/electrostatics.cc ./src/tools.cc
> 
> SOURCEH = ./inc/sddft.h ./inc/isddft.h
> 
> OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o
> 
> LIBBASE = ./lib/sparc
> 
> CLEANFILES = ./lib/sparc 
> 
> include ${PETSC_DIR}/lib/petsc/conf/variables
> include ${PETSC_DIR}/lib/petsc/conf/rules
> 
> sparc: ${OBJSC} chkopts
> ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB}
> ${RM} $(SOURCECPP:%.cc=%.o)
> 
> ===========================================
> Error:
> /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g     -I/home/swarnava/petsc/include -I/home/swarnava/petsc/linux-gnu-intel/include    `pwd`/src/main.cc
> /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic error: cannot open source file "sddft.h"
>   #include "sddft.h"
>                     ^
> ====================================================
> 
> It's not able to see the header file though I have -I ./inc in  CPPFLAGS. The directory containing makefile has the directory "inc" with the headers and "src" with the .cc files.
> 
> Thank you,
> Swarnava
> 
> 


From roland.richter at ntnu.no  Tue Feb 16 02:43:01 2021
From: roland.richter at ntnu.no (Roland Richter)
Date: Tue, 16 Feb 2021 09:43:01 +0100
Subject: [petsc-users] Using distributed dense matrix/vector operations on a
 GPU
Message-ID: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no>

Hei,

after profiling my program using -log_view, I got the following output
(all matrices are dense):

/Using 8 OpenMP threads//
//Using Petsc Development GIT revision: v3.14.3-583-g5464005aea? GIT
Date: 2021-01-25 16:01:41 -0600//
//
//???????????????????????? Max?????? Max/Min???? Avg?????? Total//
//Time (sec):?????????? 5.074e+03???? 1.000?? 5.074e+03//
//Objects:????????????? 2.158e+03???? 1.000?? 2.158e+03//
//Flop:???????????????? 5.236e+13???? 1.000?? 5.236e+13? 5.236e+13//
//Flop/sec:???????????? 1.032e+10???? 1.000?? 1.032e+10? 1.032e+10//
//MPI Messages:???????? 0.000e+00???? 0.000?? 0.000e+00? 0.000e+00//
//MPI Message Lengths:? 0.000e+00???? 0.000?? 0.000e+00? 0.000e+00//
//MPI Reductions:?????? 0.000e+00???? 0.000//
//
//Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)//
//??????????????????????????? e.g., VecAXPY() for real vectors of length
N --> 2N flop//
//??????????????????????????? and VecAXPY() for complex vectors of
length N --> 8N flop//
//
//Summary of Stages:?? ----- Time ------? ----- Flop ------? ---
Messages ---? -- Message Lengths --? -- Reductions --//
//??????????????????????? Avg???? %Total???? Avg???? %Total??? Count??
%Total???? Avg???????? %Total??? Count?? %Total//
//?0:????? Main Stage: 5.0744e+03 100.0%? 5.2359e+13 100.0%? 0.000e+00??
0.0%? 0.000e+00??????? 0.0%? 0.000e+00?? 0.0%//
//
//------------------------------------------------------------------------------------------------------------------------//
//See the 'Profiling' chapter of the users' manual for details on
interpreting output.//
//Phase summary info://
//?? Count: number of times phase was executed//
//?? Time and Flop: Max - maximum over all processors//
//????????????????? Ratio - ratio of maximum to minimum over all
processors//
//?? Mess: number of messages sent//
//?? AvgLen: average message length (bytes)//
//?? Reduct: number of global reductions//
//?? Global: entire computation//
//?? Stage: stages of a computation. Set stages with PetscLogStagePush()
and PetscLogStagePop().//
//????? %T - percent time in this phase???????? %F - percent flop in
this phase//
//????? %M - percent messages in this phase???? %L - percent message
lengths in this phase//
//????? %R - percent reductions in this phase//
//?? Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time
over all processors)//
//?? GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max
GPU time over all processors)//
//?? CpuToGpu Count: total number of CPU to GPU copies per processor//
//?? CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies
per processor)//
//?? GpuToCpu Count: total number of GPU to CPU copies per processor//
//?? GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies
per processor)//
//?? GPU %F: percent flops on GPU in this event//
//------------------------------------------------------------------------------------------------------------------------//
//Event??????????????? Count????? Time (sec)????
Flop????????????????????????????? --- Global ---? --- Stage ----?
Total?? GPU??? - CpuToGpu -?? - GpuToCpu - GPU//
//?????????????????? Max Ratio? Max???? Ratio?? Max? Ratio? Mess??
AvgLen? Reduct? %T %F %M %L %R? %T %F %M %L %R Mflop/s Mflop/s Count??
Size?? Count?? Size? %F//
//---------------------------------------------------------------------------------------------------------------------------------------------------------------//
//
//--- Event Stage 0: Main Stage//
//
//VecSet??????????????? 37 1.0 1.0354e-04 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0????? 0
0.00e+00??? 0 0.00e+00? 0//
//VecAssemblyBegin????? 31 1.0 2.9080e-06 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0????? 0
0.00e+00??? 0 0.00e+00? 0//
//VecAssemblyEnd??????? 31 1.0 2.3270e-06 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0????? 0
0.00e+00??? 0 0.00e+00? 0//
//MatCopy??????????? 49928 1.0 3.7437e+02 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00? 7? 0? 0? 0? 0?? 7? 0? 0? 0? 0???? 0?????? 0????? 0
0.00e+00??? 0 0.00e+00? 0//
//MatConvert????????? 2080 1.0 5.8492e+00 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0????? 0
0.00e+00??? 0 0.00e+00? 0//
//MatScale?????????? 56162 1.0 6.9348e+02 1.0 1.60e+12 1.0 0.0e+00
0.0e+00 0.0e+00 14? 3? 0? 0? 0? 14? 3? 0? 0? 0? 2303?????? 0????? 0
0.00e+00??? 0 0.00e+00? 0//
//MatAssemblyBegin?? 56222 1.0 1.7370e-02 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0????? 0
0.00e+00??? 0 0.00e+00? 0//
//MatAssemblyEnd???? 56222 1.0 8.8713e-03 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0????? 0
0.00e+00??? 0 0.00e+00? 0//
//MatZeroEntries???? 60363 1.0 3.1011e+02 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00? 6? 0? 0? 0? 0?? 6? 0? 0? 0? 0???? 0?????? 0????? 0
0.00e+00??? 0 0.00e+00? 0//
//MatAXPY???????????? 8320 1.0 1.2254e+02 1.0 5.58e+11 1.0 0.0e+00
0.0e+00 0.0e+00? 2? 1? 0? 0? 0?? 2? 1? 0? 0? 0? 4557?????? 0????? 0
0.00e+00??? 0 0.00e+00? 0//
//MatMatMultSym?????? 4161 1.0 7.1613e-03 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0????? 0
0.00e+00??? 0 0.00e+00? 0//
//MatMatMultNum?????? 4161 1.0 4.0706e+02 1.0 5.02e+13 1.0 0.0e+00
0.0e+00 0.0e+00? 8 96? 0? 0? 0?? 8 96? 0? 0? 0 123331?????? 0????? 0
0.00e+00??? 0 0.00e+00? 0//
//---------------------------------------------------------------------------------------------------------------------------------------------------------------//
//
//Memory usage is given in bytes://
//
//Object Type????????? Creations?? Destructions???? Memory? Descendants'
Mem.//
//Reports information only for process 0.//
//
//--- Event Stage 0: Main Stage//
//
//????????????? Vector??? 37???????????? 34????? 1634064???? 0.//
//????????????? Matrix? 2120?????????? 2120? 52734663456???? 0.//
//????????????? Viewer???? 1????????????? 0??????????? 0???? 0.//
//========================================================================================================================/

Apparently, MatMatMultNum and MatScale take the most time (by far)
during execution. Therefore, I was wondering if it is possible to move
those operations/all matrices and vectors to a GPU or another
accelerator. According to
https://www.mcs.anl.gov/petsc/features/gpus.html CUDA is only supported
for distributed vectors, but not for dense distributed matrices. Are
there any updates related to that, or other ways to speed up the
involved operations?

Thanks!

Regards,

Roland

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210216/61930816/attachment-0001.html>

From mfadams at lbl.gov  Tue Feb 16 07:01:12 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Tue, 16 Feb 2021 08:01:12 -0500
Subject: [petsc-users] Using distributed dense matrix/vector operations
 on a GPU
In-Reply-To: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no>
References: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no>
Message-ID: <CADOhEh6NqsZ10PcoQPLgy6J4fYGCpzvio28UDAuKQS+nWQ=EiQ@mail.gmail.com>

You want to use -mat_type densecuda and -vec_type cuda

You can see from the 0 in the last column that none of your work is done on
the GPU.

Mark

On Tue, Feb 16, 2021 at 3:43 AM Roland Richter <roland.richter at ntnu.no>
wrote:

> Hei,
>
> after profiling my program using -log_view, I got the following output
> (all matrices are dense):
>
> *Using 8 OpenMP threads*
> *Using Petsc Development GIT revision: v3.14.3-583-g5464005aea  GIT Date:
> 2021-01-25 16:01:41 -0600*
>
> *                         Max       Max/Min     Avg       Total*
> *Time (sec):           5.074e+03     1.000   5.074e+03*
> *Objects:              2.158e+03     1.000   2.158e+03*
> *Flop:                 5.236e+13     1.000   5.236e+13  5.236e+13*
> *Flop/sec:             1.032e+10     1.000   1.032e+10  1.032e+10*
> *MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00*
> *MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00*
> *MPI Reductions:       0.000e+00     0.000*
>
> *Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)*
> *                            e.g., VecAXPY() for real vectors of length N
> --> 2N flop*
> *                            and VecAXPY() for complex vectors of length N
> --> 8N flop*
>
> *Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages
> ---  -- Message Lengths --  -- Reductions --*
> *                        Avg     %Total     Avg     %Total    Count
> %Total     Avg         %Total    Count   %Total*
> * 0:      Main Stage: 5.0744e+03 100.0%  5.2359e+13 100.0%  0.000e+00
> 0.0%  0.000e+00        0.0%  0.000e+00   0.0%*
>
>
> *------------------------------------------------------------------------------------------------------------------------*
> *See the 'Profiling' chapter of the users' manual for details on
> interpreting output.*
> *Phase summary info:*
> *   Count: number of times phase was executed*
> *   Time and Flop: Max - maximum over all processors*
> *                  Ratio - ratio of maximum to minimum over all processors*
> *   Mess: number of messages sent*
> *   AvgLen: average message length (bytes)*
> *   Reduct: number of global reductions*
> *   Global: entire computation*
> *   Stage: stages of a computation. Set stages with PetscLogStagePush()
> and PetscLogStagePop().*
> *      %T - percent time in this phase         %F - percent flop in this
> phase*
> *      %M - percent messages in this phase     %L - percent message
> lengths in this phase*
> *      %R - percent reductions in this phase*
> *   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time
> over all processors)*
> *   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU
> time over all processors)*
> *   CpuToGpu Count: total number of CPU to GPU copies per processor*
> *   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per
> processor)*
> *   GpuToCpu Count: total number of GPU to CPU copies per processor*
> *   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per
> processor)*
> *   GPU %F: percent flops on GPU in this event*
>
> *------------------------------------------------------------------------------------------------------------------------*
> *Event                Count      Time (sec)
> Flop                              --- Global ---  --- Stage ----  Total
> GPU    - CpuToGpu -   - GpuToCpu - GPU*
> *                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen
> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size
> Count   Size  %F*
>
> *---------------------------------------------------------------------------------------------------------------------------------------------------------------*
>
> *--- Event Stage 0: Main Stage*
>
> *VecSet                37 1.0 1.0354e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *VecAssemblyBegin      31 1.0 2.9080e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *VecAssemblyEnd        31 1.0 2.3270e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatCopy            49928 1.0 3.7437e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  7  0  0  0  0   7  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatConvert          2080 1.0 5.8492e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatScale           56162 1.0 6.9348e+02 1.0 1.60e+12 1.0 0.0e+00 0.0e+00
> 0.0e+00 14  3  0  0  0  14  3  0  0  0  2303       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatAssemblyBegin   56222 1.0 1.7370e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatAssemblyEnd     56222 1.0 8.8713e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatZeroEntries     60363 1.0 3.1011e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  6  0  0  0  0   6  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatAXPY             8320 1.0 1.2254e+02 1.0 5.58e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00  2  1  0  0  0   2  1  0  0  0  4557       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatMatMultSym       4161 1.0 7.1613e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatMatMultNum       4161 1.0 4.0706e+02 1.0 5.02e+13 1.0 0.0e+00 0.0e+00
> 0.0e+00  8 96  0  0  0   8 96  0  0  0 123331       0      0 0.00e+00    0
> 0.00e+00  0*
>
> *---------------------------------------------------------------------------------------------------------------------------------------------------------------*
>
> *Memory usage is given in bytes:*
>
> *Object Type          Creations   Destructions     Memory  Descendants'
> Mem.*
> *Reports information only for process 0.*
>
> *--- Event Stage 0: Main Stage*
>
> *              Vector    37             34      1634064     0.*
> *              Matrix  2120           2120  52734663456     0.*
> *              Viewer     1              0            0     0.*
>
> *========================================================================================================================*
>
> Apparently, MatMatMultNum and MatScale take the most time (by far) during
> execution. Therefore, I was wondering if it is possible to move those
> operations/all matrices and vectors to a GPU or another accelerator.
> According to https://www.mcs.anl.gov/petsc/features/gpus.html CUDA is
> only supported for distributed vectors, but not for dense distributed
> matrices. Are there any updates related to that, or other ways to speed up
> the involved operations?
>
> Thanks!
>
> Regards,
>
> Roland
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210216/05af68f9/attachment.html>

From stefano.zampini at gmail.com  Tue Feb 16 07:14:32 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Tue, 16 Feb 2021 16:14:32 +0300
Subject: [petsc-users] Using distributed dense matrix/vector operations
 on a GPU
In-Reply-To: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no>
References: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no>
Message-ID: <CAGPUishH58nELqRJgXWN=O0Ma7SLXnMsd5Bch9igTfK1TwYhJQ@mail.gmail.com>

Il giorno mar 16 feb 2021 alle ore 11:43 Roland Richter <
roland.richter at ntnu.no> ha scritto:

> Hei,
>
> after profiling my program using -log_view, I got the following output
> (all matrices are dense):
>
> *Using 8 OpenMP threads*
> *Using Petsc Development GIT revision: v3.14.3-583-g5464005aea  GIT Date:
> 2021-01-25 16:01:41 -0600*
>
> *                         Max       Max/Min     Avg       Total*
> *Time (sec):           5.074e+03     1.000   5.074e+03*
> *Objects:              2.158e+03     1.000   2.158e+03*
> *Flop:                 5.236e+13     1.000   5.236e+13  5.236e+13*
> *Flop/sec:             1.032e+10     1.000   1.032e+10  1.032e+10*
> *MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00*
> *MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00*
> *MPI Reductions:       0.000e+00     0.000*
>
> *Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)*
> *                            e.g., VecAXPY() for real vectors of length N
> --> 2N flop*
> *                            and VecAXPY() for complex vectors of length N
> --> 8N flop*
>
> *Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages
> ---  -- Message Lengths --  -- Reductions --*
> *                        Avg     %Total     Avg     %Total    Count
> %Total     Avg         %Total    Count   %Total*
> * 0:      Main Stage: 5.0744e+03 100.0%  5.2359e+13 100.0%  0.000e+00
> 0.0%  0.000e+00        0.0%  0.000e+00   0.0%*
>
>
> *------------------------------------------------------------------------------------------------------------------------*
> *See the 'Profiling' chapter of the users' manual for details on
> interpreting output.*
> *Phase summary info:*
> *   Count: number of times phase was executed*
> *   Time and Flop: Max - maximum over all processors*
> *                  Ratio - ratio of maximum to minimum over all processors*
> *   Mess: number of messages sent*
> *   AvgLen: average message length (bytes)*
> *   Reduct: number of global reductions*
> *   Global: entire computation*
> *   Stage: stages of a computation. Set stages with PetscLogStagePush()
> and PetscLogStagePop().*
> *      %T - percent time in this phase         %F - percent flop in this
> phase*
> *      %M - percent messages in this phase     %L - percent message
> lengths in this phase*
> *      %R - percent reductions in this phase*
> *   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time
> over all processors)*
> *   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU
> time over all processors)*
> *   CpuToGpu Count: total number of CPU to GPU copies per processor*
> *   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per
> processor)*
> *   GpuToCpu Count: total number of GPU to CPU copies per processor*
> *   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per
> processor)*
> *   GPU %F: percent flops on GPU in this event*
>
> *------------------------------------------------------------------------------------------------------------------------*
> *Event                Count      Time (sec)
> Flop                              --- Global ---  --- Stage ----  Total
> GPU    - CpuToGpu -   - GpuToCpu - GPU*
> *                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen
> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size
> Count   Size  %F*
>
> *---------------------------------------------------------------------------------------------------------------------------------------------------------------*
>
> *--- Event Stage 0: Main Stage*
>
> *VecSet                37 1.0 1.0354e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *VecAssemblyBegin      31 1.0 2.9080e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *VecAssemblyEnd        31 1.0 2.3270e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatCopy            49928 1.0 3.7437e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  7  0  0  0  0   7  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatConvert          2080 1.0 5.8492e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatScale           56162 1.0 6.9348e+02 1.0 1.60e+12 1.0 0.0e+00 0.0e+00
> 0.0e+00 14  3  0  0  0  14  3  0  0  0  2303       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatAssemblyBegin   56222 1.0 1.7370e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatAssemblyEnd     56222 1.0 8.8713e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatZeroEntries     60363 1.0 3.1011e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  6  0  0  0  0   6  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatAXPY             8320 1.0 1.2254e+02 1.0 5.58e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00  2  1  0  0  0   2  1  0  0  0  4557       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatMatMultSym       4161 1.0 7.1613e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0*
> *MatMatMultNum       4161 1.0 4.0706e+02 1.0 5.02e+13 1.0 0.0e+00 0.0e+00
> 0.0e+00  8 96  0  0  0   8 96  0  0  0 123331       0      0 0.00e+00    0
> 0.00e+00  0*
>
> *---------------------------------------------------------------------------------------------------------------------------------------------------------------*
>
> *Memory usage is given in bytes:*
>
> *Object Type          Creations   Destructions     Memory  Descendants'
> Mem.*
> *Reports information only for process 0.*
>
> *--- Event Stage 0: Main Stage*
>
> *              Vector    37             34      1634064     0.*
> *              Matrix  2120           2120  52734663456     0.*
> *              Viewer     1              0            0     0.*
>
> *========================================================================================================================*
>
> Apparently, MatMatMultNum and MatScale take the most time (by far) during
> execution. Therefore, I was wondering if it is possible to move those
> operations/all matrices and vectors to a GPU or another accelerator.
> According to https://www.mcs.anl.gov/petsc/features/gpus.html CUDA is
> only supported for distributed vectors, but not for dense distributed
> matrices. Are there any updates related to that, or other ways to speed up
> the involved operations?
>
>
You should compute the timings associated with each call, and not consider
the lump sum. For example, each MatScale takes 6.9348e+02/56162  =
0.012347851 seconds on average,  I doubt you can get any reasonable speedup
with CUDA. What are the sizes of these matrices?


> Thanks!
>
> Regards,
>
> Roland
>


-- 
Stefano
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210216/f58faf57/attachment-0001.html>

From roland.richter at ntnu.no  Tue Feb 16 07:16:57 2021
From: roland.richter at ntnu.no (Roland Richter)
Date: Tue, 16 Feb 2021 14:16:57 +0100
Subject: [petsc-users] Using distributed dense matrix/vector operations
 on a GPU
In-Reply-To: <CAGPUishH58nELqRJgXWN=O0Ma7SLXnMsd5Bch9igTfK1TwYhJQ@mail.gmail.com>
References: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no>
	<CAGPUishH58nELqRJgXWN=O0Ma7SLXnMsd5Bch9igTfK1TwYhJQ@mail.gmail.com>
Message-ID: <ac974b3b-7b65-af67-7ad3-c15cf30a6d68@ntnu.no>

Hei,

the usual size of those matrices is (cumulative, not distributed) at
least [8192x8192] x [8192x32768] complex entries as lower boundary. Does
it still make sense to test CUDA for speedup?

Thank you,

regards,

Roland

Am 16.02.21 um 14:14 schrieb Stefano Zampini:
>
>
> Il giorno mar 16 feb 2021 alle ore 11:43 Roland Richter
> <roland.richter at ntnu.no <mailto:roland.richter at ntnu.no>> ha scritto:
>
>     Hei,
>
>     after profiling my program using -log_view, I got the following
>     output (all matrices are dense):
>
>     /Using 8 OpenMP threads//
>     //Using Petsc Development GIT revision: v3.14.3-583-g5464005aea?
>     GIT Date: 2021-01-25 16:01:41 -0600//
>     //
>     //???????????????????????? Max?????? Max/Min???? Avg?????? Total//
>     //Time (sec):?????????? 5.074e+03???? 1.000?? 5.074e+03//
>     //Objects:????????????? 2.158e+03???? 1.000?? 2.158e+03//
>     //Flop:???????????????? 5.236e+13???? 1.000?? 5.236e+13? 5.236e+13//
>     //Flop/sec:???????????? 1.032e+10???? 1.000?? 1.032e+10? 1.032e+10//
>     //MPI Messages:???????? 0.000e+00???? 0.000?? 0.000e+00? 0.000e+00//
>     //MPI Message Lengths:? 0.000e+00???? 0.000?? 0.000e+00? 0.000e+00//
>     //MPI Reductions:?????? 0.000e+00???? 0.000//
>     //
>     //Flop counting convention: 1 flop = 1 real number operation of
>     type (multiply/divide/add/subtract)//
>     //??????????????????????????? e.g., VecAXPY() for real vectors of
>     length N --> 2N flop//
>     //??????????????????????????? and VecAXPY() for complex vectors of
>     length N --> 8N flop//
>     //
>     //Summary of Stages:?? ----- Time ------? ----- Flop ------? ---
>     Messages ---? -- Message Lengths --? -- Reductions --//
>     //??????????????????????? Avg???? %Total???? Avg???? %Total???
>     Count?? %Total???? Avg???????? %Total??? Count?? %Total//
>     //?0:????? Main Stage: 5.0744e+03 100.0%? 5.2359e+13 100.0%?
>     0.000e+00?? 0.0%? 0.000e+00??????? 0.0%? 0.000e+00?? 0.0%//
>     //
>     //------------------------------------------------------------------------------------------------------------------------//
>     //See the 'Profiling' chapter of the users' manual for details on
>     interpreting output.//
>     //Phase summary info://
>     //?? Count: number of times phase was executed//
>     //?? Time and Flop: Max - maximum over all processors//
>     //????????????????? Ratio - ratio of maximum to minimum over all
>     processors//
>     //?? Mess: number of messages sent//
>     //?? AvgLen: average message length (bytes)//
>     //?? Reduct: number of global reductions//
>     //?? Global: entire computation//
>     //?? Stage: stages of a computation. Set stages with
>     PetscLogStagePush() and PetscLogStagePop().//
>     //????? %T - percent time in this phase???????? %F - percent flop
>     in this phase//
>     //????? %M - percent messages in this phase???? %L - percent
>     message lengths in this phase//
>     //????? %R - percent reductions in this phase//
>     //?? Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max
>     time over all processors)//
>     //?? GPU Mflop/s: 10e-6 * (sum of flop on GPU over all
>     processors)/(max GPU time over all processors)//
>     //?? CpuToGpu Count: total number of CPU to GPU copies per processor//
>     //?? CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU
>     copies per processor)//
>     //?? GpuToCpu Count: total number of GPU to CPU copies per processor//
>     //?? GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU
>     copies per processor)//
>     //?? GPU %F: percent flops on GPU in this event//
>     //------------------------------------------------------------------------------------------------------------------------//
>     //Event??????????????? Count????? Time (sec)????
>     Flop????????????????????????????? --- Global ---? --- Stage ----?
>     Total?? GPU??? - CpuToGpu -?? - GpuToCpu - GPU//
>     //?????????????????? Max Ratio? Max???? Ratio?? Max? Ratio? Mess??
>     AvgLen? Reduct? %T %F %M %L %R? %T %F %M %L %R Mflop/s Mflop/s
>     Count?? Size?? Count?? Size? %F//
>     //---------------------------------------------------------------------------------------------------------------------------------------------------------------//
>     //
>     //--- Event Stage 0: Main Stage//
>     //
>     //VecSet??????????????? 37 1.0 1.0354e-04 1.0 0.00e+00 0.0 0.0e+00
>     0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0?????
>     0 0.00e+00??? 0 0.00e+00? 0//
>     //VecAssemblyBegin????? 31 1.0 2.9080e-06 1.0 0.00e+00 0.0 0.0e+00
>     0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0?????
>     0 0.00e+00??? 0 0.00e+00? 0//
>     //VecAssemblyEnd??????? 31 1.0 2.3270e-06 1.0 0.00e+00 0.0 0.0e+00
>     0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0?????
>     0 0.00e+00??? 0 0.00e+00? 0//
>     //MatCopy??????????? 49928 1.0 3.7437e+02 1.0 0.00e+00 0.0 0.0e+00
>     0.0e+00 0.0e+00? 7? 0? 0? 0? 0?? 7? 0? 0? 0? 0???? 0?????? 0?????
>     0 0.00e+00??? 0 0.00e+00? 0//
>     //MatConvert????????? 2080 1.0 5.8492e+00 1.0 0.00e+00 0.0 0.0e+00
>     0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0?????
>     0 0.00e+00??? 0 0.00e+00? 0//
>     //MatScale?????????? 56162 1.0 6.9348e+02 1.0 1.60e+12 1.0 0.0e+00
>     0.0e+00 0.0e+00 14? 3? 0? 0? 0? 14? 3? 0? 0? 0? 2303?????? 0?????
>     0 0.00e+00??? 0 0.00e+00? 0//
>     //MatAssemblyBegin?? 56222 1.0 1.7370e-02 1.0 0.00e+00 0.0 0.0e+00
>     0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0?????
>     0 0.00e+00??? 0 0.00e+00? 0//
>     //MatAssemblyEnd???? 56222 1.0 8.8713e-03 1.0 0.00e+00 0.0 0.0e+00
>     0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0?????
>     0 0.00e+00??? 0 0.00e+00? 0//
>     //MatZeroEntries???? 60363 1.0 3.1011e+02 1.0 0.00e+00 0.0 0.0e+00
>     0.0e+00 0.0e+00? 6? 0? 0? 0? 0?? 6? 0? 0? 0? 0???? 0?????? 0?????
>     0 0.00e+00??? 0 0.00e+00? 0//
>     //MatAXPY???????????? 8320 1.0 1.2254e+02 1.0 5.58e+11 1.0 0.0e+00
>     0.0e+00 0.0e+00? 2? 1? 0? 0? 0?? 2? 1? 0? 0? 0? 4557?????? 0?????
>     0 0.00e+00??? 0 0.00e+00? 0//
>     //MatMatMultSym?????? 4161 1.0 7.1613e-03 1.0 0.00e+00 0.0 0.0e+00
>     0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0?????
>     0 0.00e+00??? 0 0.00e+00? 0//
>     //MatMatMultNum?????? 4161 1.0 4.0706e+02 1.0 5.02e+13 1.0 0.0e+00
>     0.0e+00 0.0e+00? 8 96? 0? 0? 0?? 8 96? 0? 0? 0 123331?????? 0?????
>     0 0.00e+00??? 0 0.00e+00? 0//
>     //---------------------------------------------------------------------------------------------------------------------------------------------------------------//
>     //
>     //Memory usage is given in bytes://
>     //
>     //Object Type????????? Creations?? Destructions???? Memory?
>     Descendants' Mem.//
>     //Reports information only for process 0.//
>     //
>     //--- Event Stage 0: Main Stage//
>     //
>     //????????????? Vector??? 37???????????? 34????? 1634064???? 0.//
>     //????????????? Matrix? 2120?????????? 2120? 52734663456???? 0.//
>     //????????????? Viewer???? 1????????????? 0??????????? 0???? 0.//
>     //========================================================================================================================/
>
>     Apparently, MatMatMultNum and MatScale take the most time (by far)
>     during execution. Therefore, I was wondering if it is possible to
>     move those operations/all matrices and vectors to a GPU or another
>     accelerator. According to
>     https://www.mcs.anl.gov/petsc/features/gpus.html
>     <https://www.mcs.anl.gov/petsc/features/gpus.html> CUDA is only
>     supported for distributed vectors, but not for dense distributed
>     matrices. Are there any updates related to that, or other ways to
>     speed up the involved operations?
>
>
> You should compute the timings associated with each call, and not
> consider the lump sum. For example, each MatScale takes
> 6.9348e+02/56162? = 0.012347851 seconds on average,? I doubt you can
> get any reasonable speedup with CUDA. What are the sizes of these
> matrices??
> ?
>
>     Thanks!
>
>     Regards,
>
>     Roland
>
>
>
> -- 
> Stefano
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210216/0889ff55/attachment.html>

From stefano.zampini at gmail.com  Tue Feb 16 07:25:53 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Tue, 16 Feb 2021 16:25:53 +0300
Subject: [petsc-users] Using distributed dense matrix/vector operations
 on a GPU
In-Reply-To: <ac974b3b-7b65-af67-7ad3-c15cf30a6d68@ntnu.no>
References: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no>
	<CAGPUishH58nELqRJgXWN=O0Ma7SLXnMsd5Bch9igTfK1TwYhJQ@mail.gmail.com>
	<ac974b3b-7b65-af67-7ad3-c15cf30a6d68@ntnu.no>
Message-ID: <CAGPUisheORsz4TO9+UjY86E0hF66pOLxZGVsTbx=R2tpk-9a0A@mail.gmail.com>

>
>
>
>
the usual size of those matrices is (cumulative, not distributed) at least
> [8192x8192] x [8192x32768] complex entries as lower boundary. Does it still
> make sense to test CUDA for speedup?
>
> I don't understand your notation. Are you saying your matrices are 8K x
8K? or 8K*32K? or what?


> Thank you,
>
> regards,
>
> Roland
> Am 16.02.21 um 14:14 schrieb Stefano Zampini:
>
>
>
> Il giorno mar 16 feb 2021 alle ore 11:43 Roland Richter <
> roland.richter at ntnu.no> ha scritto:
>
>> Hei,
>>
>> after profiling my program using -log_view, I got the following output
>> (all matrices are dense):
>>
>> *Using 8 OpenMP threads*
>> *Using Petsc Development GIT revision: v3.14.3-583-g5464005aea  GIT Date:
>> 2021-01-25 16:01:41 -0600*
>>
>> *                         Max       Max/Min     Avg       Total*
>> *Time (sec):           5.074e+03     1.000   5.074e+03*
>> *Objects:              2.158e+03     1.000   2.158e+03*
>> *Flop:                 5.236e+13     1.000   5.236e+13  5.236e+13*
>> *Flop/sec:             1.032e+10     1.000   1.032e+10  1.032e+10*
>> *MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00*
>> *MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00*
>> *MPI Reductions:       0.000e+00     0.000*
>>
>> *Flop counting convention: 1 flop = 1 real number operation of type
>> (multiply/divide/add/subtract)*
>> *                            e.g., VecAXPY() for real vectors of length N
>> --> 2N flop*
>> *                            and VecAXPY() for complex vectors of length
>> N --> 8N flop*
>>
>> *Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages
>> ---  -- Message Lengths --  -- Reductions --*
>> *                        Avg     %Total     Avg     %Total    Count
>> %Total     Avg         %Total    Count   %Total*
>> * 0:      Main Stage: 5.0744e+03 100.0%  5.2359e+13 100.0%  0.000e+00
>> 0.0%  0.000e+00        0.0%  0.000e+00   0.0%*
>>
>>
>> *------------------------------------------------------------------------------------------------------------------------*
>> *See the 'Profiling' chapter of the users' manual for details on
>> interpreting output.*
>> *Phase summary info:*
>> *   Count: number of times phase was executed*
>> *   Time and Flop: Max - maximum over all processors*
>> *                  Ratio - ratio of maximum to minimum over all
>> processors*
>> *   Mess: number of messages sent*
>> *   AvgLen: average message length (bytes)*
>> *   Reduct: number of global reductions*
>> *   Global: entire computation*
>> *   Stage: stages of a computation. Set stages with PetscLogStagePush()
>> and PetscLogStagePop().*
>> *      %T - percent time in this phase         %F - percent flop in this
>> phase*
>> *      %M - percent messages in this phase     %L - percent message
>> lengths in this phase*
>> *      %R - percent reductions in this phase*
>> *   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time
>> over all processors)*
>> *   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max
>> GPU time over all processors)*
>> *   CpuToGpu Count: total number of CPU to GPU copies per processor*
>> *   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per
>> processor)*
>> *   GpuToCpu Count: total number of GPU to CPU copies per processor*
>> *   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per
>> processor)*
>> *   GPU %F: percent flops on GPU in this event*
>>
>> *------------------------------------------------------------------------------------------------------------------------*
>> *Event                Count      Time (sec)
>> Flop                              --- Global ---  --- Stage ----  Total
>> GPU    - CpuToGpu -   - GpuToCpu - GPU*
>> *                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen
>> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size
>> Count   Size  %F*
>>
>> *---------------------------------------------------------------------------------------------------------------------------------------------------------------*
>>
>> *--- Event Stage 0: Main Stage*
>>
>> *VecSet                37 1.0 1.0354e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
>> 0.00e+00  0*
>> *VecAssemblyBegin      31 1.0 2.9080e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
>> 0.00e+00  0*
>> *VecAssemblyEnd        31 1.0 2.3270e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
>> 0.00e+00  0*
>> *MatCopy            49928 1.0 3.7437e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  7  0  0  0  0   7  0  0  0  0     0       0      0 0.00e+00    0
>> 0.00e+00  0*
>> *MatConvert          2080 1.0 5.8492e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
>> 0.00e+00  0*
>> *MatScale           56162 1.0 6.9348e+02 1.0 1.60e+12 1.0 0.0e+00 0.0e+00
>> 0.0e+00 14  3  0  0  0  14  3  0  0  0  2303       0      0 0.00e+00    0
>> 0.00e+00  0*
>> *MatAssemblyBegin   56222 1.0 1.7370e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
>> 0.00e+00  0*
>> *MatAssemblyEnd     56222 1.0 8.8713e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
>> 0.00e+00  0*
>> *MatZeroEntries     60363 1.0 3.1011e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  6  0  0  0  0   6  0  0  0  0     0       0      0 0.00e+00    0
>> 0.00e+00  0*
>> *MatAXPY             8320 1.0 1.2254e+02 1.0 5.58e+11 1.0 0.0e+00 0.0e+00
>> 0.0e+00  2  1  0  0  0   2  1  0  0  0  4557       0      0 0.00e+00    0
>> 0.00e+00  0*
>> *MatMatMultSym       4161 1.0 7.1613e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0
>> 0.00e+00  0*
>> *MatMatMultNum       4161 1.0 4.0706e+02 1.0 5.02e+13 1.0 0.0e+00 0.0e+00
>> 0.0e+00  8 96  0  0  0   8 96  0  0  0 123331       0      0 0.00e+00    0
>> 0.00e+00  0*
>>
>> *---------------------------------------------------------------------------------------------------------------------------------------------------------------*
>>
>> *Memory usage is given in bytes:*
>>
>> *Object Type          Creations   Destructions     Memory  Descendants'
>> Mem.*
>> *Reports information only for process 0.*
>>
>> *--- Event Stage 0: Main Stage*
>>
>> *              Vector    37             34      1634064     0.*
>> *              Matrix  2120           2120  52734663456     0.*
>> *              Viewer     1              0            0     0.*
>>
>> *========================================================================================================================*
>>
>> Apparently, MatMatMultNum and MatScale take the most time (by far) during
>> execution. Therefore, I was wondering if it is possible to move those
>> operations/all matrices and vectors to a GPU or another accelerator.
>> According to https://www.mcs.anl.gov/petsc/features/gpus.html CUDA is
>> only supported for distributed vectors, but not for dense distributed
>> matrices. Are there any updates related to that, or other ways to speed up
>> the involved operations?
>>
>
> You should compute the timings associated with each call, and not consider
> the lump sum. For example, each MatScale takes 6.9348e+02/56162  =
> 0.012347851 seconds on average,  I doubt you can get any reasonable speedup
> with CUDA. What are the sizes of these matrices?
>
>
>> Thanks!
>>
>> Regards,
>>
>> Roland
>>
>
>
> --
> Stefano
>
>

-- 
Stefano
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210216/41251233/attachment-0001.html>

From roland.richter at ntnu.no  Tue Feb 16 07:30:06 2021
From: roland.richter at ntnu.no (Roland Richter)
Date: Tue, 16 Feb 2021 14:30:06 +0100
Subject: [petsc-users] Using distributed dense matrix/vector operations
 on a GPU
In-Reply-To: <CAGPUisheORsz4TO9+UjY86E0hF66pOLxZGVsTbx=R2tpk-9a0A@mail.gmail.com>
References: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no>
	<CAGPUishH58nELqRJgXWN=O0Ma7SLXnMsd5Bch9igTfK1TwYhJQ@mail.gmail.com>
	<ac974b3b-7b65-af67-7ad3-c15cf30a6d68@ntnu.no>
	<CAGPUisheORsz4TO9+UjY86E0hF66pOLxZGVsTbx=R2tpk-9a0A@mail.gmail.com>
Message-ID: <7cb65ca8-1748-2e08-0a04-d61c21c6a40a@ntnu.no>

For MatMatMult the size of the involved matrices is? 8k x 8k and 8k x
32k. I am not sure where MatScale is called, I never call it explicitly.
If MatDiagonalScale calls MatScale, then the involved matrices have a
size of 8k x 32k.

Regards,

Roland

Am 16.02.21 um 14:25 schrieb Stefano Zampini:
>
>
>     ?
>
>     the usual size of those matrices is (cumulative, not distributed)
>     at least [8192x8192] x [8192x32768] complex entries as lower
>     boundary. Does it still make sense to test CUDA for speedup?
>
> I don't understand your notation. Are you saying your matrices are 8K
> x 8K? or 8K*32K? or what?
> ?
>
>     Thank you,
>
>     regards,
>
>     Roland
>
>     Am 16.02.21 um 14:14 schrieb Stefano Zampini:
>>
>>
>>     Il giorno mar 16 feb 2021 alle ore 11:43 Roland Richter
>>     <roland.richter at ntnu.no <mailto:roland.richter at ntnu.no>> ha scritto:
>>
>>         Hei,
>>
>>         after profiling my program using -log_view, I got the
>>         following output (all matrices are dense):
>>
>>         /Using 8 OpenMP threads//
>>         //Using Petsc Development GIT revision:
>>         v3.14.3-583-g5464005aea? GIT Date: 2021-01-25 16:01:41 -0600//
>>         //
>>         //???????????????????????? Max?????? Max/Min???? Avg??????
>>         Total//
>>         //Time (sec):?????????? 5.074e+03???? 1.000?? 5.074e+03//
>>         //Objects:????????????? 2.158e+03???? 1.000?? 2.158e+03//
>>         //Flop:???????????????? 5.236e+13???? 1.000?? 5.236e+13?
>>         5.236e+13//
>>         //Flop/sec:???????????? 1.032e+10???? 1.000?? 1.032e+10?
>>         1.032e+10//
>>         //MPI Messages:???????? 0.000e+00???? 0.000?? 0.000e+00?
>>         0.000e+00//
>>         //MPI Message Lengths:? 0.000e+00???? 0.000?? 0.000e+00?
>>         0.000e+00//
>>         //MPI Reductions:?????? 0.000e+00???? 0.000//
>>         //
>>         //Flop counting convention: 1 flop = 1 real number operation
>>         of type (multiply/divide/add/subtract)//
>>         //??????????????????????????? e.g., VecAXPY() for real
>>         vectors of length N --> 2N flop//
>>         //??????????????????????????? and VecAXPY() for complex
>>         vectors of length N --> 8N flop//
>>         //
>>         //Summary of Stages:?? ----- Time ------? ----- Flop ------?
>>         --- Messages ---? -- Message Lengths --? -- Reductions --//
>>         //??????????????????????? Avg???? %Total???? Avg????
>>         %Total??? Count?? %Total???? Avg???????? %Total??? Count??
>>         %Total//
>>         //?0:????? Main Stage: 5.0744e+03 100.0%? 5.2359e+13 100.0%?
>>         0.000e+00?? 0.0%? 0.000e+00??????? 0.0%? 0.000e+00?? 0.0%//
>>         //
>>         //------------------------------------------------------------------------------------------------------------------------//
>>         //See the 'Profiling' chapter of the users' manual for
>>         details on interpreting output.//
>>         //Phase summary info://
>>         //?? Count: number of times phase was executed//
>>         //?? Time and Flop: Max - maximum over all processors//
>>         //????????????????? Ratio - ratio of maximum to minimum over
>>         all processors//
>>         //?? Mess: number of messages sent//
>>         //?? AvgLen: average message length (bytes)//
>>         //?? Reduct: number of global reductions//
>>         //?? Global: entire computation//
>>         //?? Stage: stages of a computation. Set stages with
>>         PetscLogStagePush() and PetscLogStagePop().//
>>         //????? %T - percent time in this phase???????? %F - percent
>>         flop in this phase//
>>         //????? %M - percent messages in this phase???? %L - percent
>>         message lengths in this phase//
>>         //????? %R - percent reductions in this phase//
>>         //?? Total Mflop/s: 10e-6 * (sum of flop over all
>>         processors)/(max time over all processors)//
>>         //?? GPU Mflop/s: 10e-6 * (sum of flop on GPU over all
>>         processors)/(max GPU time over all processors)//
>>         //?? CpuToGpu Count: total number of CPU to GPU copies per
>>         processor//
>>         //?? CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to
>>         GPU copies per processor)//
>>         //?? GpuToCpu Count: total number of GPU to CPU copies per
>>         processor//
>>         //?? GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to
>>         CPU copies per processor)//
>>         //?? GPU %F: percent flops on GPU in this event//
>>         //------------------------------------------------------------------------------------------------------------------------//
>>         //Event??????????????? Count????? Time (sec)????
>>         Flop????????????????????????????? --- Global ---? --- Stage
>>         ----? Total?? GPU??? - CpuToGpu -?? - GpuToCpu - GPU//
>>         //?????????????????? Max Ratio? Max???? Ratio?? Max? Ratio?
>>         Mess?? AvgLen? Reduct? %T %F %M %L %R? %T %F %M %L %R Mflop/s
>>         Mflop/s Count?? Size?? Count?? Size? %F//
>>         //---------------------------------------------------------------------------------------------------------------------------------------------------------------//
>>         //
>>         //--- Event Stage 0: Main Stage//
>>         //
>>         //VecSet??????????????? 37 1.0 1.0354e-04 1.0 0.00e+00 0.0
>>         0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0????
>>         0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>         //VecAssemblyBegin????? 31 1.0 2.9080e-06 1.0 0.00e+00 0.0
>>         0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0????
>>         0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>         //VecAssemblyEnd??????? 31 1.0 2.3270e-06 1.0 0.00e+00 0.0
>>         0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0????
>>         0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>         //MatCopy??????????? 49928 1.0 3.7437e+02 1.0 0.00e+00 0.0
>>         0.0e+00 0.0e+00 0.0e+00? 7? 0? 0? 0? 0?? 7? 0? 0? 0? 0????
>>         0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>         //MatConvert????????? 2080 1.0 5.8492e+00 1.0 0.00e+00 0.0
>>         0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0????
>>         0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>         //MatScale?????????? 56162 1.0 6.9348e+02 1.0 1.60e+12 1.0
>>         0.0e+00 0.0e+00 0.0e+00 14? 3? 0? 0? 0? 14? 3? 0? 0? 0?
>>         2303?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>         //MatAssemblyBegin?? 56222 1.0 1.7370e-02 1.0 0.00e+00 0.0
>>         0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0????
>>         0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>         //MatAssemblyEnd???? 56222 1.0 8.8713e-03 1.0 0.00e+00 0.0
>>         0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0????
>>         0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>         //MatZeroEntries???? 60363 1.0 3.1011e+02 1.0 0.00e+00 0.0
>>         0.0e+00 0.0e+00 0.0e+00? 6? 0? 0? 0? 0?? 6? 0? 0? 0? 0????
>>         0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>         //MatAXPY???????????? 8320 1.0 1.2254e+02 1.0 5.58e+11 1.0
>>         0.0e+00 0.0e+00 0.0e+00? 2? 1? 0? 0? 0?? 2? 1? 0? 0? 0?
>>         4557?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>         //MatMatMultSym?????? 4161 1.0 7.1613e-03 1.0 0.00e+00 0.0
>>         0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0????
>>         0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>         //MatMatMultNum?????? 4161 1.0 4.0706e+02 1.0 5.02e+13 1.0
>>         0.0e+00 0.0e+00 0.0e+00? 8 96? 0? 0? 0?? 8 96? 0? 0? 0
>>         123331?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>         //---------------------------------------------------------------------------------------------------------------------------------------------------------------//
>>         //
>>         //Memory usage is given in bytes://
>>         //
>>         //Object Type????????? Creations?? Destructions???? Memory?
>>         Descendants' Mem.//
>>         //Reports information only for process 0.//
>>         //
>>         //--- Event Stage 0: Main Stage//
>>         //
>>         //????????????? Vector??? 37???????????? 34????? 1634064???? 0.//
>>         //????????????? Matrix? 2120?????????? 2120? 52734663456???? 0.//
>>         //????????????? Viewer???? 1????????????? 0??????????? 0???? 0.//
>>         //========================================================================================================================/
>>
>>         Apparently, MatMatMultNum and MatScale take the most time (by
>>         far) during execution. Therefore, I was wondering if it is
>>         possible to move those operations/all matrices and vectors to
>>         a GPU or another accelerator. According to
>>         https://www.mcs.anl.gov/petsc/features/gpus.html
>>         <https://www.mcs.anl.gov/petsc/features/gpus.html> CUDA is
>>         only supported for distributed vectors, but not for dense
>>         distributed matrices. Are there any updates related to that,
>>         or other ways to speed up the involved operations?
>>
>>
>>     You should compute the timings associated with each call, and not
>>     consider the lump sum. For example, each MatScale takes
>>     6.9348e+02/56162? = 0.012347851 seconds on average,? I doubt you
>>     can get any reasonable speedup with CUDA. What are the sizes of
>>     these matrices??
>>     ?
>>
>>         Thanks!
>>
>>         Regards,
>>
>>         Roland
>>
>>
>>
>>     -- 
>>     Stefano
>
>
>
> -- 
> Stefano
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210216/3fd0d64e/attachment.html>

From knepley at gmail.com  Tue Feb 16 07:42:32 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 16 Feb 2021 08:42:32 -0500
Subject: [petsc-users] makefile for building application with petsc
In-Reply-To: <BB87E0DC-5F29-4893-87BA-9FA9A1F9E19B@petsc.dev>
References: <CAC9YzR6mUzAUV7tiDUWCfcyMEgCJsoMFfbTUJrwr=BxzQdDLVg@mail.gmail.com>
	<BB87E0DC-5F29-4893-87BA-9FA9A1F9E19B@petsc.dev>
Message-ID: <CAMYG4Gn1qp-r=XkJYSuACqgVMjHdvd-kQLR+8oxceUdAmTfOxA@mail.gmail.com>

On Mon, Feb 15, 2021 at 10:50 PM Barry Smith <bsmith at petsc.dev> wrote:

> Swarnava,
>
> sddft.h is not a PETSc include file, nor is it used by PETSc so I think
> the issue is not directly to PETSc it is related to where sddft is on the
> machine and how it is found by your makefile.
>

Barry,

His problem is that he is trying to put extra include flags on the compile
line, but it is not working. I am wondering if his make is malfunctioning.

  Thanks,

     Matt


>   Barry
>
>
>
> > On Feb 15, 2021, at 7:47 PM, Swarnava Ghosh <swarnava89 at gmail.com>
> wrote:
> >
> > Dear Petsc developers and users,
> >
> > I am having some issue with building my code with the following
> makefile. I was earlier able to build this with the same makefile on a
> different machine. Would you please help me out on this issue?
> >
> > Contents of makefile:
> > ==============================================
> > all:sparc
> >
> > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/
> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
> >
> > SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc
> ./src/energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc
> ./src/poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc
> ./src/forces.cc ./src/relaxatoms.cc ./src/multipole.cc
> ./src/electrostatics.cc ./src/tools.cc
> >
> > SOURCEH = ./inc/sddft.h ./inc/isddft.h
> >
> > OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o
> ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o
> ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o
> ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o
> >
> > LIBBASE = ./lib/sparc
> >
> > CLEANFILES = ./lib/sparc
> >
> > include ${PETSC_DIR}/lib/petsc/conf/variables
> > include ${PETSC_DIR}/lib/petsc/conf/rules
> >
> > sparc: ${OBJSC} chkopts
> > ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB}
> > ${RM} $(SOURCECPP:%.cc=%.o)
> >
> > ===========================================
> > Error:
> > /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g
>  -I/home/swarnava/petsc/include
> -I/home/swarnava/petsc/linux-gnu-intel/include    `pwd`/src/main.cc
> > /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic error:
> cannot open source file "sddft.h"
> >   #include "sddft.h"
> >                     ^
> > ====================================================
> >
> > It's not able to see the header file though I have -I ./inc in
> CPPFLAGS. The directory containing makefile has the directory "inc" with
> the headers and "src" with the .cc files.
> >
> > Thank you,
> > Swarnava
> >
> >
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210216/5f64e46b/attachment-0001.html>

From stefano.zampini at gmail.com  Tue Feb 16 07:46:37 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Tue, 16 Feb 2021 16:46:37 +0300
Subject: [petsc-users] Using distributed dense matrix/vector operations
 on a GPU
In-Reply-To: <7cb65ca8-1748-2e08-0a04-d61c21c6a40a@ntnu.no>
References: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no>
	<CAGPUishH58nELqRJgXWN=O0Ma7SLXnMsd5Bch9igTfK1TwYhJQ@mail.gmail.com>
	<ac974b3b-7b65-af67-7ad3-c15cf30a6d68@ntnu.no>
	<CAGPUisheORsz4TO9+UjY86E0hF66pOLxZGVsTbx=R2tpk-9a0A@mail.gmail.com>
	<7cb65ca8-1748-2e08-0a04-d61c21c6a40a@ntnu.no>
Message-ID: <CAGPUishYV+H4gK8ATahAqu7aRuZtdt3JiaZdLy13SE+f6DihyQ@mail.gmail.com>

Il giorno mar 16 feb 2021 alle ore 16:30 Roland Richter <
roland.richter at ntnu.no> ha scritto:

> For MatMatMult the size of the involved matrices is  8k x 8k and 8k x 32k.
>
Ok, so you have 32k columns to multiply against. Maybe you can get some
speedup
Howver, if you keep updating the matrix entries on CPU, then using CUDA
will make little sense.
In any case, you can try and see if you get any speedup

> I am not sure where MatScale is called, I never call it explicitly. If
> MatDiagonalScale calls MatScale, then the involved matrices have a size of
> 8k x 32k.
>
No, it does not, Are you calling MatAYPX?


> Regards,
>
> Roland
> Am 16.02.21 um 14:25 schrieb Stefano Zampini:
>
>
>>
>>
> the usual size of those matrices is (cumulative, not distributed) at least
>> [8192x8192] x [8192x32768] complex entries as lower boundary. Does it still
>> make sense to test CUDA for speedup?
>>
> I don't understand your notation. Are you saying your matrices are 8K x
> 8K? or 8K*32K? or what?
>
>
>> Thank you,
>>
>> regards,
>>
>> Roland
>> Am 16.02.21 um 14:14 schrieb Stefano Zampini:
>>
>>
>>
>> Il giorno mar 16 feb 2021 alle ore 11:43 Roland Richter <
>> roland.richter at ntnu.no> ha scritto:
>>
>>> Hei,
>>>
>>> after profiling my program using -log_view, I got the following output
>>> (all matrices are dense):
>>>
>>> *Using 8 OpenMP threads*
>>> *Using Petsc Development GIT revision: v3.14.3-583-g5464005aea  GIT
>>> Date: 2021-01-25 16:01:41 -0600*
>>>
>>> *                         Max       Max/Min     Avg       Total*
>>> *Time (sec):           5.074e+03     1.000   5.074e+03*
>>> *Objects:              2.158e+03     1.000   2.158e+03*
>>> *Flop:                 5.236e+13     1.000   5.236e+13  5.236e+13*
>>> *Flop/sec:             1.032e+10     1.000   1.032e+10  1.032e+10*
>>> *MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00*
>>> *MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00*
>>> *MPI Reductions:       0.000e+00     0.000*
>>>
>>> *Flop counting convention: 1 flop = 1 real number operation of type
>>> (multiply/divide/add/subtract)*
>>> *                            e.g., VecAXPY() for real vectors of length
>>> N --> 2N flop*
>>> *                            and VecAXPY() for complex vectors of length
>>> N --> 8N flop*
>>>
>>> *Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages
>>> ---  -- Message Lengths --  -- Reductions --*
>>> *                        Avg     %Total     Avg     %Total    Count
>>> %Total     Avg         %Total    Count   %Total*
>>> * 0:      Main Stage: 5.0744e+03 100.0%  5.2359e+13 100.0%  0.000e+00
>>> 0.0%  0.000e+00        0.0%  0.000e+00   0.0%*
>>>
>>>
>>> *------------------------------------------------------------------------------------------------------------------------*
>>> *See the 'Profiling' chapter of the users' manual for details on
>>> interpreting output.*
>>> *Phase summary info:*
>>> *   Count: number of times phase was executed*
>>> *   Time and Flop: Max - maximum over all processors*
>>> *                  Ratio - ratio of maximum to minimum over all
>>> processors*
>>> *   Mess: number of messages sent*
>>> *   AvgLen: average message length (bytes)*
>>> *   Reduct: number of global reductions*
>>> *   Global: entire computation*
>>> *   Stage: stages of a computation. Set stages with PetscLogStagePush()
>>> and PetscLogStagePop().*
>>> *      %T - percent time in this phase         %F - percent flop in this
>>> phase*
>>> *      %M - percent messages in this phase     %L - percent message
>>> lengths in this phase*
>>> *      %R - percent reductions in this phase*
>>> *   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time
>>> over all processors)*
>>> *   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max
>>> GPU time over all processors)*
>>> *   CpuToGpu Count: total number of CPU to GPU copies per processor*
>>> *   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per
>>> processor)*
>>> *   GpuToCpu Count: total number of GPU to CPU copies per processor*
>>> *   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per
>>> processor)*
>>> *   GPU %F: percent flops on GPU in this event*
>>>
>>> *------------------------------------------------------------------------------------------------------------------------*
>>> *Event                Count      Time (sec)
>>> Flop                              --- Global ---  --- Stage ----  Total
>>> GPU    - CpuToGpu -   - GpuToCpu - GPU*
>>> *                   Max Ratio  Max     Ratio   Max  Ratio  Mess
>>> AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count
>>> Size   Count   Size  %F*
>>>
>>> *---------------------------------------------------------------------------------------------------------------------------------------------------------------*
>>>
>>> *--- Event Stage 0: Main Stage*
>>>
>>> *VecSet                37 1.0 1.0354e-04 1.0 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0
>>> 0.00e+00    0 0.00e+00  0*
>>> *VecAssemblyBegin      31 1.0 2.9080e-06 1.0 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0
>>> 0.00e+00    0 0.00e+00  0*
>>> *VecAssemblyEnd        31 1.0 2.3270e-06 1.0 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0
>>> 0.00e+00    0 0.00e+00  0*
>>> *MatCopy            49928 1.0 3.7437e+02 1.0 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00  7  0  0  0  0   7  0  0  0  0     0       0      0
>>> 0.00e+00    0 0.00e+00  0*
>>> *MatConvert          2080 1.0 5.8492e+00 1.0 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0
>>> 0.00e+00    0 0.00e+00  0*
>>> *MatScale           56162 1.0 6.9348e+02 1.0 1.60e+12 1.0 0.0e+00
>>> 0.0e+00 0.0e+00 14  3  0  0  0  14  3  0  0  0  2303       0      0
>>> 0.00e+00    0 0.00e+00  0*
>>> *MatAssemblyBegin   56222 1.0 1.7370e-02 1.0 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0
>>> 0.00e+00    0 0.00e+00  0*
>>> *MatAssemblyEnd     56222 1.0 8.8713e-03 1.0 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0
>>> 0.00e+00    0 0.00e+00  0*
>>> *MatZeroEntries     60363 1.0 3.1011e+02 1.0 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00  6  0  0  0  0   6  0  0  0  0     0       0      0
>>> 0.00e+00    0 0.00e+00  0*
>>> *MatAXPY             8320 1.0 1.2254e+02 1.0 5.58e+11 1.0 0.0e+00
>>> 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0  4557       0      0
>>> 0.00e+00    0 0.00e+00  0*
>>> *MatMatMultSym       4161 1.0 7.1613e-03 1.0 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0
>>> 0.00e+00    0 0.00e+00  0*
>>> *MatMatMultNum       4161 1.0 4.0706e+02 1.0 5.02e+13 1.0 0.0e+00
>>> 0.0e+00 0.0e+00  8 96  0  0  0   8 96  0  0  0 123331       0      0
>>> 0.00e+00    0 0.00e+00  0*
>>>
>>> *---------------------------------------------------------------------------------------------------------------------------------------------------------------*
>>>
>>> *Memory usage is given in bytes:*
>>>
>>> *Object Type          Creations   Destructions     Memory  Descendants'
>>> Mem.*
>>> *Reports information only for process 0.*
>>>
>>> *--- Event Stage 0: Main Stage*
>>>
>>> *              Vector    37             34      1634064     0.*
>>> *              Matrix  2120           2120  52734663456     0.*
>>> *              Viewer     1              0            0     0.*
>>>
>>> *========================================================================================================================*
>>>
>>> Apparently, MatMatMultNum and MatScale take the most time (by far)
>>> during execution. Therefore, I was wondering if it is possible to move
>>> those operations/all matrices and vectors to a GPU or another accelerator.
>>> According to https://www.mcs.anl.gov/petsc/features/gpus.html CUDA is
>>> only supported for distributed vectors, but not for dense distributed
>>> matrices. Are there any updates related to that, or other ways to speed up
>>> the involved operations?
>>>
>>
>> You should compute the timings associated with each call, and not
>> consider the lump sum. For example, each MatScale takes 6.9348e+02/56162  =
>> 0.012347851 seconds on average,  I doubt you can get any reasonable speedup
>> with CUDA. What are the sizes of these matrices?
>>
>>
>>> Thanks!
>>>
>>> Regards,
>>>
>>> Roland
>>>
>>
>>
>> --
>> Stefano
>>
>>
>
> --
> Stefano
>
>

-- 
Stefano
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210216/60c168ab/attachment.html>

From roland.richter at ntnu.no  Tue Feb 16 07:55:51 2021
From: roland.richter at ntnu.no (Roland Richter)
Date: Tue, 16 Feb 2021 14:55:51 +0100
Subject: [petsc-users] Using distributed dense matrix/vector operations
 on a GPU
In-Reply-To: <CAGPUishYV+H4gK8ATahAqu7aRuZtdt3JiaZdLy13SE+f6DihyQ@mail.gmail.com>
References: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no>
	<CAGPUishH58nELqRJgXWN=O0Ma7SLXnMsd5Bch9igTfK1TwYhJQ@mail.gmail.com>
	<ac974b3b-7b65-af67-7ad3-c15cf30a6d68@ntnu.no>
	<CAGPUisheORsz4TO9+UjY86E0hF66pOLxZGVsTbx=R2tpk-9a0A@mail.gmail.com>
	<7cb65ca8-1748-2e08-0a04-d61c21c6a40a@ntnu.no>
	<CAGPUishYV+H4gK8ATahAqu7aRuZtdt3JiaZdLy13SE+f6DihyQ@mail.gmail.com>
Message-ID: <0acdbac2-c311-3ff0-664c-6be3ce9d885a@ntnu.no>

Yes, I call MatAXPY, but the matrix size stays the same.

Regards,

Roland

Am 16.02.21 um 14:46 schrieb Stefano Zampini:
>
> Il giorno mar 16 feb 2021 alle ore 16:30 Roland Richter
> <roland.richter at ntnu.no <mailto:roland.richter at ntnu.no>> ha scritto:
>
>     For MatMatMult the size of the involved matrices is? 8k x 8k and
>     8k x 32k.
>
> Ok, so you have 32k columns to multiply against. Maybe you can get
> some speedup
> Howver, if you keep updating the matrix entries on CPU, then using
> CUDA will make little sense.
> In any case, you can try and see if you get any speedup?
>
>     I am not sure where MatScale is called, I never call it
>     explicitly. If MatDiagonalScale calls MatScale, then the involved
>     matrices have a size of 8k x 32k.
>
> No, it does not, Are you calling MatAYPX??
>
> ?
>
>     Regards,
>
>     Roland
>
>     Am 16.02.21 um 14:25 schrieb Stefano Zampini:
>>
>>
>>         ?
>>
>>         the usual size of those matrices is (cumulative, not
>>         distributed) at least [8192x8192] x [8192x32768] complex
>>         entries as lower boundary. Does it still make sense to test
>>         CUDA for speedup?
>>
>>     I don't understand your notation. Are you saying your matrices
>>     are 8K x 8K? or 8K*32K? or what?
>>     ?
>>
>>         Thank you,
>>
>>         regards,
>>
>>         Roland
>>
>>         Am 16.02.21 um 14:14 schrieb Stefano Zampini:
>>>
>>>
>>>         Il giorno mar 16 feb 2021 alle ore 11:43 Roland Richter
>>>         <roland.richter at ntnu.no <mailto:roland.richter at ntnu.no>> ha
>>>         scritto:
>>>
>>>             Hei,
>>>
>>>             after profiling my program using -log_view, I got the
>>>             following output (all matrices are dense):
>>>
>>>             /Using 8 OpenMP threads//
>>>             //Using Petsc Development GIT revision:
>>>             v3.14.3-583-g5464005aea? GIT Date: 2021-01-25 16:01:41
>>>             -0600//
>>>             //
>>>             //???????????????????????? Max?????? Max/Min????
>>>             Avg?????? Total//
>>>             //Time (sec):?????????? 5.074e+03???? 1.000?? 5.074e+03//
>>>             //Objects:????????????? 2.158e+03???? 1.000?? 2.158e+03//
>>>             //Flop:???????????????? 5.236e+13???? 1.000?? 5.236e+13?
>>>             5.236e+13//
>>>             //Flop/sec:???????????? 1.032e+10???? 1.000?? 1.032e+10?
>>>             1.032e+10//
>>>             //MPI Messages:???????? 0.000e+00???? 0.000?? 0.000e+00?
>>>             0.000e+00//
>>>             //MPI Message Lengths:? 0.000e+00???? 0.000?? 0.000e+00?
>>>             0.000e+00//
>>>             //MPI Reductions:?????? 0.000e+00???? 0.000//
>>>             //
>>>             //Flop counting convention: 1 flop = 1 real number
>>>             operation of type (multiply/divide/add/subtract)//
>>>             //??????????????????????????? e.g., VecAXPY() for real
>>>             vectors of length N --> 2N flop//
>>>             //??????????????????????????? and VecAXPY() for complex
>>>             vectors of length N --> 8N flop//
>>>             //
>>>             //Summary of Stages:?? ----- Time ------? ----- Flop
>>>             ------? --- Messages ---? -- Message Lengths --? --
>>>             Reductions --//
>>>             //??????????????????????? Avg???? %Total???? Avg????
>>>             %Total??? Count?? %Total???? Avg???????? %Total???
>>>             Count?? %Total//
>>>             //?0:????? Main Stage: 5.0744e+03 100.0%? 5.2359e+13
>>>             100.0%? 0.000e+00?? 0.0%? 0.000e+00??????? 0.0%?
>>>             0.000e+00?? 0.0%//
>>>             //
>>>             //------------------------------------------------------------------------------------------------------------------------//
>>>             //See the 'Profiling' chapter of the users' manual for
>>>             details on interpreting output.//
>>>             //Phase summary info://
>>>             //?? Count: number of times phase was executed//
>>>             //?? Time and Flop: Max - maximum over all processors//
>>>             //????????????????? Ratio - ratio of maximum to minimum
>>>             over all processors//
>>>             //?? Mess: number of messages sent//
>>>             //?? AvgLen: average message length (bytes)//
>>>             //?? Reduct: number of global reductions//
>>>             //?? Global: entire computation//
>>>             //?? Stage: stages of a computation. Set stages with
>>>             PetscLogStagePush() and PetscLogStagePop().//
>>>             //????? %T - percent time in this phase???????? %F -
>>>             percent flop in this phase//
>>>             //????? %M - percent messages in this phase???? %L -
>>>             percent message lengths in this phase//
>>>             //????? %R - percent reductions in this phase//
>>>             //?? Total Mflop/s: 10e-6 * (sum of flop over all
>>>             processors)/(max time over all processors)//
>>>             //?? GPU Mflop/s: 10e-6 * (sum of flop on GPU over all
>>>             processors)/(max GPU time over all processors)//
>>>             //?? CpuToGpu Count: total number of CPU to GPU copies
>>>             per processor//
>>>             //?? CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU
>>>             to GPU copies per processor)//
>>>             //?? GpuToCpu Count: total number of GPU to CPU copies
>>>             per processor//
>>>             //?? GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU
>>>             to CPU copies per processor)//
>>>             //?? GPU %F: percent flops on GPU in this event//
>>>             //------------------------------------------------------------------------------------------------------------------------//
>>>             //Event??????????????? Count????? Time (sec)????
>>>             Flop????????????????????????????? --- Global ---? ---
>>>             Stage ----? Total?? GPU??? - CpuToGpu -?? - GpuToCpu - GPU//
>>>             //?????????????????? Max Ratio? Max???? Ratio?? Max?
>>>             Ratio? Mess?? AvgLen? Reduct? %T %F %M %L %R? %T %F %M
>>>             %L %R Mflop/s Mflop/s Count?? Size?? Count?? Size? %F//
>>>             //---------------------------------------------------------------------------------------------------------------------------------------------------------------//
>>>             //
>>>             //--- Event Stage 0: Main Stage//
>>>             //
>>>             //VecSet??????????????? 37 1.0 1.0354e-04 1.0 0.00e+00
>>>             0.0 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0?
>>>             0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>>             //VecAssemblyBegin????? 31 1.0 2.9080e-06 1.0 0.00e+00
>>>             0.0 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0?
>>>             0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>>             //VecAssemblyEnd??????? 31 1.0 2.3270e-06 1.0 0.00e+00
>>>             0.0 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0?
>>>             0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>>             //MatCopy??????????? 49928 1.0 3.7437e+02 1.0 0.00e+00
>>>             0.0 0.0e+00 0.0e+00 0.0e+00? 7? 0? 0? 0? 0?? 7? 0? 0? 0?
>>>             0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>>             //MatConvert????????? 2080 1.0 5.8492e+00 1.0 0.00e+00
>>>             0.0 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0?
>>>             0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>>             //MatScale?????????? 56162 1.0 6.9348e+02 1.0 1.60e+12
>>>             1.0 0.0e+00 0.0e+00 0.0e+00 14? 3? 0? 0? 0? 14? 3? 0? 0?
>>>             0? 2303?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>>             //MatAssemblyBegin?? 56222 1.0 1.7370e-02 1.0 0.00e+00
>>>             0.0 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0?
>>>             0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>>             //MatAssemblyEnd???? 56222 1.0 8.8713e-03 1.0 0.00e+00
>>>             0.0 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0?
>>>             0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>>             //MatZeroEntries???? 60363 1.0 3.1011e+02 1.0 0.00e+00
>>>             0.0 0.0e+00 0.0e+00 0.0e+00? 6? 0? 0? 0? 0?? 6? 0? 0? 0?
>>>             0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>>             //MatAXPY???????????? 8320 1.0 1.2254e+02 1.0 5.58e+11
>>>             1.0 0.0e+00 0.0e+00 0.0e+00? 2? 1? 0? 0? 0?? 2? 1? 0? 0?
>>>             0? 4557?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>>             //MatMatMultSym?????? 4161 1.0 7.1613e-03 1.0 0.00e+00
>>>             0.0 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0?
>>>             0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>>             //MatMatMultNum?????? 4161 1.0 4.0706e+02 1.0 5.02e+13
>>>             1.0 0.0e+00 0.0e+00 0.0e+00? 8 96? 0? 0? 0?? 8 96? 0? 0?
>>>             0 123331?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0//
>>>             //---------------------------------------------------------------------------------------------------------------------------------------------------------------//
>>>             //
>>>             //Memory usage is given in bytes://
>>>             //
>>>             //Object Type????????? Creations?? Destructions????
>>>             Memory? Descendants' Mem.//
>>>             //Reports information only for process 0.//
>>>             //
>>>             //--- Event Stage 0: Main Stage//
>>>             //
>>>             //????????????? Vector??? 37???????????? 34?????
>>>             1634064???? 0.//
>>>             //????????????? Matrix? 2120?????????? 2120?
>>>             52734663456???? 0.//
>>>             //????????????? Viewer???? 1????????????? 0???????????
>>>             0???? 0.//
>>>             //========================================================================================================================/
>>>
>>>             Apparently, MatMatMultNum and MatScale take the most
>>>             time (by far) during execution. Therefore, I was
>>>             wondering if it is possible to move those operations/all
>>>             matrices and vectors to a GPU or another accelerator.
>>>             According to
>>>             https://www.mcs.anl.gov/petsc/features/gpus.html
>>>             <https://www.mcs.anl.gov/petsc/features/gpus.html> CUDA
>>>             is only supported for distributed vectors, but not for
>>>             dense distributed matrices. Are there any updates
>>>             related to that, or other ways to speed up the involved
>>>             operations?
>>>
>>>
>>>         You should compute the timings associated with each call,
>>>         and not consider the lump sum. For example, each MatScale
>>>         takes 6.9348e+02/56162? = 0.012347851 seconds on average,? I
>>>         doubt you can get any reasonable speedup with CUDA. What are
>>>         the sizes of these matrices??
>>>         ?
>>>
>>>             Thanks!
>>>
>>>             Regards,
>>>
>>>             Roland
>>>
>>>
>>>
>>>         -- 
>>>         Stefano
>>
>>
>>
>>     -- 
>>     Stefano
>
>
>
> -- 
> Stefano
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210216/c7339430/attachment-0001.html>

From jacob.fai at gmail.com  Tue Feb 16 08:39:05 2021
From: jacob.fai at gmail.com (Jacob Faibussowitsch)
Date: Tue, 16 Feb 2021 09:39:05 -0500
Subject: [petsc-users] makefile for building application with petsc
In-Reply-To: <CAMYG4Gn1qp-r=XkJYSuACqgVMjHdvd-kQLR+8oxceUdAmTfOxA@mail.gmail.com>
References: <CAC9YzR6mUzAUV7tiDUWCfcyMEgCJsoMFfbTUJrwr=BxzQdDLVg@mail.gmail.com>
	<BB87E0DC-5F29-4893-87BA-9FA9A1F9E19B@petsc.dev>
	<CAMYG4Gn1qp-r=XkJYSuACqgVMjHdvd-kQLR+8oxceUdAmTfOxA@mail.gmail.com>
Message-ID: <AD464277-6A63-4B6C-B052-626E9F9A7F91@gmail.com>

Swarnava,

Perhaps try CXXFLAGS instead of CPPFLAGS. Alternatively, you may explicitly declare a %.o: %.cc target and force it to include your CPPFLAGS.

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)
Cell: (312) 694-3391

> On Feb 16, 2021, at 08:42, Matthew Knepley <knepley at gmail.com> wrote:
> 
> On Mon, Feb 15, 2021 at 10:50 PM Barry Smith <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
> Swarnava,
> 
> sddft.h is not a PETSc include file, nor is it used by PETSc so I think the issue is not directly to PETSc it is related to where sddft is on the machine and how it is found by your makefile.
> 
> Barry,
> 
> His problem is that he is trying to put extra include flags on the compile line, but it is not working. I am wondering if his make is malfunctioning.
> 
>   Thanks,
> 
>      Matt
>  
>   Barry
> 
> 
> 
> > On Feb 15, 2021, at 7:47 PM, Swarnava Ghosh <swarnava89 at gmail.com <mailto:swarnava89 at gmail.com>> wrote:
> > 
> > Dear Petsc developers and users,
> > 
> > I am having some issue with building my code with the following makefile. I was earlier able to build this with the same makefile on a different machine. Would you please help me out on this issue?
> > 
> > Contents of makefile:
> > ==============================================
> > all:sparc
> > 
> > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
> > 
> > SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc ./src/energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/forces.cc ./src/relaxatoms.cc ./src/multipole.cc ./src/electrostatics.cc ./src/tools.cc
> > 
> > SOURCEH = ./inc/sddft.h ./inc/isddft.h
> > 
> > OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o
> > 
> > LIBBASE = ./lib/sparc
> > 
> > CLEANFILES = ./lib/sparc 
> > 
> > include ${PETSC_DIR}/lib/petsc/conf/variables
> > include ${PETSC_DIR}/lib/petsc/conf/rules
> > 
> > sparc: ${OBJSC} chkopts
> > ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB}
> > ${RM} $(SOURCECPP:%.cc=%.o)
> > 
> > ===========================================
> > Error:
> > /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g     -I/home/swarnava/petsc/include -I/home/swarnava/petsc/linux-gnu-intel/include    `pwd`/src/main.cc
> > /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic error: cannot open source file "sddft.h"
> >   #include "sddft.h"
> >                     ^
> > ====================================================
> > 
> > It's not able to see the header file though I have -I ./inc in  CPPFLAGS. The directory containing makefile has the directory "inc" with the headers and "src" with the .cc files.
> > 
> > Thank you,
> > Swarnava
> > 
> > 
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210216/5fbcd0ad/attachment.html>

From knepley at gmail.com  Tue Feb 16 08:40:23 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 16 Feb 2021 09:40:23 -0500
Subject: [petsc-users] makefile for building application with petsc
In-Reply-To: <AD464277-6A63-4B6C-B052-626E9F9A7F91@gmail.com>
References: <CAC9YzR6mUzAUV7tiDUWCfcyMEgCJsoMFfbTUJrwr=BxzQdDLVg@mail.gmail.com>
	<BB87E0DC-5F29-4893-87BA-9FA9A1F9E19B@petsc.dev>
	<CAMYG4Gn1qp-r=XkJYSuACqgVMjHdvd-kQLR+8oxceUdAmTfOxA@mail.gmail.com>
	<AD464277-6A63-4B6C-B052-626E9F9A7F91@gmail.com>
Message-ID: <CAMYG4Gknc8OgP2AGBWgSMWAPG-pkUPSnMQ45d3dSTsfs1T8NhA@mail.gmail.com>

On Tue, Feb 16, 2021 at 9:39 AM Jacob Faibussowitsch <jacob.fai at gmail.com>
wrote:

> Swarnava,
>
> Perhaps try CXXFLAGS instead of CPPFLAGS. Alternatively, you may
> explicitly declare a %.o: %.cc target and force it to include your CPPFLAGS.
>

No, do not do either of these things. We just need to figure out why it is
not working for you. I will make a small example as soon as possible.

  Thanks,

     Matt


> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: (312) 694-3391
>
> On Feb 16, 2021, at 08:42, Matthew Knepley <knepley at gmail.com> wrote:
>
> On Mon, Feb 15, 2021 at 10:50 PM Barry Smith <bsmith at petsc.dev> wrote:
>
>> Swarnava,
>>
>> sddft.h is not a PETSc include file, nor is it used by PETSc so I think
>> the issue is not directly to PETSc it is related to where sddft is on the
>> machine and how it is found by your makefile.
>>
>
> Barry,
>
> His problem is that he is trying to put extra include flags on the compile
> line, but it is not working. I am wondering if his make is malfunctioning.
>
>   Thanks,
>
>      Matt
>
>
>>   Barry
>>
>>
>>
>> > On Feb 15, 2021, at 7:47 PM, Swarnava Ghosh <swarnava89 at gmail.com>
>> wrote:
>> >
>> > Dear Petsc developers and users,
>> >
>> > I am having some issue with building my code with the following
>> makefile. I was earlier able to build this with the same makefile on a
>> different machine. Would you please help me out on this issue?
>> >
>> > Contents of makefile:
>> > ==============================================
>> > all:sparc
>> >
>> > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/
>> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
>> >
>> > SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc ./src/
>> energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/
>> poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/
>> forces.cc ./src/relaxatoms.cc ./src/multipole.cc ./src/electrostatics.cc
>> ./src/tools.cc
>> >
>> > SOURCEH = ./inc/sddft.h ./inc/isddft.h
>> >
>> > OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o
>> ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o
>> ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o
>> ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o
>> >
>> > LIBBASE = ./lib/sparc
>> >
>> > CLEANFILES = ./lib/sparc
>> >
>> > include ${PETSC_DIR}/lib/petsc/conf/variables
>> > include ${PETSC_DIR}/lib/petsc/conf/rules
>> >
>> > sparc: ${OBJSC} chkopts
>> > ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB}
>> > ${RM} $(SOURCECPP:%.cc=%.o)
>> >
>> > ===========================================
>> > Error:
>> > /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g
>>  -I/home/swarnava/petsc/include
>> -I/home/swarnava/petsc/linux-gnu-intel/include    `pwd`/src/main.cc
>> > /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic
>> error: cannot open source file "sddft.h"
>> >   #include "sddft.h"
>> >                     ^
>> > ====================================================
>> >
>> > It's not able to see the header file though I have -I ./inc in
>> CPPFLAGS. The directory containing makefile has the directory "inc" with
>> the headers and "src" with the .cc files.
>> >
>> > Thank you,
>> > Swarnava
>> >
>> >
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210216/bf366c2c/attachment.html>

From balay at mcs.anl.gov  Tue Feb 16 11:09:43 2021
From: balay at mcs.anl.gov (Satish Balay)
Date: Tue, 16 Feb 2021 11:09:43 -0600
Subject: [petsc-users] makefile for building application with petsc
In-Reply-To: <CAMYG4Gknc8OgP2AGBWgSMWAPG-pkUPSnMQ45d3dSTsfs1T8NhA@mail.gmail.com>
References: <CAC9YzR6mUzAUV7tiDUWCfcyMEgCJsoMFfbTUJrwr=BxzQdDLVg@mail.gmail.com>
	<BB87E0DC-5F29-4893-87BA-9FA9A1F9E19B@petsc.dev>
	<CAMYG4Gn1qp-r=XkJYSuACqgVMjHdvd-kQLR+8oxceUdAmTfOxA@mail.gmail.com>
	<AD464277-6A63-4B6C-B052-626E9F9A7F91@gmail.com>
	<CAMYG4Gknc8OgP2AGBWgSMWAPG-pkUPSnMQ45d3dSTsfs1T8NhA@mail.gmail.com>
Message-ID: <6b1bfd4-7d3-ed49-eab1-045e790bfb6@mcs.anl.gov>

for CXX - its CXXPPFLAGS

> >> > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/
> >> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread

The link options shouldn't go into preprocessor flags.

Also duplicate blas with petsc might cause grief. Best to build petsc with mkl.


Satish

On Tue, 16 Feb 2021, Matthew Knepley wrote:

> On Tue, Feb 16, 2021 at 9:39 AM Jacob Faibussowitsch <jacob.fai at gmail.com>
> wrote:
> 
> > Swarnava,
> >
> > Perhaps try CXXFLAGS instead of CPPFLAGS. Alternatively, you may
> > explicitly declare a %.o: %.cc target and force it to include your CPPFLAGS.
> >
> 
> No, do not do either of these things. We just need to figure out why it is
> not working for you. I will make a small example as soon as possible.
> 
>   Thanks,
> 
>      Matt
> 
> 
> > Best regards,
> >
> > Jacob Faibussowitsch
> > (Jacob Fai - booss - oh - vitch)
> > Cell: (312) 694-3391
> >
> > On Feb 16, 2021, at 08:42, Matthew Knepley <knepley at gmail.com> wrote:
> >
> > On Mon, Feb 15, 2021 at 10:50 PM Barry Smith <bsmith at petsc.dev> wrote:
> >
> >> Swarnava,
> >>
> >> sddft.h is not a PETSc include file, nor is it used by PETSc so I think
> >> the issue is not directly to PETSc it is related to where sddft is on the
> >> machine and how it is found by your makefile.
> >>
> >
> > Barry,
> >
> > His problem is that he is trying to put extra include flags on the compile
> > line, but it is not working. I am wondering if his make is malfunctioning.
> >
> >   Thanks,
> >
> >      Matt
> >
> >
> >>   Barry
> >>
> >>
> >>
> >> > On Feb 15, 2021, at 7:47 PM, Swarnava Ghosh <swarnava89 at gmail.com>
> >> wrote:
> >> >
> >> > Dear Petsc developers and users,
> >> >
> >> > I am having some issue with building my code with the following
> >> makefile. I was earlier able to build this with the same makefile on a
> >> different machine. Would you please help me out on this issue?
> >> >
> >> > Contents of makefile:
> >> > ==============================================
> >> > all:sparc
> >> >
> >> > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/
> >> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
> >> >
> >> > SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc ./src/
> >> energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/
> >> poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/
> >> forces.cc ./src/relaxatoms.cc ./src/multipole.cc ./src/electrostatics.cc
> >> ./src/tools.cc
> >> >
> >> > SOURCEH = ./inc/sddft.h ./inc/isddft.h
> >> >
> >> > OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o
> >> ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o
> >> ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o
> >> ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o
> >> >
> >> > LIBBASE = ./lib/sparc
> >> >
> >> > CLEANFILES = ./lib/sparc
> >> >
> >> > include ${PETSC_DIR}/lib/petsc/conf/variables
> >> > include ${PETSC_DIR}/lib/petsc/conf/rules
> >> >
> >> > sparc: ${OBJSC} chkopts
> >> > ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB}
> >> > ${RM} $(SOURCECPP:%.cc=%.o)
> >> >
> >> > ===========================================
> >> > Error:
> >> > /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g
> >>  -I/home/swarnava/petsc/include
> >> -I/home/swarnava/petsc/linux-gnu-intel/include    `pwd`/src/main.cc
> >> > /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic
> >> error: cannot open source file "sddft.h"
> >> >   #include "sddft.h"
> >> >                     ^
> >> > ====================================================
> >> >
> >> > It's not able to see the header file though I have -I ./inc in
> >> CPPFLAGS. The directory containing makefile has the directory "inc" with
> >> the headers and "src" with the .cc files.
> >> >
> >> > Thank you,
> >> > Swarnava
> >> >
> >> >
> >>
> >>
> >
> > --
> > What most experimenters take for granted before they begin their
> > experiments is infinitely more interesting than any results to which their
> > experiments lead.
> > -- Norbert Wiener
> >
> > https://www.cse.buffalo.edu/~knepley/
> > <http://www.cse.buffalo.edu/~knepley/>
> >
> >
> >
> 
> 


From balay at mcs.anl.gov  Tue Feb 16 12:07:35 2021
From: balay at mcs.anl.gov (Satish Balay)
Date: Tue, 16 Feb 2021 12:07:35 -0600
Subject: [petsc-users] makefile for building application with petsc
In-Reply-To: <6b1bfd4-7d3-ed49-eab1-045e790bfb6@mcs.anl.gov>
References: <CAC9YzR6mUzAUV7tiDUWCfcyMEgCJsoMFfbTUJrwr=BxzQdDLVg@mail.gmail.com>
	<BB87E0DC-5F29-4893-87BA-9FA9A1F9E19B@petsc.dev>
	<CAMYG4Gn1qp-r=XkJYSuACqgVMjHdvd-kQLR+8oxceUdAmTfOxA@mail.gmail.com>
	<AD464277-6A63-4B6C-B052-626E9F9A7F91@gmail.com>
	<CAMYG4Gknc8OgP2AGBWgSMWAPG-pkUPSnMQ45d3dSTsfs1T8NhA@mail.gmail.com>
	<6b1bfd4-7d3-ed49-eab1-045e790bfb6@mcs.anl.gov>
Message-ID: <4ff6c811-2823-79d8-1cc-cfe857e8a24c@mcs.anl.gov>

BTW: Here is a simple makefile for multiple sources.

>>>>
balay at sb /home/balay/tmp/prj/src
$ ls
main.cxx  makefile  sub.cxx
balay at sb /home/balay/tmp/prj/src
$ ls ../inc/
mainc.h
balay at sb /home/balay/tmp/prj/src
$ cat makefile 
all: main

CXXPPFLAGS       = -I../inc
LDLIBS           = -lmv

include ${PETSC_DIR}/lib/petsc/conf/variables
include ${PETSC_DIR}/lib/petsc/conf/rules
include ${PETSC_DIR}/lib/petsc/conf/test

main: sub.o
<<<<<

We don't have a simple makefile for the use case where src, obj, binaries are  in different locations.

gmakefile.test has some code for it [this requires replacing 'lib/petsc/conf/test' with custom compile targets - as in gmakefile.test].

[and what you have appears to work for this usecase.]

Satish

On Tue, 16 Feb 2021, Satish Balay via petsc-users wrote:

> for CXX - its CXXPPFLAGS
> 
> > >> > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/
> > >> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
> 
> The link options shouldn't go into preprocessor flags.
> 
> Also duplicate blas with petsc might cause grief. Best to build petsc with mkl.
> 
> 
> Satish
> 
> On Tue, 16 Feb 2021, Matthew Knepley wrote:
> 
> > On Tue, Feb 16, 2021 at 9:39 AM Jacob Faibussowitsch <jacob.fai at gmail.com>
> > wrote:
> > 
> > > Swarnava,
> > >
> > > Perhaps try CXXFLAGS instead of CPPFLAGS. Alternatively, you may
> > > explicitly declare a %.o: %.cc target and force it to include your CPPFLAGS.
> > >
> > 
> > No, do not do either of these things. We just need to figure out why it is
> > not working for you. I will make a small example as soon as possible.
> > 
> >   Thanks,
> > 
> >      Matt
> > 
> > 
> > > Best regards,
> > >
> > > Jacob Faibussowitsch
> > > (Jacob Fai - booss - oh - vitch)
> > > Cell: (312) 694-3391
> > >
> > > On Feb 16, 2021, at 08:42, Matthew Knepley <knepley at gmail.com> wrote:
> > >
> > > On Mon, Feb 15, 2021 at 10:50 PM Barry Smith <bsmith at petsc.dev> wrote:
> > >
> > >> Swarnava,
> > >>
> > >> sddft.h is not a PETSc include file, nor is it used by PETSc so I think
> > >> the issue is not directly to PETSc it is related to where sddft is on the
> > >> machine and how it is found by your makefile.
> > >>
> > >
> > > Barry,
> > >
> > > His problem is that he is trying to put extra include flags on the compile
> > > line, but it is not working. I am wondering if his make is malfunctioning.
> > >
> > >   Thanks,
> > >
> > >      Matt
> > >
> > >
> > >>   Barry
> > >>
> > >>
> > >>
> > >> > On Feb 15, 2021, at 7:47 PM, Swarnava Ghosh <swarnava89 at gmail.com>
> > >> wrote:
> > >> >
> > >> > Dear Petsc developers and users,
> > >> >
> > >> > I am having some issue with building my code with the following
> > >> makefile. I was earlier able to build this with the same makefile on a
> > >> different machine. Would you please help me out on this issue?
> > >> >
> > >> > Contents of makefile:
> > >> > ==============================================
> > >> > all:sparc
> > >> >
> > >> > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/
> > >> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
> > >> >
> > >> > SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc ./src/
> > >> energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/
> > >> poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/
> > >> forces.cc ./src/relaxatoms.cc ./src/multipole.cc ./src/electrostatics.cc
> > >> ./src/tools.cc
> > >> >
> > >> > SOURCEH = ./inc/sddft.h ./inc/isddft.h
> > >> >
> > >> > OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o
> > >> ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o
> > >> ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o
> > >> ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o
> > >> >
> > >> > LIBBASE = ./lib/sparc
> > >> >
> > >> > CLEANFILES = ./lib/sparc
> > >> >
> > >> > include ${PETSC_DIR}/lib/petsc/conf/variables
> > >> > include ${PETSC_DIR}/lib/petsc/conf/rules
> > >> >
> > >> > sparc: ${OBJSC} chkopts
> > >> > ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB}
> > >> > ${RM} $(SOURCECPP:%.cc=%.o)
> > >> >
> > >> > ===========================================
> > >> > Error:
> > >> > /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g
> > >>  -I/home/swarnava/petsc/include
> > >> -I/home/swarnava/petsc/linux-gnu-intel/include    `pwd`/src/main.cc
> > >> > /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic
> > >> error: cannot open source file "sddft.h"
> > >> >   #include "sddft.h"
> > >> >                     ^
> > >> > ====================================================
> > >> >
> > >> > It's not able to see the header file though I have -I ./inc in
> > >> CPPFLAGS. The directory containing makefile has the directory "inc" with
> > >> the headers and "src" with the .cc files.
> > >> >
> > >> > Thank you,
> > >> > Swarnava
> > >> >
> > >> >
> > >>
> > >>
> > >
> > > --
> > > What most experimenters take for granted before they begin their
> > > experiments is infinitely more interesting than any results to which their
> > > experiments lead.
> > > -- Norbert Wiener
> > >
> > > https://www.cse.buffalo.edu/~knepley/
> > > <http://www.cse.buffalo.edu/~knepley/>
> > >
> > >
> > >
> > 
> > 
> 


From jroman at dsic.upv.es  Tue Feb 16 13:54:26 2021
From: jroman at dsic.upv.es (Jose E. Roman)
Date: Tue, 16 Feb 2021 20:54:26 +0100
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <80BCEEDC-4C1E-4512-AAF5-7B6E718C7D1D@dsic.upv.es>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
	<CAEc0zDoJarRyBZKOCP_FjsqgVTEH44cyGe2RQJF8URfJPrgK6Q@mail.gmail.com>
	<7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev>
	<CAEc0zDrxX2LWbH8kgSV_mhdSaUyf6CVw0-LQhO=_YNUY62qFRA@mail.gmail.com>
	<B783A843-DC14-4959-8D49-59C0BDDD954E@joliv.et>
	<119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev>
	<CAEc0zDpCR212_cx93aF0iahFEAQEYo3C19F=FSA928z3q7hpLw@mail.gmail.com>
	<6EF7889D-DC17-46FC-82A5-9409C41E231D@petsc.dev>
	<46C744D7-4376-46B3-B5C4-211A4C8C2291@dsic.upv.es>
	<CAMYG4G=nN79akOKMXYS-YXesTj1pZq-br6C_4GA29Vm0zFAwGQ@mail.gmail.com>
	<80BCEEDC-4C1E-4512-AAF5-7B6E718C7D1D@dsic.upv.es>
Message-ID: <BCA76786-A7C6-4282-A230-98ED85D653DC@dsic.upv.es>

Florian: I have created a MR https://gitlab.com/slepc/slepc/-/merge_requests/149
Let me know if it fits your needs.

Jose


> El 15 feb 2021, a las 18:44, Jose E. Roman <jroman at dsic.upv.es> escribi?:
> 
> 
> 
>> El 15 feb 2021, a las 14:53, Matthew Knepley <knepley at gmail.com> escribi?:
>> 
>> On Mon, Feb 15, 2021 at 7:27 AM Jose E. Roman <jroman at dsic.upv.es> wrote:
>> I will think about the viability of adding an interface function to pass the preconditioner matrix.
>> 
>> Regarding the question about the B-orthogonality of computed vectors, in the symmetric solver the B-orthogonality is enforced during the computation, so you have guarantee that the computed vectors satisfy it. But if solved as non-symetric, the computed vectors may depart from B-orthogonality, unless the tolerance is very small.
>> 
>> Yes, the vectors I generate are not B-orthogonal.
>> 
>> Jose, do you think there is a way to reformulate what I am doing to use the symmetric solver, even if we only have the action of B?
> 
> Yes, you can do the following:
> 
>  ierr = EPSSetOperators(eps,S,NULL);CHKERRQ(ierr);   // S is your shell matrix A^{-1}*B
>  ierr = EPSSetProblemType(eps,EPS_HEP);CHKERRQ(ierr);  // symmetric problem though S is not symmetric
>  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
>  ierr = EPSSetUp(eps);CHKERRQ(ierr);   // note explicitly calling setup here
>  ierr = EPSGetBV(eps,&bv);CHKERRQ(ierr);
>  ierr = BVSetMatrix(bv,B,PETSC_FALSE);CHKERRQ(ierr);    // replace solver's inner product
>  ierr = EPSSolve(eps);CHKERRQ(ierr);
> 
> I have tried this with test1.c and it works. The computed eigenvectors should be B-orthogonal in this case.
> 
> Jose
> 
> 
>> 
>>  Thanks,
>> 
>>     Matt
>> 
>> Jose
>> 
>> 
>>> El 14 feb 2021, a las 21:41, Barry Smith <bsmith at petsc.dev> escribi?:
>>> 
>>> 
>>>  Florian,
>>> 
>>>   I'm sorry I don't know the answers; I can only speculate. There is a STGetShift(). 
>>> 
>>>   All I was saying is theoretically there could/should be such support in SLEPc.
>>> 
>>>  Barry
>>> 
>>> 
>>>> On Feb 13, 2021, at 6:43 PM, Florian Bruckner <e0425375 at gmail.com> wrote:
>>>> 
>>>> Dear Barry,
>>>> thank you for your clarification. What I wanted to say is that even if I could reset the KSP operators directly I would require to know which transformation ST applies in order to provide the preconditioning matrix for the correct operator.
>>>> The more general solution would be that SLEPc provides the interface to pass the preconditioning matrix for A0 and ST applies the same transformations as for the operator.
>>>> 
>>>> If you write "SLEPc could provide an interface", do you mean someone should implement it, or should it already be possible and I am not using it correctly?
>>>> I wrote a small standalone example based on ex9.py from slepc4py, where i tried to use an operator.
>>>> 
>>>> best wishes
>>>> Florian
>>>> 
>>>> On Sat, Feb 13, 2021 at 7:15 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>> 
>>>> 
>>>>> On Feb 13, 2021, at 2:47 AM, Pierre Jolivet <pierre at joliv.et> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 13 Feb 2021, at 7:25 AM, Florian Bruckner <e0425375 at gmail.com> wrote:
>>>>>> 
>>>>>> Dear Jose, Dear Barry,
>>>>>> thanks again for your reply. One final question about the B0 orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but they are i*B0 orthogonal? or is there an issue with Matt's approach?
>>>>>> For my problem I can show that eigenvalues fulfill an orthogonality relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This should be independent of the solving method, right?
>>>>>> 
>>>>>> Regarding Barry's advice this is what I first tried:
>>>>>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
>>>>>> st = es.getST()
>>>>>> ksp = st.getKSP()
>>>>>> ksp.setOperators(self.A0, self.P0)
>>>>>> 
>>>>>> But it seems that the provided P0 is not used. Furthermore the interface is maybe a bit confusing if ST performs some transformation. In this case P0 needs to approximate A0^{-1}*B0 and not A0, right?
>>>>> 
>>>>> No, you need to approximate (A0-sigma B0)^-1. If you have a null shift, which looks like it is the case, you end up with A0^-1.
>>>> 
>>>>  Just trying to provide more clarity with the terms.
>>>> 
>>>> If ST transforms the operator in the KSP to (A0-sigma B0) and you are providing the "sparse matrix from which the preconditioner is to be built" then you need to provide something that approximates (A0-sigma B0). Since the PC will use your matrix to construct a preconditioner that approximates the inverse of  (A0-sigma B0), you don't need to directly provide something that approximates (A0-sigma B0)^-1
>>>> 
>>>> Yes, I would think SLEPc could provide an interface where it manages "the matrix from which to construct the preconditioner" and transforms that matrix just like the true matrix. To do it by hand you simply need to know what A0 and B0 are and which sigma ST has selected and then you can construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and modB0 are your "sparser approximations".
>>>> 
>>>>  Barry
>>>> 
>>>> 
>>>>> 
>>>>>> Nevertheless I think it would be the best solution if one could provide P0 (approx A0) and SLEPc derives the preconditioner from this. Would this be hard to implement?
>>>>> 
>>>>> This is what Barry?s suggestion is implementing. Don?t know why it doesn?t work with your Python operator though.
>>>>> 
>>>>> Thanks,
>>>>> Pierre
>>>>> 
>>>>>> best wishes
>>>>>> Florian
>>>>>> 
>>>>>> 
>>>>>> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith <bsmith at petsc.dev> wrote:
>>>>>> 
>>>>>> 
>>>>>>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner <e0425375 at gmail.com> wrote:
>>>>>>> 
>>>>>>> Dear Jose, Dear Matt, 
>>>>>>> 
>>>>>>> I needed some time to think about your answers. 
>>>>>>> If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner.
>>>>>>> What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator).
>>>>>>> Unfortunately this interface is not available, right?
>>>>>> 
>>>>>>   If SLEPc does not provide this directly it is still intended to be trivial to provide the "preconditioner matrix" (that is matrix from which the preconditioner is built). Just get the KSP from the ST object and use KSPSetOperators() to provide the "preconditioner matrix" . 
>>>>>> 
>>>>>>  Barry
>>>>>> 
>>>>>>> 
>>>>>>> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly?
>>>>>>> 
>>>>>>> I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. 
>>>>>>> 
>>>>>>> Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. 
>>>>>>> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell?
>>>>>>> Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback?
>>>>>>> 
>>>>>>> thanks again,
>>>>>>> and best wishes
>>>>>>> Florian
>>>>>>> 
>>>>>>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman <jroman at dsic.upv.es> wrote:
>>>>>>> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is
>>>>>>> 
>>>>>>>  (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
>>>>>>> 
>>>>>>>  A0^{-1}*B0*v=theta*v
>>>>>>> 
>>>>>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
>>>>>>> 
>>>>>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence.
>>>>>>> 
>>>>>>> Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface.
>>>>>>> 
>>>>>>> The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own.
>>>>>>> 
>>>>>>> As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python.
>>>>>>> 
>>>>>>> If you are using the git repo, I could add the relevant code.
>>>>>>> 
>>>>>>> Jose
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com> escribi?:
>>>>>>>> 
>>>>>>>> On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com> wrote:
>>>>>>>> Dear PETSc / SLEPc Users,
>>>>>>>> 
>>>>>>>> my question is very similar to the one posted here: 
>>>>>>>> https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html
>>>>>>>> 
>>>>>>>> The eigensystem I would like to solve looks like:
>>>>>>>> B0 v = 1/omega A0 v
>>>>>>>> B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). 
>>>>>>>> 
>>>>>>>> I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this:
>>>>>>>> 
>>>>>>>>        es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
>>>>>>>>        st = es.getST()
>>>>>>>>        ksp = st.getKSP()
>>>>>>>>        ksp.setOperators(self.A0, self.P0)
>>>>>>>> 
>>>>>>>> Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). 
>>>>>>>> By the way, should P0 be an approximation of A0 or does it have to include B0?
>>>>>>>> 
>>>>>>>> Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator?
>>>>>>>> 
>>>>>>>> Jose can correct me if I say something wrong.
>>>>>>>> 
>>>>>>>> When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and
>>>>>>>> then handed that to EPS. You can see me do it here:
>>>>>>>> 
>>>>>>>>  https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123
>>>>>>>> 
>>>>>>>> I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way.
>>>>>>>> 
>>>>>>>>  Thanks,
>>>>>>>> 
>>>>>>>>     Matt
>>>>>>>> 
>>>>>>>> thanks for any advice
>>>>>>>> best wishes
>>>>>>>> Florian
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>>>>> -- Norbert Wiener
>>>>>>>> 
>>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> <test.py>
>>> 
>> 
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/


From swarnava89 at gmail.com  Tue Feb 16 18:31:13 2021
From: swarnava89 at gmail.com (Swarnava Ghosh)
Date: Tue, 16 Feb 2021 19:31:13 -0500
Subject: [petsc-users] makefile for building application with petsc
In-Reply-To: <4ff6c811-2823-79d8-1cc-cfe857e8a24c@mcs.anl.gov>
References: <CAC9YzR6mUzAUV7tiDUWCfcyMEgCJsoMFfbTUJrwr=BxzQdDLVg@mail.gmail.com>
	<BB87E0DC-5F29-4893-87BA-9FA9A1F9E19B@petsc.dev>
	<CAMYG4Gn1qp-r=XkJYSuACqgVMjHdvd-kQLR+8oxceUdAmTfOxA@mail.gmail.com>
	<AD464277-6A63-4B6C-B052-626E9F9A7F91@gmail.com>
	<CAMYG4Gknc8OgP2AGBWgSMWAPG-pkUPSnMQ45d3dSTsfs1T8NhA@mail.gmail.com>
	<6b1bfd4-7d3-ed49-eab1-045e790bfb6@mcs.anl.gov>
	<4ff6c811-2823-79d8-1cc-cfe857e8a24c@mcs.anl.gov>
Message-ID: <CAC9YzR5GYB_jLxCeuJtBqp78gvmxoTvPzck3NWsMiEppSq5skQ@mail.gmail.com>

Thank you for your responses. The CXXPPFLAGS as mentioned by Satish makes
it work.

Sincerely,
Swarnava

On Tue, Feb 16, 2021 at 1:07 PM Satish Balay via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> BTW: Here is a simple makefile for multiple sources.
>
> >>>>
> balay at sb /home/balay/tmp/prj/src
> $ ls
> main.cxx  makefile  sub.cxx
> balay at sb /home/balay/tmp/prj/src
> $ ls ../inc/
> mainc.h
> balay at sb /home/balay/tmp/prj/src
> $ cat makefile
> all: main
>
> CXXPPFLAGS       = -I../inc
> LDLIBS           = -lmv
>
> include ${PETSC_DIR}/lib/petsc/conf/variables
> include ${PETSC_DIR}/lib/petsc/conf/rules
> include ${PETSC_DIR}/lib/petsc/conf/test
>
> main: sub.o
> <<<<<
>
> We don't have a simple makefile for the use case where src, obj, binaries
> are  in different locations.
>
> gmakefile.test has some code for it [this requires replacing
> 'lib/petsc/conf/test' with custom compile targets - as in gmakefile.test].
>
> [and what you have appears to work for this usecase.]
>
> Satish
>
> On Tue, 16 Feb 2021, Satish Balay via petsc-users wrote:
>
> > for CXX - its CXXPPFLAGS
> >
> > > >> > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/
> > > >> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
> -lpthread
> >
> > The link options shouldn't go into preprocessor flags.
> >
> > Also duplicate blas with petsc might cause grief. Best to build petsc
> with mkl.
> >
> >
> > Satish
> >
> > On Tue, 16 Feb 2021, Matthew Knepley wrote:
> >
> > > On Tue, Feb 16, 2021 at 9:39 AM Jacob Faibussowitsch <
> jacob.fai at gmail.com>
> > > wrote:
> > >
> > > > Swarnava,
> > > >
> > > > Perhaps try CXXFLAGS instead of CPPFLAGS. Alternatively, you may
> > > > explicitly declare a %.o: %.cc target and force it to include your
> CPPFLAGS.
> > > >
> > >
> > > No, do not do either of these things. We just need to figure out why
> it is
> > > not working for you. I will make a small example as soon as possible.
> > >
> > >   Thanks,
> > >
> > >      Matt
> > >
> > >
> > > > Best regards,
> > > >
> > > > Jacob Faibussowitsch
> > > > (Jacob Fai - booss - oh - vitch)
> > > > Cell: (312) 694-3391
> > > >
> > > > On Feb 16, 2021, at 08:42, Matthew Knepley <knepley at gmail.com>
> wrote:
> > > >
> > > > On Mon, Feb 15, 2021 at 10:50 PM Barry Smith <bsmith at petsc.dev>
> wrote:
> > > >
> > > >> Swarnava,
> > > >>
> > > >> sddft.h is not a PETSc include file, nor is it used by PETSc so I
> think
> > > >> the issue is not directly to PETSc it is related to where sddft is
> on the
> > > >> machine and how it is found by your makefile.
> > > >>
> > > >
> > > > Barry,
> > > >
> > > > His problem is that he is trying to put extra include flags on the
> compile
> > > > line, but it is not working. I am wondering if his make is
> malfunctioning.
> > > >
> > > >   Thanks,
> > > >
> > > >      Matt
> > > >
> > > >
> > > >>   Barry
> > > >>
> > > >>
> > > >>
> > > >> > On Feb 15, 2021, at 7:47 PM, Swarnava Ghosh <swarnava89 at gmail.com
> >
> > > >> wrote:
> > > >> >
> > > >> > Dear Petsc developers and users,
> > > >> >
> > > >> > I am having some issue with building my code with the following
> > > >> makefile. I was earlier able to build this with the same makefile
> on a
> > > >> different machine. Would you please help me out on this issue?
> > > >> >
> > > >> > Contents of makefile:
> > > >> > ==============================================
> > > >> > all:sparc
> > > >> >
> > > >> > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/
> > > >> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
> -lpthread
> > > >> >
> > > >> > SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc
> ./src/
> > > >> energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/
> > > >> poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/
> > > >> forces.cc ./src/relaxatoms.cc ./src/multipole.cc
> ./src/electrostatics.cc
> > > >> ./src/tools.cc
> > > >> >
> > > >> > SOURCEH = ./inc/sddft.h ./inc/isddft.h
> > > >> >
> > > >> > OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o
> ./src/energy.o
> > > >> ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o
> > > >> ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o
> > > >> ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o
> ./src/tools.o
> > > >> >
> > > >> > LIBBASE = ./lib/sparc
> > > >> >
> > > >> > CLEANFILES = ./lib/sparc
> > > >> >
> > > >> > include ${PETSC_DIR}/lib/petsc/conf/variables
> > > >> > include ${PETSC_DIR}/lib/petsc/conf/rules
> > > >> >
> > > >> > sparc: ${OBJSC} chkopts
> > > >> > ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB}
> > > >> > ${RM} $(SOURCECPP:%.cc=%.o)
> > > >> >
> > > >> > ===========================================
> > > >> > Error:
> > > >> > /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c
> -g
> > > >>  -I/home/swarnava/petsc/include
> > > >> -I/home/swarnava/petsc/linux-gnu-intel/include    `pwd`/src/main.cc
> > > >> > /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic
> > > >> error: cannot open source file "sddft.h"
> > > >> >   #include "sddft.h"
> > > >> >                     ^
> > > >> > ====================================================
> > > >> >
> > > >> > It's not able to see the header file though I have -I ./inc in
> > > >> CPPFLAGS. The directory containing makefile has the directory "inc"
> with
> > > >> the headers and "src" with the .cc files.
> > > >> >
> > > >> > Thank you,
> > > >> > Swarnava
> > > >> >
> > > >> >
> > > >>
> > > >>
> > > >
> > > > --
> > > > What most experimenters take for granted before they begin their
> > > > experiments is infinitely more interesting than any results to which
> their
> > > > experiments lead.
> > > > -- Norbert Wiener
> > > >
> > > > https://www.cse.buffalo.edu/~knepley/
> > > > <http://www.cse.buffalo.edu/~knepley/>
> > > >
> > > >
> > > >
> > >
> > >
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210216/f4cc0d92/attachment-0001.html>

From roland.richter at ntnu.no  Wed Feb 17 04:11:58 2021
From: roland.richter at ntnu.no (Roland Richter)
Date: Wed, 17 Feb 2021 11:11:58 +0100
Subject: [petsc-users] Explicit linking to OpenMP results in performance
 drop and wrong results
Message-ID: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no>

Hei,

when compiling the attached files using the following compilation line

//usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -DBOOST_ALL_NO_LIB
-DBOOST_FILESYSTEM_DYN_LINK -DBOOST_MPI_DYN_LINK
-DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_SERIALIZATION_DYN_LINK
-DUSE_CUDA
-I/home/roland/Dokumente/C++-Projekte/armadillo_with_PETSc/include
-I/opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/include
-I/opt/armadillo/include -isystem /opt/petsc_release/include -isystem
/opt/fftw3/include -isystem /opt/boost/include -march=native
-fopenmp-simd -DMKL_LP64 -m64 -Wall -Wextra -pedantic -fPIC -flto -O2
-funroll-loops -funroll-all-loops -fstrict-aliasing -mavx -march=native
-fopenmp -std=gnu++17 -c <source_files> -o <target_files>/

and linking them with?

//usr/lib64/mpi/gcc/openmpi3/bin/mpicxx? -march=native -fopenmp-simd
-DMKL_LP64 -m64 <target_files> -o bin/armadillo_with_PETSc?
-Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib
/usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran
/opt/intel/mkl/lib/intel64/libmkl_rt.so
/opt/boost/lib/libboost_filesystem.so.1.72.0
/opt/boost/lib/libboost_mpi.so.1.72.0
/opt/boost/lib/libboost_program_options.so.1.72.0
/opt/boost/lib/libboost_serialization.so.1.72.0
/opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so
/opt/petsc_release/lib/libpetsc.so
/usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so/

my output is?

/Arma and PETSc/MatScale are equal:?????????????????????????????????????
??? ??? 0//
//Arma-time for a matrix size of [1024,
8192]:??????????????????????????? ??? ????? 24955//
//PETSc-time, pointer for a matrix size of [1024,
8192]:????????????????? ??? 28283//
//PETSc-time, MatScale for a matrix size of [1024,
8192]:????????????????? 23138/

but when removing the explicit call to openmp (i.e. removing /-fopenmp/
and //usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so/) my result is

/Arma and PETSc/MatScale are equal:?????????????????????????????????????
?????? 1//
//Arma-time for a matrix size of [1024,
8192]:??????????????????????????? ??? ???? 24878//
//PETSc-time, pointer for a matrix size of [1024,
8192]:???????????????????? 18942//
//PETSc-time, MatScale for a matrix size of [1024,
8192]:???????????????? 23350/

even though both times the executable is linked to

/??????? libmkl_intel_lp64.so =>
/opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so
(0x00007f9eebd70000)//
//??????? libmkl_core.so =>
/opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_core.so
(0x00007f9ee77aa000)//
//??????? libmkl_intel_thread.so =>
/opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so
(0x00007f9ee42c3000)//
//??????? libiomp5.so =>
/opt/intel/compilers_and_libraries_2020.2.254/linux/compiler/lib/intel64_lin/libiomp5.so
(0x00007f9ee3ebd000)//
//??????? libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007f9ea98bd000)/

via the petsc-library. Why does the execution time vary by so much, and
why does my result change when calling MatScale (i.e. returning wrong
results) when explicitly linking to OpenMP?

Thanks!

Regards,

Roland

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210217/27fdd42a/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: main_short.cpp
Type: text/x-c++src
Size: 287 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210217/27fdd42a/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_scaling.cpp
Type: text/x-c++src
Size: 4144 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210217/27fdd42a/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: helper_functions.cpp
Type: text/x-c++src
Size: 2319 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210217/27fdd42a/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_scaling.hpp
Type: text/x-c++hdr
Size: 576 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210217/27fdd42a/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: helper_functions.hpp
Type: text/x-c++hdr
Size: 646 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210217/27fdd42a/attachment-0004.bin>

From jed at jedbrown.org  Wed Feb 17 10:10:37 2021
From: jed at jedbrown.org (Jed Brown)
Date: Wed, 17 Feb 2021 09:10:37 -0700
Subject: [petsc-users] Explicit linking to OpenMP results in performance
 drop and wrong results
In-Reply-To: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no>
References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no>
Message-ID: <874kiad69u.fsf@jedbrown.org>

You're using an MKL linked to Intel's OpenMP. I could imagine there being symbol conflicts causing MKL to compute wrong results if libgomp symbols are picked up.

Note that -fopenmp-simd does not require linking -- it just gives the compiler hints about how to vectorize. So you can probably keep using it and just stop passing libgomp.so. Alternatively, you can link MKL to work with libgomp (see the MKL link advisor).

Roland Richter <roland.richter at ntnu.no> writes:

> Hei,
>
> when compiling the attached files using the following compilation line
>
> //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -DBOOST_ALL_NO_LIB
> -DBOOST_FILESYSTEM_DYN_LINK -DBOOST_MPI_DYN_LINK
> -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_SERIALIZATION_DYN_LINK
> -DUSE_CUDA
> -I/home/roland/Dokumente/C++-Projekte/armadillo_with_PETSc/include
> -I/opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/include
> -I/opt/armadillo/include -isystem /opt/petsc_release/include -isystem
> /opt/fftw3/include -isystem /opt/boost/include -march=native
> -fopenmp-simd -DMKL_LP64 -m64 -Wall -Wextra -pedantic -fPIC -flto -O2
> -funroll-loops -funroll-all-loops -fstrict-aliasing -mavx -march=native
> -fopenmp -std=gnu++17 -c <source_files> -o <target_files>/
>
> and linking them with?
>
> //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx? -march=native -fopenmp-simd
> -DMKL_LP64 -m64 <target_files> -o bin/armadillo_with_PETSc?
> -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib
> /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran
> /opt/intel/mkl/lib/intel64/libmkl_rt.so
> /opt/boost/lib/libboost_filesystem.so.1.72.0
> /opt/boost/lib/libboost_mpi.so.1.72.0
> /opt/boost/lib/libboost_program_options.so.1.72.0
> /opt/boost/lib/libboost_serialization.so.1.72.0
> /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so
> /opt/petsc_release/lib/libpetsc.so
> /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so/
>
> my output is?
>
> /Arma and PETSc/MatScale are equal:?????????????????????????????????????
> ??? ??? 0//
> //Arma-time for a matrix size of [1024,
> 8192]:??????????????????????????? ??? ????? 24955//
> //PETSc-time, pointer for a matrix size of [1024,
> 8192]:????????????????? ??? 28283//
> //PETSc-time, MatScale for a matrix size of [1024,
> 8192]:????????????????? 23138/
>
> but when removing the explicit call to openmp (i.e. removing /-fopenmp/
> and //usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so/) my result is
>
> /Arma and PETSc/MatScale are equal:?????????????????????????????????????
> ?????? 1//
> //Arma-time for a matrix size of [1024,
> 8192]:??????????????????????????? ??? ???? 24878//
> //PETSc-time, pointer for a matrix size of [1024,
> 8192]:???????????????????? 18942//
> //PETSc-time, MatScale for a matrix size of [1024,
> 8192]:???????????????? 23350/
>
> even though both times the executable is linked to
>
> /??????? libmkl_intel_lp64.so =>
> /opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so
> (0x00007f9eebd70000)//
> //??????? libmkl_core.so =>
> /opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_core.so
> (0x00007f9ee77aa000)//
> //??????? libmkl_intel_thread.so =>
> /opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so
> (0x00007f9ee42c3000)//
> //??????? libiomp5.so =>
> /opt/intel/compilers_and_libraries_2020.2.254/linux/compiler/lib/intel64_lin/libiomp5.so
> (0x00007f9ee3ebd000)//
> //??????? libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007f9ea98bd000)/
>
> via the petsc-library. Why does the execution time vary by so much, and
> why does my result change when calling MatScale (i.e. returning wrong
> results) when explicitly linking to OpenMP?
>
> Thanks!
>
> Regards,
>
> Roland
>
> #include <test_scaling.hpp>
>
> int main(int argc, char **args) {
> 	PetscMPIInt rank, size;
> 	PetscInitialize(&argc, &args, (char*) 0, NULL);
>
> 	MPI_Comm_size(PETSC_COMM_WORLD, &size);
> 	MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
>
> 	test_scaling (1024, 8192, false);
> 	PetscFinalize();
> 	return 0;
> }
> #include <test_scaling.hpp>
>
> void test_scaling_arma(const arma::cx_mat & in_mat,
> 					   arma::cx_mat &out_mat,
> 					   const arma::cx_double &scaling_factor) {
> 	out_mat = in_mat;
> 	out_mat *= scaling_factor;
> }
>
> void test_scaling_petsc(const Mat &in_mat,
> 						Mat &out_mat,
> 						const PetscScalar &scaling_factor) {
> 	MatZeroEntries(out_mat);
> 	MatAXPY(out_mat, scaling_factor, in_mat, SAME_NONZERO_PATTERN);
> }
>
> void test_scaling_petsc_pointer(const Mat &in_mat,
> 								Mat &out_mat,
> 								const PetscScalar &scaling_factor) {
> 	const PetscScalar *in_mat_ptr;
> 	PetscScalar *out_mat_ptr;
> 	MatDenseGetArrayRead (in_mat, &in_mat_ptr);
> 	MatDenseGetArrayWrite (out_mat, &out_mat_ptr);
> 	PetscInt r_0, r_1;
> 	MatGetLocalSize (out_mat, &r_0, &r_1);
> #pragma omp parallel for
> 	for(int i = 0; i < r_0 * r_1; ++i)
> 		*(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);
>
> 	MatDenseRestoreArrayRead (in_mat, &in_mat_ptr);
> 	MatDenseRestoreArrayWrite(out_mat, &out_mat_ptr);
> }
>
> void test_scaling(const size_t matrix_size_rows, const size_t matrix_size_cols, const bool print_matrices) {
> 	PetscMPIInt rank, size;
>
> 	MPI_Comm_size(PETSC_COMM_WORLD, &size);
> 	MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
>
> 	arma::cx_mat in_mat = arma::zeros<arma::cx_mat>(matrix_size_rows, matrix_size_cols),
> 			out_mat = arma::zeros<arma::cx_mat>(matrix_size_rows, matrix_size_cols);
> 	arma::cx_rowvec matrix_vec = arma::conv_to<arma::cx_rowvec>::from(arma::linspace<arma::rowvec>(0, matrix_size_cols - 1, matrix_size_cols));
> 	in_mat.each_row([&](arma::cx_rowvec &a){
> 		a = matrix_vec;
> 	});
>
> 	Mat petsc_in_mat, petsc_out_mat;
> 	arma::cx_mat petsc_out_comparison_mat = arma::zeros<arma::cx_mat>(matrix_size_rows, matrix_size_cols);
> 	MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, matrix_size_rows, matrix_size_cols, NULL, &petsc_in_mat);
> 	MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, matrix_size_rows, matrix_size_cols, NULL, &petsc_out_mat);
>
> 	create_data_in_PETSc_from_scratch (in_mat, petsc_in_mat);
> 	MatZeroEntries(petsc_out_mat);
>
> 	const std::complex<double> scaling_factor{1. / matrix_size_cols, 0.};
>
> 	test_scaling_arma (in_mat, out_mat, scaling_factor);
> 	test_scaling_petsc (petsc_in_mat, petsc_out_mat, scaling_factor);
>
> 	//Benchmark
> 	auto t1 = std::chrono::high_resolution_clock::now();
> 	for(size_t i = 0; i < bench_rounds; ++i) {
> 		test_scaling_arma (in_mat, out_mat, scaling_factor);
> 	}
> 	auto t2 = std::chrono::high_resolution_clock::now();
>
> 	auto t3 = std::chrono::high_resolution_clock::now();
> 	for(size_t i = 0; i < bench_rounds; ++i) {
> 		test_scaling_petsc_pointer (petsc_in_mat, petsc_out_mat, scaling_factor);
> 	}
> 	auto t4 = std::chrono::high_resolution_clock::now();
>
> 	auto t5 = std::chrono::high_resolution_clock::now();
> 	for(size_t i = 0; i < bench_rounds; ++i) {
> 		test_scaling_petsc (petsc_in_mat, petsc_out_mat, scaling_factor);
> 	}
> 	auto t6 = std::chrono::high_resolution_clock::now();
>
>
> 	retrieve_data_from_PETSc (petsc_out_mat, petsc_out_comparison_mat, matrix_size_cols, matrix_size_rows);
>
> 	if(print_matrices && rank == 0) {
> 		std::cout << "In-matrix, ARMA:\n" << in_mat
> 				  << "\n\nOut-matrix, ARMA:\n" << out_mat
> 				  << "\n\nComparison-out-matrix, ARMA:\n" << petsc_out_comparison_mat
> 				  << "\n\nDifference: \n" << arma::abs(petsc_out_comparison_mat - out_mat)
> 				  <<'\n';
> 	}
> 	if(rank == 0) {
> 		std::cout << "Arma and PETSc/MatScale are equal:\t\t\t\t\t" << arma::approx_equal(out_mat, petsc_out_comparison_mat, "reldiff", 1e-8) << '\n';
> 		std::cout << "Arma-time for a matrix size of ["
> << matrix_size_rows << ", "
> << matrix_size_cols << "]:\t\t\t\t"
> << std::chrono::duration_cast<std::chrono::milliseconds>( t2 - t1 ).count() << '\n';
> 		std::cout << "PETSc-time, pointer for a matrix size of ["
> << matrix_size_rows << ", "
> << matrix_size_cols << "]:\t\t\t"
> << std::chrono::duration_cast<std::chrono::milliseconds>( t4 - t3 ).count() << '\n';
> 		std::cout << "PETSc-time, MatScale for a matrix size of ["
> << matrix_size_rows << ", "
> << matrix_size_cols << "]:\t\t\t"
> << std::chrono::duration_cast<std::chrono::milliseconds>( t6 - t5 ).count() << '\n';
> 	}
> 	MatDestroy (&petsc_in_mat);
> 	MatDestroy (&petsc_out_mat);
> }
> #include <helper_functions.hpp>
>
> void retrieve_data_from_PETSc(const Mat petsc_mat, arma::cx_mat &out_data,
> 							  const arma::uword Ntime, const arma::uword Nradius) {
> 	PetscMPIInt size;
> 	MPI_Comm_size(PETSC_COMM_WORLD, &size);
> 	if(out_data.n_rows != Ntime && out_data.n_cols != Nradius) {
> 		out_data = arma::zeros<arma::cx_mat>(Ntime, Nradius);
> 	}
> 	Mat local_mat;
> 	arma::Col<int> vector_indices_radius = arma::linspace<arma::Col<int>>(0, Nradius - 1, Nradius);
> 	arma::Col<int> vector_indices_time = arma::linspace<arma::Col<int>>(0, Ntime - 1, Ntime);
> 	//MatCreateRedundantMatrix(petsc_mat, Ntime * Nradius, MPI_COMM_NULL, MAT_INITIAL_MATRIX, &local_mat);
> 	MatCreateRedundantMatrix(petsc_mat, size, MPI_COMM_NULL, MAT_INITIAL_MATRIX, &local_mat);
> 	MatAssemblyBegin(local_mat, MAT_FINAL_ASSEMBLY);
> 	MatAssemblyEnd(local_mat, MAT_FINAL_ASSEMBLY);
> 	MatGetValues(local_mat, Nradius, vector_indices_radius.memptr(), Ntime, vector_indices_time.memptr(), out_data.memptr());
> 	MatDestroy(&local_mat);
> 	out_data = out_data.st();
> }
>
> void store_data_in_PETSc(const arma::cx_mat &in_data, Mat &petsc_mat) {
> 	const arma::uword Ntime = in_data.n_cols;
> 	const arma::uword Nradius = in_data.n_rows;
> 	arma::Col<int> vector_indices_radius = arma::linspace<arma::Col<int>>(0, Nradius - 1, Nradius);
> 	arma::Col<int> vector_indices_time = arma::linspace<arma::Col<int>>(0, Ntime - 1, Ntime);
> 	MatZeroEntries(petsc_mat);
> 	MatAssemblyBegin(petsc_mat, MAT_FINAL_ASSEMBLY);
> 	MatAssemblyEnd(petsc_mat, MAT_FINAL_ASSEMBLY);
> 	arma::cx_mat local_mat = in_data.st();
> 	MatSetValues(petsc_mat, Nradius, vector_indices_radius.memptr(), Ntime, vector_indices_time.memptr(), local_mat.memptr(), INSERT_VALUES);
> 	MatAssemblyBegin(petsc_mat, MAT_FINAL_ASSEMBLY);
> 	MatAssemblyEnd(petsc_mat, MAT_FINAL_ASSEMBLY);
> }
>
> void create_data_in_PETSc_from_scratch(const arma::cx_mat &in_data, Mat &petsc_mat) {
> 	const arma::uword Ntime = in_data.n_cols;
> 	const arma::uword Nradius = in_data.n_rows;
> 	MatZeroEntries(petsc_mat);
> 	MatAssemblyBegin(petsc_mat, MAT_FINAL_ASSEMBLY);
> 	MatAssemblyEnd(petsc_mat, MAT_FINAL_ASSEMBLY);
> 	for(int i = 0; i < (int)Ntime; ++i){
> 		for(int j = 0; j < (int)Nradius; ++j) {
> 			MatSetValue(petsc_mat, j, i, i, INSERT_VALUES);
> 		}
> 	}
> 	MatAssemblyBegin(petsc_mat, MAT_FINAL_ASSEMBLY);
> 	MatAssemblyEnd(petsc_mat, MAT_FINAL_ASSEMBLY);
> }
> #ifndef TEST_SCALING_HPP
> #define TEST_SCALING_HPP
>
> #include <helper_functions.hpp>
>
> void test_scaling_arma(const arma::cx_mat & in_mat,
> 					   arma::cx_mat &out_mat,
> 					   const arma::cx_double &scaling_factor);
>
> void test_scaling_petsc(const Mat &in_mat,
> 						Mat &out_mat,
> 						const PetscScalar &scaling_factor);
>
> void test_scaling_petsc_pointer(const Mat &in_mat,
> 								Mat &out_mat,
> 								const PetscScalar &scaling_factor);
>
> void test_scaling(const size_t matrix_size_rows, const size_t matrix_size_cols, const bool print_matrices);
>
> #endif // TEST_SCALING_HPP
> #ifndef HELPER_FUNCTIONS_HPP
> #define HELPER_FUNCTIONS_HPP
>
> #include <iostream>
>
> #include <armadillo>
> #include <petscts.h>
> #include <petscmat.h>
> #include <petscvec.h>
> #include <fftw3.h>
> #include <fftw3-mpi.h>
> #include <metis.h>
> #include <chrono>
>
> #include <boost/numeric/odeint.hpp>
>
> constexpr int bench_rounds = 1000;
>
> void retrieve_data_from_PETSc(const Mat petsc_mat, arma::cx_mat &out_data,
> 							  const arma::uword Ntime, const arma::uword Nradius);
>
> void store_data_in_PETSc(const arma::cx_mat &in_data, Mat &petsc_mat);
>
> void create_data_in_PETSc_from_scratch(const arma::cx_mat &in_data, Mat &petsc_mat);
>
> #endif // HELPER_FUNCTIONS_HPP

From roland.richter at ntnu.no  Wed Feb 17 10:23:48 2021
From: roland.richter at ntnu.no (Roland Richter)
Date: Wed, 17 Feb 2021 17:23:48 +0100
Subject: [petsc-users] Explicit linking to OpenMP results in performance
 drop and wrong results
In-Reply-To: <874kiad69u.fsf@jedbrown.org>
References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no>
	<874kiad69u.fsf@jedbrown.org>
Message-ID: <3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no>

Hei,

I replaced the linking line with

//usr/lib64/mpi/gcc/openmpi3/bin/mpicxx? -march=native -fopenmp-simd
-DMKL_LP64 -m64
CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o
bin/armadillo_with_PETSc?
-Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib
/usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran?
-L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64
-lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
/opt/boost/lib/libboost_filesystem.so.1.72.0
/opt/boost/lib/libboost_mpi.so.1.72.0
/opt/boost/lib/libboost_program_options.so.1.72.0
/opt/boost/lib/libboost_serialization.so.1.72.0
/opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so
/opt/petsc_release/lib/libpetsc.so
/usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so
/

and now the results are correct. Nevertheless, when comparing the loop
in line 26-28 in file test_scaling.cpp

/#pragma omp parallel for//
//??? for(int i = 0; i < r_0 * r_1; ++i)//
//??? ??? *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/

the version without /#pragma omp parallel/ for is significantly faster
(i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there
still such a big difference?

Thanks!

Am 17.02.21 um 17:10 schrieb Jed Brown:
> You're using an MKL linked to Intel's OpenMP. I could imagine there being symbol conflicts causing MKL to compute wrong results if libgomp symbols are picked up.
>
> Note that -fopenmp-simd does not require linking -- it just gives the compiler hints about how to vectorize. So you can probably keep using it and just stop passing libgomp.so. Alternatively, you can link MKL to work with libgomp (see the MKL link advisor).
>
> Roland Richter <roland.richter at ntnu.no> writes:
>
>> Hei,
>>
>> when compiling the attached files using the following compilation line
>>
>> //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -DBOOST_ALL_NO_LIB
>> -DBOOST_FILESYSTEM_DYN_LINK -DBOOST_MPI_DYN_LINK
>> -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_SERIALIZATION_DYN_LINK
>> -DUSE_CUDA
>> -I/home/roland/Dokumente/C++-Projekte/armadillo_with_PETSc/include
>> -I/opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/include
>> -I/opt/armadillo/include -isystem /opt/petsc_release/include -isystem
>> /opt/fftw3/include -isystem /opt/boost/include -march=native
>> -fopenmp-simd -DMKL_LP64 -m64 -Wall -Wextra -pedantic -fPIC -flto -O2
>> -funroll-loops -funroll-all-loops -fstrict-aliasing -mavx -march=native
>> -fopenmp -std=gnu++17 -c <source_files> -o <target_files>/
>>
>> and linking them with?
>>
>> //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx? -march=native -fopenmp-simd
>> -DMKL_LP64 -m64 <target_files> -o bin/armadillo_with_PETSc?
>> -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib
>> /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran
>> /opt/intel/mkl/lib/intel64/libmkl_rt.so
>> /opt/boost/lib/libboost_filesystem.so.1.72.0
>> /opt/boost/lib/libboost_mpi.so.1.72.0
>> /opt/boost/lib/libboost_program_options.so.1.72.0
>> /opt/boost/lib/libboost_serialization.so.1.72.0
>> /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so
>> /opt/petsc_release/lib/libpetsc.so
>> /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so/
>>
>> my output is?
>>
>> /Arma and PETSc/MatScale are equal:?????????????????????????????????????
>> ??? ??? 0//
>> //Arma-time for a matrix size of [1024,
>> 8192]:??????????????????????????? ??? ????? 24955//
>> //PETSc-time, pointer for a matrix size of [1024,
>> 8192]:????????????????? ??? 28283//
>> //PETSc-time, MatScale for a matrix size of [1024,
>> 8192]:????????????????? 23138/
>>
>> but when removing the explicit call to openmp (i.e. removing /-fopenmp/
>> and //usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so/) my result is
>>
>> /Arma and PETSc/MatScale are equal:?????????????????????????????????????
>> ?????? 1//
>> //Arma-time for a matrix size of [1024,
>> 8192]:??????????????????????????? ??? ???? 24878//
>> //PETSc-time, pointer for a matrix size of [1024,
>> 8192]:???????????????????? 18942//
>> //PETSc-time, MatScale for a matrix size of [1024,
>> 8192]:???????????????? 23350/
>>
>> even though both times the executable is linked to
>>
>> /??????? libmkl_intel_lp64.so =>
>> /opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so
>> (0x00007f9eebd70000)//
>> //??????? libmkl_core.so =>
>> /opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_core.so
>> (0x00007f9ee77aa000)//
>> //??????? libmkl_intel_thread.so =>
>> /opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so
>> (0x00007f9ee42c3000)//
>> //??????? libiomp5.so =>
>> /opt/intel/compilers_and_libraries_2020.2.254/linux/compiler/lib/intel64_lin/libiomp5.so
>> (0x00007f9ee3ebd000)//
>> //??????? libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007f9ea98bd000)/
>>
>> via the petsc-library. Why does the execution time vary by so much, and
>> why does my result change when calling MatScale (i.e. returning wrong
>> results) when explicitly linking to OpenMP?
>>
>> Thanks!
>>
>> Regards,
>>
>> Roland
>>
>> #include <test_scaling.hpp>
>>
>> int main(int argc, char **args) {
>> 	PetscMPIInt rank, size;
>> 	PetscInitialize(&argc, &args, (char*) 0, NULL);
>>
>> 	MPI_Comm_size(PETSC_COMM_WORLD, &size);
>> 	MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
>>
>> 	test_scaling (1024, 8192, false);
>> 	PetscFinalize();
>> 	return 0;
>> }
>> #include <test_scaling.hpp>
>>
>> void test_scaling_arma(const arma::cx_mat & in_mat,
>> 					   arma::cx_mat &out_mat,
>> 					   const arma::cx_double &scaling_factor) {
>> 	out_mat = in_mat;
>> 	out_mat *= scaling_factor;
>> }
>>
>> void test_scaling_petsc(const Mat &in_mat,
>> 						Mat &out_mat,
>> 						const PetscScalar &scaling_factor) {
>> 	MatZeroEntries(out_mat);
>> 	MatAXPY(out_mat, scaling_factor, in_mat, SAME_NONZERO_PATTERN);
>> }
>>
>> void test_scaling_petsc_pointer(const Mat &in_mat,
>> 								Mat &out_mat,
>> 								const PetscScalar &scaling_factor) {
>> 	const PetscScalar *in_mat_ptr;
>> 	PetscScalar *out_mat_ptr;
>> 	MatDenseGetArrayRead (in_mat, &in_mat_ptr);
>> 	MatDenseGetArrayWrite (out_mat, &out_mat_ptr);
>> 	PetscInt r_0, r_1;
>> 	MatGetLocalSize (out_mat, &r_0, &r_1);
>> #pragma omp parallel for
>> 	for(int i = 0; i < r_0 * r_1; ++i)
>> 		*(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);
>>
>> 	MatDenseRestoreArrayRead (in_mat, &in_mat_ptr);
>> 	MatDenseRestoreArrayWrite(out_mat, &out_mat_ptr);
>> }
>>
>> void test_scaling(const size_t matrix_size_rows, const size_t matrix_size_cols, const bool print_matrices) {
>> 	PetscMPIInt rank, size;
>>
>> 	MPI_Comm_size(PETSC_COMM_WORLD, &size);
>> 	MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
>>
>> 	arma::cx_mat in_mat = arma::zeros<arma::cx_mat>(matrix_size_rows, matrix_size_cols),
>> 			out_mat = arma::zeros<arma::cx_mat>(matrix_size_rows, matrix_size_cols);
>> 	arma::cx_rowvec matrix_vec = arma::conv_to<arma::cx_rowvec>::from(arma::linspace<arma::rowvec>(0, matrix_size_cols - 1, matrix_size_cols));
>> 	in_mat.each_row([&](arma::cx_rowvec &a){
>> 		a = matrix_vec;
>> 	});
>>
>> 	Mat petsc_in_mat, petsc_out_mat;
>> 	arma::cx_mat petsc_out_comparison_mat = arma::zeros<arma::cx_mat>(matrix_size_rows, matrix_size_cols);
>> 	MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, matrix_size_rows, matrix_size_cols, NULL, &petsc_in_mat);
>> 	MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, matrix_size_rows, matrix_size_cols, NULL, &petsc_out_mat);
>>
>> 	create_data_in_PETSc_from_scratch (in_mat, petsc_in_mat);
>> 	MatZeroEntries(petsc_out_mat);
>>
>> 	const std::complex<double> scaling_factor{1. / matrix_size_cols, 0.};
>>
>> 	test_scaling_arma (in_mat, out_mat, scaling_factor);
>> 	test_scaling_petsc (petsc_in_mat, petsc_out_mat, scaling_factor);
>>
>> 	//Benchmark
>> 	auto t1 = std::chrono::high_resolution_clock::now();
>> 	for(size_t i = 0; i < bench_rounds; ++i) {
>> 		test_scaling_arma (in_mat, out_mat, scaling_factor);
>> 	}
>> 	auto t2 = std::chrono::high_resolution_clock::now();
>>
>> 	auto t3 = std::chrono::high_resolution_clock::now();
>> 	for(size_t i = 0; i < bench_rounds; ++i) {
>> 		test_scaling_petsc_pointer (petsc_in_mat, petsc_out_mat, scaling_factor);
>> 	}
>> 	auto t4 = std::chrono::high_resolution_clock::now();
>>
>> 	auto t5 = std::chrono::high_resolution_clock::now();
>> 	for(size_t i = 0; i < bench_rounds; ++i) {
>> 		test_scaling_petsc (petsc_in_mat, petsc_out_mat, scaling_factor);
>> 	}
>> 	auto t6 = std::chrono::high_resolution_clock::now();
>>
>>
>> 	retrieve_data_from_PETSc (petsc_out_mat, petsc_out_comparison_mat, matrix_size_cols, matrix_size_rows);
>>
>> 	if(print_matrices && rank == 0) {
>> 		std::cout << "In-matrix, ARMA:\n" << in_mat
>> 				  << "\n\nOut-matrix, ARMA:\n" << out_mat
>> 				  << "\n\nComparison-out-matrix, ARMA:\n" << petsc_out_comparison_mat
>> 				  << "\n\nDifference: \n" << arma::abs(petsc_out_comparison_mat - out_mat)
>> 				  <<'\n';
>> 	}
>> 	if(rank == 0) {
>> 		std::cout << "Arma and PETSc/MatScale are equal:\t\t\t\t\t" << arma::approx_equal(out_mat, petsc_out_comparison_mat, "reldiff", 1e-8) << '\n';
>> 		std::cout << "Arma-time for a matrix size of ["
>> << matrix_size_rows << ", "
>> << matrix_size_cols << "]:\t\t\t\t"
>> << std::chrono::duration_cast<std::chrono::milliseconds>( t2 - t1 ).count() << '\n';
>> 		std::cout << "PETSc-time, pointer for a matrix size of ["
>> << matrix_size_rows << ", "
>> << matrix_size_cols << "]:\t\t\t"
>> << std::chrono::duration_cast<std::chrono::milliseconds>( t4 - t3 ).count() << '\n';
>> 		std::cout << "PETSc-time, MatScale for a matrix size of ["
>> << matrix_size_rows << ", "
>> << matrix_size_cols << "]:\t\t\t"
>> << std::chrono::duration_cast<std::chrono::milliseconds>( t6 - t5 ).count() << '\n';
>> 	}
>> 	MatDestroy (&petsc_in_mat);
>> 	MatDestroy (&petsc_out_mat);
>> }
>> #include <helper_functions.hpp>
>>
>> void retrieve_data_from_PETSc(const Mat petsc_mat, arma::cx_mat &out_data,
>> 							  const arma::uword Ntime, const arma::uword Nradius) {
>> 	PetscMPIInt size;
>> 	MPI_Comm_size(PETSC_COMM_WORLD, &size);
>> 	if(out_data.n_rows != Ntime && out_data.n_cols != Nradius) {
>> 		out_data = arma::zeros<arma::cx_mat>(Ntime, Nradius);
>> 	}
>> 	Mat local_mat;
>> 	arma::Col<int> vector_indices_radius = arma::linspace<arma::Col<int>>(0, Nradius - 1, Nradius);
>> 	arma::Col<int> vector_indices_time = arma::linspace<arma::Col<int>>(0, Ntime - 1, Ntime);
>> 	//MatCreateRedundantMatrix(petsc_mat, Ntime * Nradius, MPI_COMM_NULL, MAT_INITIAL_MATRIX, &local_mat);
>> 	MatCreateRedundantMatrix(petsc_mat, size, MPI_COMM_NULL, MAT_INITIAL_MATRIX, &local_mat);
>> 	MatAssemblyBegin(local_mat, MAT_FINAL_ASSEMBLY);
>> 	MatAssemblyEnd(local_mat, MAT_FINAL_ASSEMBLY);
>> 	MatGetValues(local_mat, Nradius, vector_indices_radius.memptr(), Ntime, vector_indices_time.memptr(), out_data.memptr());
>> 	MatDestroy(&local_mat);
>> 	out_data = out_data.st();
>> }
>>
>> void store_data_in_PETSc(const arma::cx_mat &in_data, Mat &petsc_mat) {
>> 	const arma::uword Ntime = in_data.n_cols;
>> 	const arma::uword Nradius = in_data.n_rows;
>> 	arma::Col<int> vector_indices_radius = arma::linspace<arma::Col<int>>(0, Nradius - 1, Nradius);
>> 	arma::Col<int> vector_indices_time = arma::linspace<arma::Col<int>>(0, Ntime - 1, Ntime);
>> 	MatZeroEntries(petsc_mat);
>> 	MatAssemblyBegin(petsc_mat, MAT_FINAL_ASSEMBLY);
>> 	MatAssemblyEnd(petsc_mat, MAT_FINAL_ASSEMBLY);
>> 	arma::cx_mat local_mat = in_data.st();
>> 	MatSetValues(petsc_mat, Nradius, vector_indices_radius.memptr(), Ntime, vector_indices_time.memptr(), local_mat.memptr(), INSERT_VALUES);
>> 	MatAssemblyBegin(petsc_mat, MAT_FINAL_ASSEMBLY);
>> 	MatAssemblyEnd(petsc_mat, MAT_FINAL_ASSEMBLY);
>> }
>>
>> void create_data_in_PETSc_from_scratch(const arma::cx_mat &in_data, Mat &petsc_mat) {
>> 	const arma::uword Ntime = in_data.n_cols;
>> 	const arma::uword Nradius = in_data.n_rows;
>> 	MatZeroEntries(petsc_mat);
>> 	MatAssemblyBegin(petsc_mat, MAT_FINAL_ASSEMBLY);
>> 	MatAssemblyEnd(petsc_mat, MAT_FINAL_ASSEMBLY);
>> 	for(int i = 0; i < (int)Ntime; ++i){
>> 		for(int j = 0; j < (int)Nradius; ++j) {
>> 			MatSetValue(petsc_mat, j, i, i, INSERT_VALUES);
>> 		}
>> 	}
>> 	MatAssemblyBegin(petsc_mat, MAT_FINAL_ASSEMBLY);
>> 	MatAssemblyEnd(petsc_mat, MAT_FINAL_ASSEMBLY);
>> }
>> #ifndef TEST_SCALING_HPP
>> #define TEST_SCALING_HPP
>>
>> #include <helper_functions.hpp>
>>
>> void test_scaling_arma(const arma::cx_mat & in_mat,
>> 					   arma::cx_mat &out_mat,
>> 					   const arma::cx_double &scaling_factor);
>>
>> void test_scaling_petsc(const Mat &in_mat,
>> 						Mat &out_mat,
>> 						const PetscScalar &scaling_factor);
>>
>> void test_scaling_petsc_pointer(const Mat &in_mat,
>> 								Mat &out_mat,
>> 								const PetscScalar &scaling_factor);
>>
>> void test_scaling(const size_t matrix_size_rows, const size_t matrix_size_cols, const bool print_matrices);
>>
>> #endif // TEST_SCALING_HPP
>> #ifndef HELPER_FUNCTIONS_HPP
>> #define HELPER_FUNCTIONS_HPP
>>
>> #include <iostream>
>>
>> #include <armadillo>
>> #include <petscts.h>
>> #include <petscmat.h>
>> #include <petscvec.h>
>> #include <fftw3.h>
>> #include <fftw3-mpi.h>
>> #include <metis.h>
>> #include <chrono>
>>
>> #include <boost/numeric/odeint.hpp>
>>
>> constexpr int bench_rounds = 1000;
>>
>> void retrieve_data_from_PETSc(const Mat petsc_mat, arma::cx_mat &out_data,
>> 							  const arma::uword Ntime, const arma::uword Nradius);
>>
>> void store_data_in_PETSc(const arma::cx_mat &in_data, Mat &petsc_mat);
>>
>> void create_data_in_PETSc_from_scratch(const arma::cx_mat &in_data, Mat &petsc_mat);
>>
>> #endif // HELPER_FUNCTIONS_HPP
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210217/7f774a7b/attachment.html>

From jed at jedbrown.org  Wed Feb 17 10:49:49 2021
From: jed at jedbrown.org (Jed Brown)
Date: Wed, 17 Feb 2021 09:49:49 -0700
Subject: [petsc-users] Explicit linking to OpenMP results in performance
 drop and wrong results
In-Reply-To: <3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no>
References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no>
	<874kiad69u.fsf@jedbrown.org>
	<3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no>
Message-ID: <87y2fmbpw2.fsf@jedbrown.org>

Roland Richter <roland.richter at ntnu.no> writes:

> Hei,
>
> I replaced the linking line with
>
> //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx? -march=native -fopenmp-simd
> -DMKL_LP64 -m64
> CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o
> bin/armadillo_with_PETSc?
> -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib
> /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran?
> -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64
> -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
> /opt/boost/lib/libboost_filesystem.so.1.72.0
> /opt/boost/lib/libboost_mpi.so.1.72.0
> /opt/boost/lib/libboost_program_options.so.1.72.0
> /opt/boost/lib/libboost_serialization.so.1.72.0
> /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so
> /opt/petsc_release/lib/libpetsc.so
> /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so
> /
>
> and now the results are correct. Nevertheless, when comparing the loop
> in line 26-28 in file test_scaling.cpp
>
> /#pragma omp parallel for//
> //??? for(int i = 0; i < r_0 * r_1; ++i)//
> //??? ??? *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/
>
> the version without /#pragma omp parallel/ for is significantly faster
> (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there
> still such a big difference?

Sounds like you're using a profile to attribute time? Each `omp parallel` region incurs a cost ranging from about a microsecond to 10 or more microseconds depending on architecture, number of threads, and OpenMP implementation. Your loop (for double precision) operates at around 8 entries per clock cycle (depending on architecture) if the operands are in cache so the loop size r_0 * r_1 should be at least 10000 just to pay off the cost of `omp parallel`.

From roland.richter at ntnu.no  Wed Feb 17 11:11:04 2021
From: roland.richter at ntnu.no (Roland Richter)
Date: Wed, 17 Feb 2021 17:11:04 +0000
Subject: [petsc-users] Explicit linking to OpenMP results in performance
 drop and wrong results
In-Reply-To: <87y2fmbpw2.fsf@jedbrown.org>
References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no>
	<874kiad69u.fsf@jedbrown.org>
	<3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no>,
	<87y2fmbpw2.fsf@jedbrown.org>
Message-ID: <641b1bcbfd2741d58cb8d21960a720ca@ntnu.no>

My PetscScalar is complex double (i.e. even higher penalty), but my matrix has a size of 8kk elements, so that should not an issue.
Regards,
Roland
________________________________
Von: Jed Brown <jed at jedbrown.org>
Gesendet: Mittwoch, 17. Februar 2021 17:49:49
An: Roland Richter; PETSc
Betreff: Re: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results

Roland Richter <roland.richter at ntnu.no> writes:

> Hei,
>
> I replaced the linking line with
>
> //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx  -march=native -fopenmp-simd
> -DMKL_LP64 -m64
> CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o
> bin/armadillo_with_PETSc
> -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib
> /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran
> -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64
> -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
> /opt/boost/lib/libboost_filesystem.so.1.72.0
> /opt/boost/lib/libboost_mpi.so.1.72.0
> /opt/boost/lib/libboost_program_options.so.1.72.0
> /opt/boost/lib/libboost_serialization.so.1.72.0
> /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so
> /opt/petsc_release/lib/libpetsc.so
> /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so
> /
>
> and now the results are correct. Nevertheless, when comparing the loop
> in line 26-28 in file test_scaling.cpp
>
> /#pragma omp parallel for//
> //    for(int i = 0; i < r_0 * r_1; ++i)//
> //        *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/
>
> the version without /#pragma omp parallel/ for is significantly faster
> (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there
> still such a big difference?

Sounds like you're using a profile to attribute time? Each `omp parallel` region incurs a cost ranging from about a microsecond to 10 or more microseconds depending on architecture, number of threads, and OpenMP implementation. Your loop (for double precision) operates at around 8 entries per clock cycle (depending on architecture) if the operands are in cache so the loop size r_0 * r_1 should be at least 10000 just to pay off the cost of `omp parallel`.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210217/0653f4c3/attachment.html>

From knepley at gmail.com  Wed Feb 17 11:51:43 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 17 Feb 2021 12:51:43 -0500
Subject: [petsc-users] Explicit linking to OpenMP results in performance
 drop and wrong results
In-Reply-To: <641b1bcbfd2741d58cb8d21960a720ca@ntnu.no>
References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no>
	<874kiad69u.fsf@jedbrown.org>
	<3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no>
	<87y2fmbpw2.fsf@jedbrown.org>
	<641b1bcbfd2741d58cb8d21960a720ca@ntnu.no>
Message-ID: <CAMYG4Gmej=YDgdPPb8P60-oViESJO32=VE6iAobG4qHuy7=L8g@mail.gmail.com>

Jed, is it possible that this is an oversubscription penalty from bad
OpenMP settings? <said by a person who knows less about OpenMP than
cuneiform>

  Thanks,

     Matt

On Wed, Feb 17, 2021 at 12:11 PM Roland Richter <roland.richter at ntnu.no>
wrote:

> My PetscScalar is complex double (i.e. even higher penalty), but my matrix
> has a size of 8kk elements, so that should not an issue.
> Regards,
> Roland
> ------------------------------
> *Von:* Jed Brown <jed at jedbrown.org>
> *Gesendet:* Mittwoch, 17. Februar 2021 17:49:49
> *An:* Roland Richter; PETSc
> *Betreff:* Re: [petsc-users] Explicit linking to OpenMP results in
> performance drop and wrong results
>
> Roland Richter <roland.richter at ntnu.no> writes:
>
> > Hei,
> >
> > I replaced the linking line with
> >
> > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx  -march=native -fopenmp-simd
> > -DMKL_LP64 -m64
> > CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o
> > bin/armadillo_with_PETSc
> > -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib
> > /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran
> > -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64
> > -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
> > /opt/boost/lib/libboost_filesystem.so.1.72.0
> > /opt/boost/lib/libboost_mpi.so.1.72.0
> > /opt/boost/lib/libboost_program_options.so.1.72.0
> > /opt/boost/lib/libboost_serialization.so.1.72.0
> > /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so
> > /opt/petsc_release/lib/libpetsc.so
> > /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so
> > /
> >
> > and now the results are correct. Nevertheless, when comparing the loop
> > in line 26-28 in file test_scaling.cpp
> >
> > /#pragma omp parallel for//
> > //    for(int i = 0; i < r_0 * r_1; ++i)//
> > //        *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/
> >
> > the version without /#pragma omp parallel/ for is significantly faster
> > (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there
> > still such a big difference?
>
> Sounds like you're using a profile to attribute time? Each `omp parallel`
> region incurs a cost ranging from about a microsecond to 10 or more
> microseconds depending on architecture, number of threads, and OpenMP
> implementation. Your loop (for double precision) operates at around 8
> entries per clock cycle (depending on architecture) if the operands are in
> cache so the loop size r_0 * r_1 should be at least 10000 just to pay off
> the cost of `omp parallel`.
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210217/d37b2a2b/attachment.html>

From jed at jedbrown.org  Wed Feb 17 11:56:04 2021
From: jed at jedbrown.org (Jed Brown)
Date: Wed, 17 Feb 2021 10:56:04 -0700
Subject: [petsc-users] Explicit linking to OpenMP results in performance
 drop and wrong results
In-Reply-To: <CAMYG4Gmej=YDgdPPb8P60-oViESJO32=VE6iAobG4qHuy7=L8g@mail.gmail.com>
References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no>
	<874kiad69u.fsf@jedbrown.org>
	<3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no>
	<87y2fmbpw2.fsf@jedbrown.org>
	<641b1bcbfd2741d58cb8d21960a720ca@ntnu.no>
	<CAMYG4Gmej=YDgdPPb8P60-oViESJO32=VE6iAobG4qHuy7=L8g@mail.gmail.com>
Message-ID: <87v9aqbmtn.fsf@jedbrown.org>

It's entirely possible, especially if libgomp is being mixed with libiomp.

Roland hasn't show us the compilation line (just linker), because `omp parallel` shouldn't do anything with just -fopenmp-simd and no -fopenmp. 

Matthew Knepley <knepley at gmail.com> writes:

> Jed, is it possible that this is an oversubscription penalty from bad
> OpenMP settings? <said by a person who knows less about OpenMP than
> cuneiform>
>
>   Thanks,
>
>      Matt
>
> On Wed, Feb 17, 2021 at 12:11 PM Roland Richter <roland.richter at ntnu.no>
> wrote:
>
>> My PetscScalar is complex double (i.e. even higher penalty), but my matrix
>> has a size of 8kk elements, so that should not an issue.
>> Regards,
>> Roland
>> ------------------------------
>> *Von:* Jed Brown <jed at jedbrown.org>
>> *Gesendet:* Mittwoch, 17. Februar 2021 17:49:49
>> *An:* Roland Richter; PETSc
>> *Betreff:* Re: [petsc-users] Explicit linking to OpenMP results in
>> performance drop and wrong results
>>
>> Roland Richter <roland.richter at ntnu.no> writes:
>>
>> > Hei,
>> >
>> > I replaced the linking line with
>> >
>> > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx  -march=native -fopenmp-simd
>> > -DMKL_LP64 -m64
>> > CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o
>> > bin/armadillo_with_PETSc
>> > -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib
>> > /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran
>> > -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64
>> > -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
>> > /opt/boost/lib/libboost_filesystem.so.1.72.0
>> > /opt/boost/lib/libboost_mpi.so.1.72.0
>> > /opt/boost/lib/libboost_program_options.so.1.72.0
>> > /opt/boost/lib/libboost_serialization.so.1.72.0
>> > /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so
>> > /opt/petsc_release/lib/libpetsc.so
>> > /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so
>> > /
>> >
>> > and now the results are correct. Nevertheless, when comparing the loop
>> > in line 26-28 in file test_scaling.cpp
>> >
>> > /#pragma omp parallel for//
>> > //    for(int i = 0; i < r_0 * r_1; ++i)//
>> > //        *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/
>> >
>> > the version without /#pragma omp parallel/ for is significantly faster
>> > (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there
>> > still such a big difference?
>>
>> Sounds like you're using a profile to attribute time? Each `omp parallel`
>> region incurs a cost ranging from about a microsecond to 10 or more
>> microseconds depending on architecture, number of threads, and OpenMP
>> implementation. Your loop (for double precision) operates at around 8
>> entries per clock cycle (depending on architecture) if the operands are in
>> cache so the loop size r_0 * r_1 should be at least 10000 just to pay off
>> the cost of `omp parallel`.
>>
>
>
> -- 
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>

From roland.richter at ntnu.no  Wed Feb 17 13:28:50 2021
From: roland.richter at ntnu.no (Roland Richter)
Date: Wed, 17 Feb 2021 20:28:50 +0100
Subject: [petsc-users] Explicit linking to OpenMP results in performance
 drop and wrong results
In-Reply-To: <87v9aqbmtn.fsf@jedbrown.org>
References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no>
	<874kiad69u.fsf@jedbrown.org>
	<3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no>
	<87y2fmbpw2.fsf@jedbrown.org>
	<641b1bcbfd2741d58cb8d21960a720ca@ntnu.no>
	<CAMYG4Gmej=YDgdPPb8P60-oViESJO32=VE6iAobG4qHuy7=L8g@mail.gmail.com>
	<87v9aqbmtn.fsf@jedbrown.org>
Message-ID: <0d840f08-c390-9cf7-d833-7a4f3efb37b5@ntnu.no>

Hei,

the compilation line is (as shown below)

//usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -DBOOST_ALL_NO_LIB
-DBOOST_FILESYSTEM_DYN_LINK -DBOOST_MPI_DYN_LINK
-DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_SERIALIZATION_DYN_LINK
-DUSE_CUDA
-I/home/roland/Dokumente/C++-Projekte/armadillo_with_PETSc/include
-I/opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/include
-I/opt/armadillo/include -isystem /opt/petsc_release/include -isystem
/opt/fftw3/include -isystem /opt/boost/include -march=native
-fopenmp-simd -DMKL_LP64 -m64 -Wall -Wextra -pedantic -fPIC -flto -O2
-funroll-loops -funroll-all-loops -fstrict-aliasing -mavx -march=native
-fopenmp -std=gnu++17 -c <source_files> -o <target_files>/

Regards,

Roland
//

Am 17.02.2021 um 18:56 schrieb Jed Brown:
> It's entirely possible, especially if libgomp is being mixed with libiomp.
>
> Roland hasn't show us the compilation line (just linker), because `omp parallel` shouldn't do anything with just -fopenmp-simd and no -fopenmp. 
>
> Matthew Knepley <knepley at gmail.com> writes:
>
>> Jed, is it possible that this is an oversubscription penalty from bad
>> OpenMP settings? <said by a person who knows less about OpenMP than
>> cuneiform>
>>
>>   Thanks,
>>
>>      Matt
>>
>> On Wed, Feb 17, 2021 at 12:11 PM Roland Richter <roland.richter at ntnu.no>
>> wrote:
>>
>>> My PetscScalar is complex double (i.e. even higher penalty), but my matrix
>>> has a size of 8kk elements, so that should not an issue.
>>> Regards,
>>> Roland
>>> ------------------------------
>>> *Von:* Jed Brown <jed at jedbrown.org>
>>> *Gesendet:* Mittwoch, 17. Februar 2021 17:49:49
>>> *An:* Roland Richter; PETSc
>>> *Betreff:* Re: [petsc-users] Explicit linking to OpenMP results in
>>> performance drop and wrong results
>>>
>>> Roland Richter <roland.richter at ntnu.no> writes:
>>>
>>>> Hei,
>>>>
>>>> I replaced the linking line with
>>>>
>>>> //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx  -march=native -fopenmp-simd
>>>> -DMKL_LP64 -m64
>>>> CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o
>>>> bin/armadillo_with_PETSc
>>>> -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib
>>>> /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran
>>>> -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64
>>>> -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
>>>> /opt/boost/lib/libboost_filesystem.so.1.72.0
>>>> /opt/boost/lib/libboost_mpi.so.1.72.0
>>>> /opt/boost/lib/libboost_program_options.so.1.72.0
>>>> /opt/boost/lib/libboost_serialization.so.1.72.0
>>>> /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so
>>>> /opt/petsc_release/lib/libpetsc.so
>>>> /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so
>>>> /
>>>>
>>>> and now the results are correct. Nevertheless, when comparing the loop
>>>> in line 26-28 in file test_scaling.cpp
>>>>
>>>> /#pragma omp parallel for//
>>>> //    for(int i = 0; i < r_0 * r_1; ++i)//
>>>> //        *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/
>>>>
>>>> the version without /#pragma omp parallel/ for is significantly faster
>>>> (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there
>>>> still such a big difference?
>>> Sounds like you're using a profile to attribute time? Each `omp parallel`
>>> region incurs a cost ranging from about a microsecond to 10 or more
>>> microseconds depending on architecture, number of threads, and OpenMP
>>> implementation. Your loop (for double precision) operates at around 8
>>> entries per clock cycle (depending on architecture) if the operands are in
>>> cache so the loop size r_0 * r_1 should be at least 10000 just to pay off
>>> the cost of `omp parallel`.
>>>
>>
>> -- 
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210217/8eeee3f9/attachment.html>

From jed at jedbrown.org  Wed Feb 17 14:14:43 2021
From: jed at jedbrown.org (Jed Brown)
Date: Wed, 17 Feb 2021 13:14:43 -0700
Subject: [petsc-users] Explicit linking to OpenMP results in performance
 drop and wrong results
In-Reply-To: <0d840f08-c390-9cf7-d833-7a4f3efb37b5@ntnu.no>
References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no>
	<874kiad69u.fsf@jedbrown.org>
	<3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no>
	<87y2fmbpw2.fsf@jedbrown.org>
	<641b1bcbfd2741d58cb8d21960a720ca@ntnu.no>
	<CAMYG4Gmej=YDgdPPb8P60-oViESJO32=VE6iAobG4qHuy7=L8g@mail.gmail.com>
	<87v9aqbmtn.fsf@jedbrown.org>
	<0d840f08-c390-9cf7-d833-7a4f3efb37b5@ntnu.no>
Message-ID: <87h7mabgek.fsf@jedbrown.org>

Roland Richter <roland.richter at ntnu.no> writes:

> Hei,
>
> the compilation line is (as shown below)
>
> //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -DBOOST_ALL_NO_LIB
> -DBOOST_FILESYSTEM_DYN_LINK -DBOOST_MPI_DYN_LINK
> -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_SERIALIZATION_DYN_LINK
> -DUSE_CUDA
> -I/home/roland/Dokumente/C++-Projekte/armadillo_with_PETSc/include
> -I/opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/include
> -I/opt/armadillo/include -isystem /opt/petsc_release/include -isystem
> /opt/fftw3/include -isystem /opt/boost/include -march=native
> -fopenmp-simd -DMKL_LP64 -m64 -Wall -Wextra -pedantic -fPIC -flto -O2
> -funroll-loops -funroll-all-loops -fstrict-aliasing -mavx -march=native
> -fopenmp -std=gnu++17 -c <source_files> -o <target_files>/

-fopenmp implies -fopenmp-simd so you don't need both. You have -fopenmp here so it'll use threading, and likely default to libgomp (depending on what compiler is behind the mpicxx wrapper).

From zhaog6 at lsec.cc.ac.cn  Wed Feb 17 18:47:44 2021
From: zhaog6 at lsec.cc.ac.cn (=?UTF-8?B?6LW15Yia?=)
Date: Thu, 18 Feb 2021 08:47:44 +0800 (GMT+08:00)
Subject: [petsc-users] An issue about pipelined CG and Gropp's CG
Message-ID: <7bb9aad8.144.177b29b6517.Coremail.zhaog6@lsec.cc.ac.cn>

Dear PETSc team,

I am interested in pipelined CG (-ksp_type pipecg) and Gropp's CG (-ksp_type groppcg), it is expected that this iterative method with pipelined has advantages over traditional CG in the case of multiple processes. So I'd like to ask for Poisson problem, how many computing nodes do I need to show the advantages of pipelined CG or Gropp's CG over CG (No preconditioner is used)?

Currently, I can only use up to 32 nodes (36 cores per nodes) at most on my cluster, but both "pipecg" and "groppcg" seem to be no advantage over "cg" when I solve Poisson equations with homogeneous Dirichlet BC in [0, 1]^2 (remain 20K~60K DOFs per process). I guess the reason would be too few computing nodes.

Because I am calling PETSc via other numerical software, if need, I would mail related performance information to you by using command line options suggested by PETSc. Thank you.


Thanks,
Gang

From bsmith at petsc.dev  Wed Feb 17 19:17:17 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 17 Feb 2021 19:17:17 -0600
Subject: [petsc-users] An issue about pipelined CG and Gropp's CG
In-Reply-To: <7bb9aad8.144.177b29b6517.Coremail.zhaog6@lsec.cc.ac.cn>
References: <7bb9aad8.144.177b29b6517.Coremail.zhaog6@lsec.cc.ac.cn>
Message-ID: <BE4B64F6-9339-4CC2-AB7C-E8FCE2B59A7E@petsc.dev>


> On Feb 17, 2021, at 6:47 PM, ?? <zhaog6 at lsec.cc.ac.cn> wrote:
> 
> Dear PETSc team,
> 
> I am interested in pipelined CG (-ksp_type pipecg) and Gropp's CG (-ksp_type groppcg), it is expected that this iterative method with pipelined has advantages over traditional CG in the case of multiple processes. So I'd like to ask for Poisson problem, how many computing nodes do I need to show the advantages of pipelined CG or Gropp's CG over CG (No preconditioner is used)?
> 
> Currently, I can only use up to 32 nodes (36 cores per nodes) at most on my cluster, but both "pipecg" and "groppcg" seem to be no advantage over "cg" when I solve Poisson equations with homogeneous Dirichlet BC in [0, 1]^2 (remain 20K~60K DOFs per process). I guess the reason would be too few computing nodes.

  900 cores (assuming they are not memory bandwidth bound) might be enough to see some differences but the differences are likely so small compared to other parallel issues that affect performance that you see no consistently measurable difference.

   Run with -log_view three cases, no pipeline and the two pipelines and send the output. By studying where the time is spent in the different regions of the code with this output one may be able to say something about the pipeline affect.

  Barry


> 
> Because I am calling PETSc via other numerical software, if need, I would mail related performance information to you by using command line options suggested by PETSc. Thank you.
> 
> 
> Thanks,
> Gang


From heepark at sandia.gov  Wed Feb 17 19:23:37 2021
From: heepark at sandia.gov (Park, Heeho)
Date: Thu, 18 Feb 2021 01:23:37 +0000
Subject: [petsc-users] insufficient virtual memory?
Message-ID: <C345DFB6-E7F3-4A4A-967F-396E4EB7D657@sandia.gov>

Hi PETSc developers,

Have you seen this error message?

      forrtl: severe (41): insufficient virtual memory

We are running about 36 million degrees of freedom ( ~ 2.56 GB) and it is failing with the error message on our HPC systems.
Ironically, it runs on our laptop (super slow.)

      type: seqbaij
      rows=46251272, cols=46251272
      total: nonzeros=323046210, allocated nonzeros=323046210
      total number of mallocs used during MatSetValues calls=0
          block size is 1

Does anyone have experience encountering this problem?

Thanks,

Heeho Daniel Park

! ------------------------------------ !
Sandia National Laboratories
Org: 08844, R&D
Work: 505-844-1319
! ------------------------------------ !

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210218/cbbec417/attachment-0001.html>

From knepley at gmail.com  Wed Feb 17 19:27:29 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 17 Feb 2021 20:27:29 -0500
Subject: [petsc-users] insufficient virtual memory?
In-Reply-To: <C345DFB6-E7F3-4A4A-967F-396E4EB7D657@sandia.gov>
References: <C345DFB6-E7F3-4A4A-967F-396E4EB7D657@sandia.gov>
Message-ID: <CAMYG4GnmyDKC-Whv2UnWgZRhpwSjq9G_Mv=1ScOyEi+euUhSAg@mail.gmail.com>

On Wed, Feb 17, 2021 at 8:23 PM Park, Heeho via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Hi PETSc developers,
>
>
>
> Have you seen this error message?
>
>
>
>       forrtl: severe (41): insufficient virtual memory
>

I believe this is an OS setting, independent of the code itself, which
explains the difference you see between
machines. This also suggests asking the local sysadmin for your cluster.

  Thanks,

     Matt


>
>
> We are running about 36 million degrees of freedom ( ~ 2.56 GB) and it is
> failing with the error message on our HPC systems.
>
> Ironically, it runs on our laptop (super slow.)
>
>
>
>       type: seqbaij
>
>       rows=46251272, cols=46251272
>
>       total: nonzeros=323046210, allocated nonzeros=323046210
>
>       total number of mallocs used during MatSetValues calls=0
>
>           block size is 1
>
>
>
> Does anyone have experience encountering this problem?
>
>
>
> Thanks,
>
>
>
> Heeho Daniel Park
>
> ! ------------------------------------ !
> Sandia National Laboratories
>
> Org: 08844, R&D
>
> Work: 505-844-1319
> ! ------------------------------------ !
>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210217/3b2c1233/attachment.html>

From heepark at sandia.gov  Wed Feb 17 19:32:19 2021
From: heepark at sandia.gov (Park, Heeho)
Date: Thu, 18 Feb 2021 01:32:19 +0000
Subject: [petsc-users] [EXTERNAL] Re:  insufficient virtual memory?
In-Reply-To: <CAMYG4GnmyDKC-Whv2UnWgZRhpwSjq9G_Mv=1ScOyEi+euUhSAg@mail.gmail.com>
References: <C345DFB6-E7F3-4A4A-967F-396E4EB7D657@sandia.gov>
	<CAMYG4GnmyDKC-Whv2UnWgZRhpwSjq9G_Mv=1ScOyEi+euUhSAg@mail.gmail.com>
Message-ID: <1BF5C245-9FD9-4E77-8817-7AC61A094755@sandia.gov>

That makes sense. That was my suspicion too. I will contact the sys admin. Thanks.

- Heeho Daniel Park

From: Matthew Knepley <knepley at gmail.com>
Date: Wednesday, February 17, 2021 at 5:28 PM
To: "Park, Heeho" <heepark at sandia.gov>
Cc: "petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
Subject: [EXTERNAL] Re: [petsc-users] insufficient virtual memory?

On Wed, Feb 17, 2021 at 8:23 PM Park, Heeho via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:
Hi PETSc developers,

Have you seen this error message?

      forrtl: severe (41): insufficient virtual memory

I believe this is an OS setting, independent of the code itself, which explains the difference you see between
machines. This also suggests asking the local sysadmin for your cluster.

  Thanks,

     Matt


We are running about 36 million degrees of freedom ( ~ 2.56 GB) and it is failing with the error message on our HPC systems.
Ironically, it runs on our laptop (super slow.)

      type: seqbaij
      rows=46251272, cols=46251272
      total: nonzeros=323046210, allocated nonzeros=323046210
      total number of mallocs used during MatSetValues calls=0
          block size is 1

Does anyone have experience encountering this problem?

Thanks,

Heeho Daniel Park

! ------------------------------------ !
Sandia National Laboratories
Org: 08844, R&D
Work: 505-844-1319
! ------------------------------------ !


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210218/7078fdb6/attachment.html>

From zhaog6 at lsec.cc.ac.cn  Wed Feb 17 20:31:14 2021
From: zhaog6 at lsec.cc.ac.cn (=?UTF-8?B?6LW15Yia?=)
Date: Thu, 18 Feb 2021 10:31:14 +0800 (GMT+08:00)
Subject: [petsc-users] An issue about pipelined CG and Gropp's CG
In-Reply-To: <BE4B64F6-9339-4CC2-AB7C-E8FCE2B59A7E@petsc.dev>
References: <7bb9aad8.144.177b29b6517.Coremail.zhaog6@lsec.cc.ac.cn>
	<BE4B64F6-9339-4CC2-AB7C-E8FCE2B59A7E@petsc.dev>
Message-ID: <68432bab.55d.177b2fa288a.Coremail.zhaog6@lsec.cc.ac.cn>

Dear Barry,

Thank you for your prompt reply. I run ~16M DOFs on 32 nodes (36 cores per node), but CG seems to be faster than pipelined CG and Gropp's CG, I'm puzzled and haven't figured out why. Put the performance output into attachment, please check it.


Thanks,
Gang


&gt; -----????-----
&gt; ???: "Barry Smith" <bsmith at petsc.dev>
&gt; ????: 2021-02-18 09:17:17 (???)
&gt; ???: "??" <zhaog6 at lsec.cc.ac.cn>
&gt; ??: PETSc <petsc-users at mcs.anl.gov>
&gt; ??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG
&gt; 
&gt; 
&gt; 
&gt; &gt; On Feb 17, 2021, at 6:47 PM, ?? <zhaog6 at lsec.cc.ac.cn> wrote:
&gt; &gt; 
&gt; &gt; Dear PETSc team,
&gt; &gt; 
&gt; &gt; I am interested in pipelined CG (-ksp_type pipecg) and Gropp's CG (-ksp_type groppcg), it is expected that this iterative method with pipelined has advantages over traditional CG in the case of multiple processes. So I'd like to ask for Poisson problem, how many computing nodes do I need to show the advantages of pipelined CG or Gropp's CG over CG (No preconditioner is used)?
&gt; &gt; 
&gt; &gt; Currently, I can only use up to 32 nodes (36 cores per nodes) at most on my cluster, but both "pipecg" and "groppcg" seem to be no advantage over "cg" when I solve Poisson equations with homogeneous Dirichlet BC in [0, 1]^2 (remain 20K~60K DOFs per process). I guess the reason would be too few computing nodes.
&gt; 
&gt;   900 cores (assuming they are not memory bandwidth bound) might be enough to see some differences but the differences are likely so small compared to other parallel issues that affect performance that you see no consistently measurable difference.
&gt; 
&gt;    Run with -log_view three cases, no pipeline and the two pipelines and send the output. By studying where the time is spent in the different regions of the code with this output one may be able to say something about the pipeline affect.
&gt; 
&gt;   Barry
&gt; 
&gt; 
&gt; &gt; 
&gt; &gt; Because I am calling PETSc via other numerical software, if need, I would mail related performance information to you by using command line options suggested by PETSc. Thank you.
&gt; &gt; 
&gt; &gt; 
&gt; &gt; Thanks,
&gt; &gt; Gang
</zhaog6 at lsec.cc.ac.cn></petsc-users at mcs.anl.gov></zhaog6 at lsec.cc.ac.cn></bsmith at petsc.dev>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cg.out
Type: application/octet-stream
Size: 14940 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210218/ff3dc643/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: groppcg.out
Type: application/octet-stream
Size: 15189 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210218/ff3dc643/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pipecg.out
Type: application/octet-stream
Size: 15063 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210218/ff3dc643/attachment-0005.obj>

From bsmith at petsc.dev  Wed Feb 17 20:52:11 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 17 Feb 2021 20:52:11 -0600
Subject: [petsc-users] An issue about pipelined CG and Gropp's CG
In-Reply-To: <68432bab.55d.177b2fa288a.Coremail.zhaog6@lsec.cc.ac.cn>
References: <7bb9aad8.144.177b29b6517.Coremail.zhaog6@lsec.cc.ac.cn>
	<BE4B64F6-9339-4CC2-AB7C-E8FCE2B59A7E@petsc.dev>
	<68432bab.55d.177b2fa288a.Coremail.zhaog6@lsec.cc.ac.cn>
Message-ID: <9BFA8477-4B4C-440D-9CB0-2B22352EFD77@petsc.dev>


  First please see https://www.mcs.anl.gov/petsc/documentation/faq.html#pipelined <https://www.mcs.anl.gov/petsc/documentation/faq.html#pipelined> and verify that the MPI you are using satisfies the requirements and you have appropriate MPI environmental variables set (if needed). 


  Then please add a stage around the actual computation to get a more useful summary. 

  Organize your code like so

  ...
  KSPSetUp()
  PetscLogStagePush(a stage you created)
  KSPSolve()
  PetscLogStagePop()
  ...

  It is unclear where much of the time of your code is being spent, by adding the stage we'll have a clear picture of the time in the actual solver. There are examples of using PetscLogStagePush() in the source.

  With the new -log_view files you generate with these two changes we can get a handle on where the time is being spent and why the pipelining is or is not helping.

  Barry

> On Feb 17, 2021, at 8:31 PM, ?? <zhaog6 at lsec.cc.ac.cn> wrote:
> 
> Dear Barry,
> 
> Thank you for your prompt reply. I run ~16M DOFs on 32 nodes (36 cores per node), but CG seems to be faster than pipelined CG and Gropp's CG, I'm puzzled and haven't figured out why. Put the performance output into attachment, please check it.
> 
> 
> 
> Thanks,
> Gang
> 
> 
> &gt; -----????-----
> &gt; ???: "Barry Smith" <bsmith at petsc.dev>
> &gt; ????: 2021-02-18 09:17:17 (???)
> &gt; ???: "??" <zhaog6 at lsec.cc.ac.cn>
> &gt; ??: PETSc <petsc-users at mcs.anl.gov>
> &gt; ??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG
> &gt; 
> &gt; 
> &gt; 
> &gt; &gt; On Feb 17, 2021, at 6:47 PM, ?? <zhaog6 at lsec.cc.ac.cn> wrote:
> &gt; &gt; 
> &gt; &gt; Dear PETSc team,
> &gt; &gt; 
> &gt; &gt; I am interested in pipelined CG (-ksp_type pipecg) and Gropp's CG (-ksp_type groppcg), it is expected that this iterative method with pipelined has advantages over traditional CG in the case of multiple processes. So I'd like to ask for Poisson problem, how many computing nodes do I need to show the advantages of pipelined CG or Gropp's CG over CG (No preconditioner is used)?
> &gt; &gt; 
> &gt; &gt; Currently, I can only use up to 32 nodes (36 cores per nodes) at most on my cluster, but both "pipecg" and "groppcg" seem to be no advantage over "cg" when I solve Poisson equations with homogeneous Dirichlet BC in [0, 1]^2 (remain 20K~60K DOFs per process). I guess the reason would be too few computing nodes.
> &gt; 
> &gt;   900 cores (assuming they are not memory bandwidth bound) might be enough to see some differences but the differences are likely so small compared to other parallel issues that affect performance that you see no consistently measurable difference.
> &gt; 
> &gt;    Run with -log_view three cases, no pipeline and the two pipelines and send the output. By studying where the time is spent in the different regions of the code with this output one may be able to say something about the pipeline affect.
> &gt; 
> &gt;   Barry
> &gt; 
> &gt; 
> &gt; &gt; 
> &gt; &gt; Because I am calling PETSc via other numerical software, if need, I would mail related performance information to you by using command line options suggested by PETSc. Thank you.
> &gt; &gt; 
> &gt; &gt; 
> &gt; &gt; Thanks,
> &gt; &gt; Gang
> </zhaog6 at lsec.cc.ac.cn></petsc-users at mcs.anl.gov></zhaog6 at lsec.cc.ac.cn></bsmith at petsc.dev><cg.out><groppcg.out><pipecg.out>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210217/5f8bd3c6/attachment.html>

From bsmith at petsc.dev  Wed Feb 17 20:57:28 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 17 Feb 2021 20:57:28 -0600
Subject: [petsc-users] insufficient virtual memory?
In-Reply-To: <C345DFB6-E7F3-4A4A-967F-396E4EB7D657@sandia.gov>
References: <C345DFB6-E7F3-4A4A-967F-396E4EB7D657@sandia.gov>
Message-ID: <263912AE-3062-43E3-BBF3-7B3E4703AB0C@petsc.dev>


   PETSc gets almost all its memory using the C malloc system calls so it is unlikely that this Fortran error message comes from PETSc code. My guess is that you have some Fortran arrays declared somewhere in your code that are large and require memory that is not available. 

  Barry


> On Feb 17, 2021, at 7:23 PM, Park, Heeho via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> Hi PETSc developers,
>  
> Have you seen this error message?
>  
>       forrtl: severe (41): insufficient virtual memory
>  
> We are running about 36 million degrees of freedom ( ~ 2.56 GB) and it is failing with the error message on our HPC systems.
> Ironically, it runs on our laptop (super slow.)
>  
>       type: seqbaij
>       rows=46251272, cols=46251272
>       total: nonzeros=323046210, allocated nonzeros=323046210
>       total number of mallocs used during MatSetValues calls=0
>           block size is 1
>  
> Does anyone have experience encountering this problem?
>  
> Thanks,
>  
> Heeho Daniel Park
> 
> ! ------------------------------------ !
> Sandia National Laboratories
> Org: 08844, R&D
> Work: 505-844-1319
> ! ------------------------------------ !

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210217/7ac17f4c/attachment.html>

From zhaog6 at lsec.cc.ac.cn  Wed Feb 17 22:31:42 2021
From: zhaog6 at lsec.cc.ac.cn (=?UTF-8?B?6LW15Yia?=)
Date: Thu, 18 Feb 2021 12:31:42 +0800 (GMT+08:00)
Subject: [petsc-users] An issue about pipelined CG and Gropp's CG
In-Reply-To: <9BFA8477-4B4C-440D-9CB0-2B22352EFD77@petsc.dev>
References: <7bb9aad8.144.177b29b6517.Coremail.zhaog6@lsec.cc.ac.cn>
	<BE4B64F6-9339-4CC2-AB7C-E8FCE2B59A7E@petsc.dev>
	<68432bab.55d.177b2fa288a.Coremail.zhaog6@lsec.cc.ac.cn>
	<9BFA8477-4B4C-440D-9CB0-2B22352EFD77@petsc.dev>
Message-ID: <1b8b517f.a06.177b3687191.Coremail.zhaog6@lsec.cc.ac.cn>

Dear Barry,


Thank you. For MPI, MVAPICH-2.3.5 is used on my cluster by default, I add PetscLogStagePush("Calling KSPSolve()...") and PetscLogStagePop(). I am using other numerical software and have called PETSc only when solving linear system through PETSc interface supported by the software, but I'm not sure if I have added it correctly. I put the result and info into attachment, please check it.


Thanks,

Gang


-----????-----
???:"Barry Smith" <bsmith at petsc.dev>
????:2021-02-18 10:52:11 (???)
???: "??" <zhaog6 at lsec.cc.ac.cn>
??: PETSc <petsc-users at mcs.anl.gov>
??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG


  First please see https://www.mcs.anl.gov/petsc/documentation/faq.html#pipelined and verify that the MPI you are using satisfies the requirements and you have appropriate MPI environmental variables set (if needed). 


  Then please add a stage around the actual computation to get a more useful summary. 


  Organize your code like so


  ...
  KSPSetUp()
  PetscLogStagePush(a stage you created)
  KSPSolve()
  PetscLogStagePop()
  ...


  It is unclear where much of the time of your code is being spent, by adding the stage we'll have a clear picture of the time in the actual solver. There are examples of using PetscLogStagePush() in the source.


  With the new -log_view files you generate with these two changes we can get a handle on where the time is being spent and why the pipelining is or is not helping.


  Barry


On Feb 17, 2021, at 8:31 PM, ?? <zhaog6 at lsec.cc.ac.cn> wrote:


Dear Barry,

Thank you for your prompt reply. I run ~16M DOFs on 32 nodes (36 cores per node), but CG seems to be faster than pipelined CG and Gropp's CG, I'm puzzled and haven't figured out why. Put the performance output into attachment, please check it.


Thanks,
Gang


&gt; -----????-----
&gt; ???: "Barry Smith" <bsmith at petsc.dev>
&gt; ????: 2021-02-18 09:17:17 (???)
&gt; ???: "??" <zhaog6 at lsec.cc.ac.cn>
&gt; ??: PETSc <petsc-users at mcs.anl.gov>
&gt; ??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG
&gt;
&gt;
&gt;
&gt; &gt; On Feb 17, 2021, at 6:47 PM, ?? <zhaog6 at lsec.cc.ac.cn> wrote:
&gt; &gt;
&gt; &gt; Dear PETSc team,
&gt; &gt;
&gt; &gt; I am interested in pipelined CG (-ksp_type pipecg) and Gropp's CG (-ksp_type groppcg), it is expected that this iterative method with pipelined has advantages over traditional CG in the case of multiple processes. So I'd like to ask for Poisson problem, how many computing nodes do I need to show the advantages of pipelined CG or Gropp's CG over CG (No preconditioner is used)?
&gt; &gt;
&gt; &gt; Currently, I can only use up to 32 nodes (36 cores per nodes) at most on my cluster, but both "pipecg" and "groppcg" seem to be no advantage over "cg" when I solve Poisson equations with homogeneous Dirichlet BC in [0, 1]^2 (remain 20K~60K DOFs per process). I guess the reason would be too few computing nodes.
&gt;
&gt;   900 cores (assuming they are not memory bandwidth bound) might be enough to see some differences but the differences are likely so small compared to other parallel issues that affect performance that you see no consistently measurable difference.
&gt;
&gt;    Run with -log_view three cases, no pipeline and the two pipelines and send the output. By studying where the time is spent in the different regions of the code with this output one may be able to say something about the pipeline affect.
&gt;
&gt;   Barry
&gt;
&gt;
&gt; &gt;
&gt; &gt; Because I am calling PETSc via other numerical software, if need, I would mail related performance information to you by using command line options suggested by PETSc. Thank you.
&gt; &gt;
&gt; &gt;
&gt; &gt; Thanks,
&gt; &gt; Gang
</zhaog6 at lsec.cc.ac.cn></petsc-users at mcs.anl.gov></zhaog6 at lsec.cc.ac.cn></bsmith at petsc.dev><cg.out><groppcg.out><pipecg.out>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210218/95bdaab1/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cg.out
Type: application/octet-stream
Size: 14587 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210218/95bdaab1/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: groppcg.out
Type: application/octet-stream
Size: 14835 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210218/95bdaab1/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pipecg.out
Type: application/octet-stream
Size: 14712 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210218/95bdaab1/attachment-0005.obj>

From bsmith at petsc.dev  Wed Feb 17 23:09:43 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 17 Feb 2021 23:09:43 -0600
Subject: [petsc-users] An issue about pipelined CG and Gropp's CG
In-Reply-To: <1b8b517f.a06.177b3687191.Coremail.zhaog6@lsec.cc.ac.cn>
References: <7bb9aad8.144.177b29b6517.Coremail.zhaog6@lsec.cc.ac.cn>
	<BE4B64F6-9339-4CC2-AB7C-E8FCE2B59A7E@petsc.dev>
	<68432bab.55d.177b2fa288a.Coremail.zhaog6@lsec.cc.ac.cn>
	<9BFA8477-4B4C-440D-9CB0-2B22352EFD77@petsc.dev>
	<1b8b517f.a06.177b3687191.Coremail.zhaog6@lsec.cc.ac.cn>
Message-ID: <DDD3949F-F23F-4125-BDAD-6BB17AF0E9A3@petsc.dev>


  Here are the important operations from the -log_view (use a fixed sized font for easy reading).

No pipeline

------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

MatMult             5398 1.0 9.4707e+0012.6 1.05e+09 1.1 3.6e+07 6.9e+02 0.0e+00  3 52100100  0  10 52100100  0 124335
VecTDot            10796 1.0 1.4993e+01 8.3 3.23e+08 1.1 0.0e+00 0.0e+00 1.1e+04 16 16  0  0 67  55 16  0  0 67 24172
VecNorm             5399 1.0 6.2343e+00 4.4 1.61e+08 1.1 0.0e+00 0.0e+00 5.4e+03 10  8  0  0 33  33  8  0  0 33 29073
VecAXPY            10796 1.0 1.1721e-01 1.4 3.23e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0 16  0  0  0   1 16  0  0  0 3092074
VecAYPX             5397 1.0 5.4340e-02 1.4 1.61e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  8  0  0  0   0  8  0  0  0 3334231
VecScatterBegin     5398 1.0 5.4152e-02 3.3 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       5398 1.0 8.6881e+00489.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   6  0  0  0  0     0
KSPSolve               1 1.0 1.7389e+01 1.0 2.02e+09 1.1 3.6e+07 6.9e+02 1.6e+04 29100100100100 100100100100100 130242

Gropp pipeline

MatMult             5399 1.0 9.5593e+0011.7 1.05e+09 1.1 3.6e+07 6.9e+02 0.0e+00  3 45100100  0   7 45100100  0 123207
VecNorm                1 1.0 8.8549e-0417.4 2.99e+04 1.1 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  4   0  0  0  0 20 37912
VecAXPY            16194 1.0 1.6522e-01 1.4 4.84e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0 21  0  0  0   0 21  0  0  0 3290407
VecAYPX            10794 1.0 1.9903e-01 1.5 3.23e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0 14  0  0  0   1 14  0  0  0 1820606
VecScatterBegin     5399 1.0 6.2281e-02 3.6 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       5399 1.0 8.7194e+00380.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   4  0  0  0  0     0
VecReduceArith     16195 1.0 2.2674e-01 3.7 4.84e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0 21  0  0  0   0 21  0  0  0 2397678
VecReduceBegin     10797 1.0 3.4089e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecReduceEnd       10797 1.0 2.6197e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 37  0  0  0  0  91  0  0  0  0     0
SFBcastOpBegin      5399 1.0 6.0051e-02 4.1 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00  0  0100100  0   0  0100100  0     0
SFBcastOpEnd        5399 1.0 8.7167e+00440.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   4  0  0  0  0     0
KSPSolve               1 1.0 2.7477e+01 1.0 2.34e+09 1.1 3.6e+07 6.9e+02 1.0e+00 41100100100  4 100100100100 20 95623

pipeline cg

MatMult             5400 1.0 1.5915e+00 1.8 1.05e+09 1.1 3.6e+07 6.9e+02 0.0e+00  2 37100100  0   6 37100100  0 740161
VecAXPY            21592 1.0 2.3194e-01 1.4 6.45e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0 23  0  0  0   1 23  0  0  0 3125164
VecAYPX            21588 1.0 5.5059e-01 1.7 6.45e+08 1.1 0.0e+00 0.0e+00 0.0e+00  1 23  0  0  0   2 23  0  0  0 1316272
VecScatterBegin     5400 1.0 7.0132e-02 3.7 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       5400 1.0 6.5329e-0122.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
VecReduceArith     16197 1.0 3.1135e-01 4.7 4.84e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0 17  0  0  0   1 17  0  0  0 1746339
VecReduceBegin      5400 1.0 3.1471e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecReduceEnd        5400 1.0 1.7226e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 28  0  0  0  0  90  0  0  0  0     0
SFBcastOpBegin      5400 1.0 6.6228e-02 4.1 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00  0  0100100  0   0  0100100  0     0
SFBcastOpEnd        5400 1.0 6.5000e-0124.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
KSPSolve               1 1.0 1.8893e+01 1.0 2.82e+09 1.1 3.6e+07 6.9e+02 0.0e+00 32100100100  0 100100100100  0 167860

With pipelined methods the TDot and Vec norm are replaced with VecReduceArith, VecReduceBegin, and VecReduceEnd. The important numbers are 
the %T in the stage. 

In particular look at VecTDot and VecNorm and compare to VecReduceEnd in the pipeline methods. Note that both pipelined methods, especially the gropp method spend an enormous time in VecReduceEnd and hence end up taking more time than the non-pipelined method. So basically any advantage the pipeline methods may have is lost waiting for the previous reduction operation to arrive. I do not know why, if it is the MPI implementation or something else. 

If you are serious about understanding pipeline methods for Krylov methods you will need to dig deep into the details of the machine hardware and MPI software. It is not a trivial subject with easy answers. I would say that the PETSc community are not experts on the topic, you will need to read in detail the publications on pipelined methods and consult with the authors on technical, machine specific details. There is a difference between the academic "pipelining as a theoretical construct"  and actually dramatic improvement on real machines while using pipelining. One small implementation detail can dramatically change performance so theoretical papers alone are not the complete story.


  Barry


------------------------------------------------------------------------------------------------------------------------

> On Feb 17, 2021, at 10:31 PM, ?? <zhaog6 at lsec.cc.ac.cn> wrote:
> 
> Dear Barry,
> 
> 
> 
> Thank you. For MPI, MVAPICH-2.3.5 is used on my cluster by default, I add PetscLogStagePush("Calling KSPSolve()...") and PetscLogStagePop(). I am using other numerical software and have called PETSc only when solving linear system through PETSc interface supported by the software, but I'm not sure if I have added it correctly. I put the result and info into attachment, please check it.
> 
> 
> 
> 
> 
> Thanks,
> 
> Gang
> 
> 
> 
> -----????-----
> ???:"Barry Smith" <bsmith at petsc.dev>
> ????:2021-02-18 10:52:11 (???)
> ???: "??" <zhaog6 at lsec.cc.ac.cn>
> ??: PETSc <petsc-users at mcs.anl.gov>
> ??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG
> 
> 
>   First please see https://www.mcs.anl.gov/petsc/documentation/faq.html#pipelined <https://www.mcs.anl.gov/petsc/documentation/faq.html#pipelined> and verify that the MPI you are using satisfies the requirements and you have appropriate MPI environmental variables set (if needed). 
> 
> 
>   Then please add a stage around the actual computation to get a more useful summary. 
> 
>   Organize your code like so
> 
>   ...
>   KSPSetUp()
>   PetscLogStagePush(a stage you created)
>   KSPSolve()
>   PetscLogStagePop()
>   ...
> 
>   It is unclear where much of the time of your code is being spent, by adding the stage we'll have a clear picture of the time in the actual solver. There are examples of using PetscLogStagePush() in the source.
> 
>   With the new -log_view files you generate with these two changes we can get a handle on where the time is being spent and why the pipelining is or is not helping.
> 
>   Barry
> 
>> On Feb 17, 2021, at 8:31 PM, ?? <zhaog6 at lsec.cc.ac.cn <mailto:zhaog6 at lsec.cc.ac.cn>> wrote:
>> 
>> Dear Barry,
>> 
>> Thank you for your prompt reply. I run ~16M DOFs on 32 nodes (36 cores per node), but CG seems to be faster than pipelined CG and Gropp's CG, I'm puzzled and haven't figured out why. Put the performance output into attachment, please check it.
>> 
>> 
>> 
>> Thanks,
>> Gang
>> 
>> 
>> &gt; -----????-----
>> &gt; ???: "Barry Smith" <bsmith at petsc.dev <mailto:bsmith at petsc.dev>>
>> &gt; ????: 2021-02-18 09:17:17 (???)
>> &gt; ???: "??" <zhaog6 at lsec.cc.ac.cn <mailto:zhaog6 at lsec.cc.ac.cn>>
>> &gt; ??: PETSc <petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>>
>> &gt; ??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG
>> &gt; 
>> &gt; 
>> &gt; 
>> &gt; &gt; On Feb 17, 2021, at 6:47 PM, ?? <zhaog6 at lsec.cc.ac.cn <mailto:zhaog6 at lsec.cc.ac.cn>> wrote:
>> &gt; &gt; 
>> &gt; &gt; Dear PETSc team,
>> &gt; &gt; 
>> &gt; &gt; I am interested in pipelined CG (-ksp_type pipecg) and Gropp's CG (-ksp_type groppcg), it is expected that this iterative method with pipelined has advantages over traditional CG in the case of multiple processes. So I'd like to ask for Poisson problem, how many computing nodes do I need to show the advantages of pipelined CG or Gropp's CG over CG (No preconditioner is used)?
>> &gt; &gt; 
>> &gt; &gt; Currently, I can only use up to 32 nodes (36 cores per nodes) at most on my cluster, but both "pipecg" and "groppcg" seem to be no advantage over "cg" when I solve Poisson equations with homogeneous Dirichlet BC in [0, 1]^2 (remain 20K~60K DOFs per process). I guess the reason would be too few computing nodes.
>> &gt; 
>> &gt;   900 cores (assuming they are not memory bandwidth bound) might be enough to see some differences but the differences are likely so small compared to other parallel issues that affect performance that you see no consistently measurable difference.
>> &gt; 
>> &gt;    Run with -log_view three cases, no pipeline and the two pipelines and send the output. By studying where the time is spent in the different regions of the code with this output one may be able to say something about the pipeline affect.
>> &gt; 
>> &gt;   Barry
>> &gt; 
>> &gt; 
>> &gt; &gt; 
>> &gt; &gt; Because I am calling PETSc via other numerical software, if need, I would mail related performance information to you by using command line options suggested by PETSc. Thank you.
>> &gt; &gt; 
>> &gt; &gt; 
>> &gt; &gt; Thanks,
>> &gt; &gt; Gang
>> </zhaog6 at lsec.cc.ac.cn <mailto:zhaog6 at lsec.cc.ac.cn>></petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>></zhaog6 at lsec.cc.ac.cn <mailto:zhaog6 at lsec.cc.ac.cn>></bsmith at petsc.dev <mailto:bsmith at petsc.dev>><cg.out><groppcg.out><pipecg.out>
> 
> <cg.out><groppcg.out><pipecg.out>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210217/ff4c7063/attachment.html>

From zhaog6 at lsec.cc.ac.cn  Thu Feb 18 00:38:22 2021
From: zhaog6 at lsec.cc.ac.cn (=?UTF-8?B?6LW15Yia?=)
Date: Thu, 18 Feb 2021 14:38:22 +0800 (GMT+08:00)
Subject: [petsc-users] An issue about pipelined CG and Gropp's CG
In-Reply-To: <DDD3949F-F23F-4125-BDAD-6BB17AF0E9A3@petsc.dev>
References: <7bb9aad8.144.177b29b6517.Coremail.zhaog6@lsec.cc.ac.cn>
	<BE4B64F6-9339-4CC2-AB7C-E8FCE2B59A7E@petsc.dev>
	<68432bab.55d.177b2fa288a.Coremail.zhaog6@lsec.cc.ac.cn>
	<9BFA8477-4B4C-440D-9CB0-2B22352EFD77@petsc.dev>
	<1b8b517f.a06.177b3687191.Coremail.zhaog6@lsec.cc.ac.cn>
	<DDD3949F-F23F-4125-BDAD-6BB17AF0E9A3@petsc.dev>
Message-ID: <662101ed.d68.177b3dc6a83.Coremail.zhaog6@lsec.cc.ac.cn>

Thank you a lot for your analysis and suggestions,  I quite agree with your opinion for the difference of theoretical and actual. I'll try to change into MPICH-3.4 rather than MVAPICH-2.3.5 I've used before.


Thanks,

Gang


-----????-----
???:"Barry Smith" <bsmith at petsc.dev>
????:2021-02-18 13:09:43 (???)
???: "??" <zhaog6 at lsec.cc.ac.cn>
??: PETSc <petsc-users at mcs.anl.gov>
??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG


  Here are the important operations from the -log_view (use a fixed sized font for easy reading).


No pipeline


------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------


MatMult             5398 1.0 9.4707e+0012.6 1.05e+09 1.1 3.6e+07 6.9e+02 0.0e+00  3 52100100  0  10 52100100  0 124335
VecTDot            10796 1.0 1.4993e+01 8.3 3.23e+08 1.1 0.0e+00 0.0e+00 1.1e+04 16 16  0  0 67  55 16  0  0 67 24172
VecNorm             5399 1.0 6.2343e+00 4.4 1.61e+08 1.1 0.0e+00 0.0e+00 5.4e+03 10  8  0  0 33  33  8  0  0 33 29073
VecAXPY            10796 1.0 1.1721e-01 1.4 3.23e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0 16  0  0  0   1 16  0  0  0 3092074
VecAYPX             5397 1.0 5.4340e-02 1.4 1.61e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  8  0  0  0   0  8  0  0  0 3334231
VecScatterBegin     5398 1.0 5.4152e-02 3.3 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       5398 1.0 8.6881e+00489.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   6  0  0  0  0     0
KSPSolve               1 1.0 1.7389e+01 1.0 2.02e+09 1.1 3.6e+07 6.9e+02 1.6e+04 29100100100100 100100100100100 130242


Gropp pipeline


MatMult             5399 1.0 9.5593e+0011.7 1.05e+09 1.1 3.6e+07 6.9e+02 0.0e+00  3 45100100  0   7 45100100  0 123207
VecNorm                1 1.0 8.8549e-0417.4 2.99e+04 1.1 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  4   0  0  0  0 20 37912
VecAXPY            16194 1.0 1.6522e-01 1.4 4.84e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0 21  0  0  0   0 21  0  0  0 3290407
VecAYPX            10794 1.0 1.9903e-01 1.5 3.23e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0 14  0  0  0   1 14  0  0  0 1820606
VecScatterBegin     5399 1.0 6.2281e-02 3.6 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       5399 1.0 8.7194e+00380.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   4  0  0  0  0     0
VecReduceArith     16195 1.0 2.2674e-01 3.7 4.84e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0 21  0  0  0   0 21  0  0  0 2397678
VecReduceBegin     10797 1.0 3.4089e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecReduceEnd       10797 1.0 2.6197e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 37  0  0  0  0  91  0  0  0  0     0
SFBcastOpBegin      5399 1.0 6.0051e-02 4.1 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00  0  0100100  0   0  0100100  0     0
SFBcastOpEnd        5399 1.0 8.7167e+00440.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   4  0  0  0  0     0
KSPSolve               1 1.0 2.7477e+01 1.0 2.34e+09 1.1 3.6e+07 6.9e+02 1.0e+00 41100100100  4 100100100100 20 95623


pipeline cg


MatMult             5400 1.0 1.5915e+00 1.8 1.05e+09 1.1 3.6e+07 6.9e+02 0.0e+00  2 37100100  0   6 37100100  0 740161
VecAXPY            21592 1.0 2.3194e-01 1.4 6.45e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0 23  0  0  0   1 23  0  0  0 3125164
VecAYPX            21588 1.0 5.5059e-01 1.7 6.45e+08 1.1 0.0e+00 0.0e+00 0.0e+00  1 23  0  0  0   2 23  0  0  0 1316272
VecScatterBegin     5400 1.0 7.0132e-02 3.7 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       5400 1.0 6.5329e-0122.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
VecReduceArith     16197 1.0 3.1135e-01 4.7 4.84e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0 17  0  0  0   1 17  0  0  0 1746339
VecReduceBegin      5400 1.0 3.1471e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecReduceEnd        5400 1.0 1.7226e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 28  0  0  0  0  90  0  0  0  0     0
SFBcastOpBegin      5400 1.0 6.6228e-02 4.1 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00  0  0100100  0   0  0100100  0     0
SFBcastOpEnd        5400 1.0 6.5000e-0124.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
KSPSolve               1 1.0 1.8893e+01 1.0 2.82e+09 1.1 3.6e+07 6.9e+02 0.0e+00 32100100100  0 100100100100  0 167860


With pipelined methods the TDot and Vec norm are replaced with VecReduceArith, VecReduceBegin, and VecReduceEnd. The important numbers are 
the %T in the stage. 


In particular look at VecTDot and VecNorm and compare to VecReduceEnd in the pipeline methods. Note that both pipelined methods, especially the gropp method spend an enormous time in VecReduceEnd and hence end up taking more time than the non-pipelined method. So basically any advantage the pipeline methods may have is lost waiting for the previous reduction operation to arrive. I do not know why, if it is the MPI implementation or something else. 


If you are serious about understanding pipeline methods for Krylov methods you will need to dig deep into the details of the machine hardware and MPI software. It is not a trivial subject with easy answers. I would say that the PETSc community are not experts on the topic, you will need to read in detail the publications on pipelined methods and consult with the authors on technical, machine specific details. There is a difference between the academic "pipelining as a theoretical construct"  and actually dramatic improvement on real machines while using pipelining. One small implementation detail can dramatically change performance so theoretical papers alone are not the complete story.


  Barry


------------------------------------------------------------------------------------------------------------------------


On Feb 17, 2021, at 10:31 PM, ?? <zhaog6 at lsec.cc.ac.cn> wrote:


Dear Barry,


Thank you. For MPI, MVAPICH-2.3.5 is used on my cluster by default, I add PetscLogStagePush("Calling KSPSolve()...") and PetscLogStagePop(). I am using other numerical software and have called PETSc only when solving linear system through PETSc interface supported by the software, but I'm not sure if I have added it correctly. I put the result and info into attachment, please check it.


Thanks,

Gang


-----????-----
???:"Barry Smith" <bsmith at petsc.dev>
????:2021-02-18 10:52:11 (???)
???: "??" <zhaog6 at lsec.cc.ac.cn>
??: PETSc <petsc-users at mcs.anl.gov>
??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG


  First please see https://www.mcs.anl.gov/petsc/documentation/faq.html#pipelined and verify that the MPI you are using satisfies the requirements and you have appropriate MPI environmental variables set (if needed). 


  Then please add a stage around the actual computation to get a more useful summary. 


  Organize your code like so


  ...
  KSPSetUp()
  PetscLogStagePush(a stage you created)
  KSPSolve()
  PetscLogStagePop()
  ...


  It is unclear where much of the time of your code is being spent, by adding the stage we'll have a clear picture of the time in the actual solver. There are examples of using PetscLogStagePush() in the source.


  With the new -log_view files you generate with these two changes we can get a handle on where the time is being spent and why the pipelining is or is not helping.


  Barry


On Feb 17, 2021, at 8:31 PM, ?? <zhaog6 at lsec.cc.ac.cn> wrote:


Dear Barry,

Thank you for your prompt reply. I run ~16M DOFs on 32 nodes (36 cores per node), but CG seems to be faster than pipelined CG and Gropp's CG, I'm puzzled and haven't figured out why. Put the performance output into attachment, please check it.


Thanks,
Gang


&gt; -----????-----
&gt; ???: "Barry Smith" <bsmith at petsc.dev>
&gt; ????: 2021-02-18 09:17:17 (???)
&gt; ???: "??" <zhaog6 at lsec.cc.ac.cn>
&gt; ??: PETSc <petsc-users at mcs.anl.gov>
&gt; ??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG
&gt;
&gt;
&gt;
&gt; &gt; On Feb 17, 2021, at 6:47 PM, ?? <zhaog6 at lsec.cc.ac.cn> wrote:
&gt; &gt;
&gt; &gt; Dear PETSc team,
&gt; &gt;
&gt; &gt; I am interested in pipelined CG (-ksp_type pipecg) and Gropp's CG (-ksp_type groppcg), it is expected that this iterative method with pipelined has advantages over traditional CG in the case of multiple processes. So I'd like to ask for Poisson problem, how many computing nodes do I need to show the advantages of pipelined CG or Gropp's CG over CG (No preconditioner is used)?
&gt; &gt;
&gt; &gt; Currently, I can only use up to 32 nodes (36 cores per nodes) at most on my cluster, but both "pipecg" and "groppcg" seem to be no advantage over "cg" when I solve Poisson equations with homogeneous Dirichlet BC in [0, 1]^2 (remain 20K~60K DOFs per process). I guess the reason would be too few computing nodes.
&gt;
&gt;   900 cores (assuming they are not memory bandwidth bound) might be enough to see some differences but the differences are likely so small compared to other parallel issues that affect performance that you see no consistently measurable difference.
&gt;
&gt;    Run with -log_view three cases, no pipeline and the two pipelines and send the output. By studying where the time is spent in the different regions of the code with this output one may be able to say something about the pipeline affect.
&gt;
&gt;   Barry
&gt;
&gt;
&gt; &gt;
&gt; &gt; Because I am calling PETSc via other numerical software, if need, I would mail related performance information to you by using command line options suggested by PETSc. Thank you.
&gt; &gt;
&gt; &gt;
&gt; &gt; Thanks,
&gt; &gt; Gang
</zhaog6 at lsec.cc.ac.cn></petsc-users at mcs.anl.gov></zhaog6 at lsec.cc.ac.cn></bsmith at petsc.dev><cg.out><groppcg.out><pipecg.out>


<cg.out><groppcg.out><pipecg.out>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210218/d8e8557d/attachment-0001.html>

From roland.richter at ntnu.no  Thu Feb 18 02:09:35 2021
From: roland.richter at ntnu.no (Roland Richter)
Date: Thu, 18 Feb 2021 09:09:35 +0100
Subject: [petsc-users] Explicit linking to OpenMP results in performance
 drop and wrong results
In-Reply-To: <CAMYG4Gmej=YDgdPPb8P60-oViESJO32=VE6iAobG4qHuy7=L8g@mail.gmail.com>
References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no>
	<874kiad69u.fsf@jedbrown.org>
	<3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no>
	<87y2fmbpw2.fsf@jedbrown.org>
	<641b1bcbfd2741d58cb8d21960a720ca@ntnu.no>
	<CAMYG4Gmej=YDgdPPb8P60-oViESJO32=VE6iAobG4qHuy7=L8g@mail.gmail.com>
Message-ID: <50b0f197-f515-0f5b-8132-04ea5dbb6814@ntnu.no>

Hei,

that was the reason for increased run times. When removing #pragma omp
parallel for, my loop took ~18 seconds. When changing it to #pragma omp
parallel for num_threads(2) or #pragma omp parallel for num_threads(4)
(on a i7-6700), the loop took ~16 s, but when increasing it to #pragma
omp parallel for num_threads(8), the loop took 28 s.

Regards,

Roland

Am 17.02.21 um 18:51 schrieb Matthew Knepley:
> Jed, is it possible that this is an oversubscription penalty from bad
> OpenMP settings? <said by a person who knows less about OpenMP than
> cuneiform>
>
> ? Thanks,
>
> ? ? ?Matt
>
> On Wed, Feb 17, 2021 at 12:11 PM Roland Richter
> <roland.richter at ntnu.no <mailto:roland.richter at ntnu.no>> wrote:
>
>     My PetscScalar is complex double (i.e. even higher penalty), but
>     my matrix has a size of 8kk elements, so that should not an issue.
>     Regards,
>     Roland
>     ------------------------------------------------------------------------
>     *Von:* Jed Brown <jed at jedbrown.org <mailto:jed at jedbrown.org>>
>     *Gesendet:* Mittwoch, 17. Februar 2021 17:49:49
>     *An:* Roland Richter; PETSc
>     *Betreff:* Re: [petsc-users] Explicit linking to OpenMP results in
>     performance drop and wrong results
>     ?
>     Roland Richter <roland.richter at ntnu.no
>     <mailto:roland.richter at ntnu.no>> writes:
>
>     > Hei,
>     >
>     > I replaced the linking line with
>     >
>     > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx? -march=native -fopenmp-simd
>     > -DMKL_LP64 -m64
>     > CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o
>     > bin/armadillo_with_PETSc?
>     > -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib
>     > /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran?
>     > -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64
>     > -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
>     > /opt/boost/lib/libboost_filesystem.so.1.72.0
>     > /opt/boost/lib/libboost_mpi.so.1.72.0
>     > /opt/boost/lib/libboost_program_options.so.1.72.0
>     > /opt/boost/lib/libboost_serialization.so.1.72.0
>     > /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so
>     > /opt/petsc_release/lib/libpetsc.so
>     > /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so
>     > /
>     >
>     > and now the results are correct. Nevertheless, when comparing
>     the loop
>     > in line 26-28 in file test_scaling.cpp
>     >
>     > /#pragma omp parallel for//
>     > //??? for(int i = 0; i < r_0 * r_1; ++i)//
>     > //??? ??? *(out_mat_ptr + i) = (*(in_mat_ptr + i) *
>     scaling_factor);/
>     >
>     > the version without /#pragma omp parallel/ for is significantly
>     faster
>     > (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there
>     > still such a big difference?
>
>     Sounds like you're using a profile to attribute time? Each `omp
>     parallel` region incurs a cost ranging from about a microsecond to
>     10 or more microseconds depending on architecture, number of
>     threads, and OpenMP implementation. Your loop (for double
>     precision) operates at around 8 entries per clock cycle (depending
>     on architecture) if the operands are in cache so the loop size r_0
>     * r_1 should be at least 10000 just to pay off the cost of `omp
>     parallel`.
>
>
>
> -- 
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210218/e4b8bcc8/attachment.html>

From e0425375 at gmail.com  Thu Feb 18 06:00:58 2021
From: e0425375 at gmail.com (Florian Bruckner)
Date: Thu, 18 Feb 2021 13:00:58 +0100
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <BCA76786-A7C6-4282-A230-98ED85D653DC@dsic.upv.es>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
	<CAEc0zDoJarRyBZKOCP_FjsqgVTEH44cyGe2RQJF8URfJPrgK6Q@mail.gmail.com>
	<7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev>
	<CAEc0zDrxX2LWbH8kgSV_mhdSaUyf6CVw0-LQhO=_YNUY62qFRA@mail.gmail.com>
	<B783A843-DC14-4959-8D49-59C0BDDD954E@joliv.et>
	<119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev>
	<CAEc0zDpCR212_cx93aF0iahFEAQEYo3C19F=FSA928z3q7hpLw@mail.gmail.com>
	<6EF7889D-DC17-46FC-82A5-9409C41E231D@petsc.dev>
	<46C744D7-4376-46B3-B5C4-211A4C8C2291@dsic.upv.es>
	<CAMYG4G=nN79akOKMXYS-YXesTj1pZq-br6C_4GA29Vm0zFAwGQ@mail.gmail.com>
	<80BCEEDC-4C1E-4512-AAF5-7B6E718C7D1D@dsic.upv.es>
	<BCA76786-A7C6-4282-A230-98ED85D653DC@dsic.upv.es>
Message-ID: <CAEc0zDqNiT_vBSSB-MEjPiiTK+yG+-9VpHZTpsOaG-+fBHs9RQ@mail.gmail.com>

Dear Jose,
thanks for your work. I just looked over the code, but I didn't have time
to implement our solver, yet.
If I understand the code correctly, it allows to set a precond-matrix which
should approximate A-sigma*B.

I will try to get our code running in the next few weeks. From user
perspective it would maybe simplify things if approximations for A as well
as B are given, since this would hide the internal ST transformations.

best wishes
Florian

On Tue, Feb 16, 2021 at 8:54 PM Jose E. Roman <jroman at dsic.upv.es> wrote:

> Florian: I have created a MR
> https://gitlab.com/slepc/slepc/-/merge_requests/149
> Let me know if it fits your needs.
>
> Jose
>
>
> > El 15 feb 2021, a las 18:44, Jose E. Roman <jroman at dsic.upv.es>
> escribi?:
> >
> >
> >
> >> El 15 feb 2021, a las 14:53, Matthew Knepley <knepley at gmail.com>
> escribi?:
> >>
> >> On Mon, Feb 15, 2021 at 7:27 AM Jose E. Roman <jroman at dsic.upv.es>
> wrote:
> >> I will think about the viability of adding an interface function to
> pass the preconditioner matrix.
> >>
> >> Regarding the question about the B-orthogonality of computed vectors,
> in the symmetric solver the B-orthogonality is enforced during the
> computation, so you have guarantee that the computed vectors satisfy it.
> But if solved as non-symetric, the computed vectors may depart from
> B-orthogonality, unless the tolerance is very small.
> >>
> >> Yes, the vectors I generate are not B-orthogonal.
> >>
> >> Jose, do you think there is a way to reformulate what I am doing to use
> the symmetric solver, even if we only have the action of B?
> >
> > Yes, you can do the following:
> >
> >  ierr = EPSSetOperators(eps,S,NULL);CHKERRQ(ierr);   // S is your shell
> matrix A^{-1}*B
> >  ierr = EPSSetProblemType(eps,EPS_HEP);CHKERRQ(ierr);  // symmetric
> problem though S is not symmetric
> >  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
> >  ierr = EPSSetUp(eps);CHKERRQ(ierr);   // note explicitly calling setup
> here
> >  ierr = EPSGetBV(eps,&bv);CHKERRQ(ierr);
> >  ierr = BVSetMatrix(bv,B,PETSC_FALSE);CHKERRQ(ierr);    // replace
> solver's inner product
> >  ierr = EPSSolve(eps);CHKERRQ(ierr);
> >
> > I have tried this with test1.c and it works. The computed eigenvectors
> should be B-orthogonal in this case.
> >
> > Jose
> >
> >
> >>
> >>  Thanks,
> >>
> >>     Matt
> >>
> >> Jose
> >>
> >>
> >>> El 14 feb 2021, a las 21:41, Barry Smith <bsmith at petsc.dev> escribi?:
> >>>
> >>>
> >>>  Florian,
> >>>
> >>>   I'm sorry I don't know the answers; I can only speculate. There is a
> STGetShift().
> >>>
> >>>   All I was saying is theoretically there could/should be such support
> in SLEPc.
> >>>
> >>>  Barry
> >>>
> >>>
> >>>> On Feb 13, 2021, at 6:43 PM, Florian Bruckner <e0425375 at gmail.com>
> wrote:
> >>>>
> >>>> Dear Barry,
> >>>> thank you for your clarification. What I wanted to say is that even
> if I could reset the KSP operators directly I would require to know which
> transformation ST applies in order to provide the preconditioning matrix
> for the correct operator.
> >>>> The more general solution would be that SLEPc provides the interface
> to pass the preconditioning matrix for A0 and ST applies the same
> transformations as for the operator.
> >>>>
> >>>> If you write "SLEPc could provide an interface", do you mean someone
> should implement it, or should it already be possible and I am not using it
> correctly?
> >>>> I wrote a small standalone example based on ex9.py from slepc4py,
> where i tried to use an operator.
> >>>>
> >>>> best wishes
> >>>> Florian
> >>>>
> >>>> On Sat, Feb 13, 2021 at 7:15 PM Barry Smith <bsmith at petsc.dev> wrote:
> >>>>
> >>>>
> >>>>> On Feb 13, 2021, at 2:47 AM, Pierre Jolivet <pierre at joliv.et> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On 13 Feb 2021, at 7:25 AM, Florian Bruckner <e0425375 at gmail.com>
> wrote:
> >>>>>>
> >>>>>> Dear Jose, Dear Barry,
> >>>>>> thanks again for your reply. One final question about the B0
> orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but
> they are i*B0 orthogonal? or is there an issue with Matt's approach?
> >>>>>> For my problem I can show that eigenvalues fulfill an orthogonality
> relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This
> should be independent of the solving method, right?
> >>>>>>
> >>>>>> Regarding Barry's advice this is what I first tried:
> >>>>>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
> >>>>>> st = es.getST()
> >>>>>> ksp = st.getKSP()
> >>>>>> ksp.setOperators(self.A0, self.P0)
> >>>>>>
> >>>>>> But it seems that the provided P0 is not used. Furthermore the
> interface is maybe a bit confusing if ST performs some transformation. In
> this case P0 needs to approximate A0^{-1}*B0 and not A0, right?
> >>>>>
> >>>>> No, you need to approximate (A0-sigma B0)^-1. If you have a null
> shift, which looks like it is the case, you end up with A0^-1.
> >>>>
> >>>>  Just trying to provide more clarity with the terms.
> >>>>
> >>>> If ST transforms the operator in the KSP to (A0-sigma B0) and you are
> providing the "sparse matrix from which the preconditioner is to be built"
> then you need to provide something that approximates (A0-sigma B0). Since
> the PC will use your matrix to construct a preconditioner that approximates
> the inverse of  (A0-sigma B0), you don't need to directly provide something
> that approximates (A0-sigma B0)^-1
> >>>>
> >>>> Yes, I would think SLEPc could provide an interface where it manages
> "the matrix from which to construct the preconditioner" and transforms that
> matrix just like the true matrix. To do it by hand you simply need to know
> what A0 and B0 are and which sigma ST has selected and then you can
> construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and
> modB0 are your "sparser approximations".
> >>>>
> >>>>  Barry
> >>>>
> >>>>
> >>>>>
> >>>>>> Nevertheless I think it would be the best solution if one could
> provide P0 (approx A0) and SLEPc derives the preconditioner from this.
> Would this be hard to implement?
> >>>>>
> >>>>> This is what Barry?s suggestion is implementing. Don?t know why it
> doesn?t work with your Python operator though.
> >>>>>
> >>>>> Thanks,
> >>>>> Pierre
> >>>>>
> >>>>>> best wishes
> >>>>>> Florian
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith <bsmith at petsc.dev>
> wrote:
> >>>>>>
> >>>>>>
> >>>>>>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner <e0425375 at gmail.com>
> wrote:
> >>>>>>>
> >>>>>>> Dear Jose, Dear Matt,
> >>>>>>>
> >>>>>>> I needed some time to think about your answers.
> >>>>>>> If I understand correctly, the eigenmode solver internally uses
> A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP
> solver and a corresponding preconditioner.
> >>>>>>> What I would need is an interface to provide not only the system
> Matrix A0 (which is an operator), but also a preconditioning matrix (sparse
> approximation of the operator).
> >>>>>>> Unfortunately this interface is not available, right?
> >>>>>>
> >>>>>>   If SLEPc does not provide this directly it is still intended to
> be trivial to provide the "preconditioner matrix" (that is matrix from
> which the preconditioner is built). Just get the KSP from the ST object and
> use KSPSetOperators() to provide the "preconditioner matrix" .
> >>>>>>
> >>>>>>  Barry
> >>>>>>
> >>>>>>>
> >>>>>>> Matt directly creates A0^{-1}*B0 as a matshell operator. The
> operator uses a KSP with a proper PC internally. SLEPc would directly get
> A0^{-1}*B0 and solve a standard eigenvalue problem with this modified
> operator. Did I understand this correctly?
> >>>>>>>
> >>>>>>> I have two further points, which I did not mention yet: the matrix
> B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am
> using Firedrake to set up the PETSc system matrices A0, i*B0 (which is
> real). Then I convert them into ScipyLinearOperators and use
> scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues.
> Minv=A0^-1 is also solving within scipy using a preconditioned gmres.
> Advantage of this setup is that the imaginary B0 can be handled efficiently
> and also the post-processing of the eigenvectors (which requires complex
> arithmetics) is simplified.
> >>>>>>>
> >>>>>>> Nevertheless I think that the mixing of PETSc and Scipy looks too
> complicated and is not very flexible.
> >>>>>>> If I would use Matt's approach, could I then simply switch between
> multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to
> the use of matshell?
> >>>>>>> Is there a solution for the imaginary B0, or do I have to use the
> non-hermitian methods? Is this a large performance drawback?
> >>>>>>>
> >>>>>>> thanks again,
> >>>>>>> and best wishes
> >>>>>>> Florian
> >>>>>>>
> >>>>>>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman <jroman at dsic.upv.es>
> wrote:
> >>>>>>> The problem can be written as A0*v=omega*B0*v and you want the
> eigenvalues omega closest to zero. If the matrices were explicitly
> available, you would do shift-and-invert with target=0, that is
> >>>>>>>
> >>>>>>>  (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
> >>>>>>>
> >>>>>>>  A0^{-1}*B0*v=theta*v
> >>>>>>>
> >>>>>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
> >>>>>>>
> >>>>>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of
> EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need?
> EPS_SMALLEST_REAL will give slow convergence.
> >>>>>>>
> >>>>>>> Florian: I would not recommend setting the KSP matrices directly,
> it may produce strange side-effects. We should have an interface function
> to pass this matrix. Currently there is STPrecondSetMatForPC() but it has
> two problems: (1) it is intended for STPRECOND, so cannot be used with
> Krylov-Schur, and (2) it is not currently available in the python interface.
> >>>>>>>
> >>>>>>> The approach used by Matt is a workaround that does not use ST, so
> you can handle linear solves with a KSP of your own.
> >>>>>>>
> >>>>>>> As an alternative, since your problem is symmetric, you could try
> LOBPCG, assuming that the leftmost eigenvalues are those that you want
> (e.g. if all eigenvalues are non-negative). In that case you could use
> STPrecondSetMatForPC(), but the remaining issue is calling it from python.
> >>>>>>>
> >>>>>>> If you are using the git repo, I could add the relevant code.
> >>>>>>>
> >>>>>>> Jose
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com>
> escribi?:
> >>>>>>>>
> >>>>>>>> On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <
> e0425375 at gmail.com> wrote:
> >>>>>>>> Dear PETSc / SLEPc Users,
> >>>>>>>>
> >>>>>>>> my question is very similar to the one posted here:
> >>>>>>>>
> https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html
> >>>>>>>>
> >>>>>>>> The eigensystem I would like to solve looks like:
> >>>>>>>> B0 v = 1/omega A0 v
> >>>>>>>> B0 and A0 are both hermitian, A0 is positive definite, but only
> given as a linear operator (matshell). I am looking for the largest
> eigenvalues (=smallest omega).
> >>>>>>>>
> >>>>>>>> I also have a sparse approximation P0 of the A0 operator, which i
> would like to use as precondtioner, using something like this:
> >>>>>>>>
> >>>>>>>>        es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
> >>>>>>>>        st = es.getST()
> >>>>>>>>        ksp = st.getKSP()
> >>>>>>>>        ksp.setOperators(self.A0, self.P0)
> >>>>>>>>
> >>>>>>>> Unfortunately PETSc still complains that it cannot create a
> preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but
> A0.type == 'python').
> >>>>>>>> By the way, should P0 be an approximation of A0 or does it have
> to include B0?
> >>>>>>>>
> >>>>>>>> Right now I am using the krylov-schur method. Are there any
> alternatives if A0 is only given as an operator?
> >>>>>>>>
> >>>>>>>> Jose can correct me if I say something wrong.
> >>>>>>>>
> >>>>>>>> When I did this, I made a shell operator for the action of
> A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0
> preconditioning matrix, and
> >>>>>>>> then handed that to EPS. You can see me do it here:
> >>>>>>>>
> >>>>>>>>
> https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123
> >>>>>>>>
> >>>>>>>> I had a hard time getting the embedded solver to work the way I
> wanted, but maybe that is the better way.
> >>>>>>>>
> >>>>>>>>  Thanks,
> >>>>>>>>
> >>>>>>>>     Matt
> >>>>>>>>
> >>>>>>>> thanks for any advice
> >>>>>>>> best wishes
> >>>>>>>> Florian
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> >>>>>>>> -- Norbert Wiener
> >>>>>>>>
> >>>>>>>> https://www.cse.buffalo.edu/~knepley/
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>> <test.py>
> >>>
> >>
> >>
> >>
> >> --
> >> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> >> -- Norbert Wiener
> >>
> >> https://www.cse.buffalo.edu/~knepley/
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210218/8b7a6465/attachment-0001.html>

From knepley at gmail.com  Thu Feb 18 06:10:17 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 18 Feb 2021 07:10:17 -0500
Subject: [petsc-users] Explicit linking to OpenMP results in performance
 drop and wrong results
In-Reply-To: <50b0f197-f515-0f5b-8132-04ea5dbb6814@ntnu.no>
References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no>
	<874kiad69u.fsf@jedbrown.org>
	<3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no>
	<87y2fmbpw2.fsf@jedbrown.org>
	<641b1bcbfd2741d58cb8d21960a720ca@ntnu.no>
	<CAMYG4Gmej=YDgdPPb8P60-oViESJO32=VE6iAobG4qHuy7=L8g@mail.gmail.com>
	<50b0f197-f515-0f5b-8132-04ea5dbb6814@ntnu.no>
Message-ID: <CAMYG4Gkz3M-F9pJKNWj95TCTzConw5PDakdUiTNLk51EEfWQrQ@mail.gmail.com>

On Thu, Feb 18, 2021 at 3:09 AM Roland Richter <roland.richter at ntnu.no>
wrote:

> Hei,
>
> that was the reason for increased run times. When removing #pragma omp
> parallel for, my loop took ~18 seconds. When changing it to #pragma omp
> parallel for num_threads(2) or #pragma omp parallel for num_threads(4) (on
> a i7-6700), the loop took ~16 s, but when increasing it to #pragma omp
> parallel for num_threads(8), the loop took 28 s.
>
> Editorial: This is a reason I think OpenMP is inappropriate as a  tool for
parallel computing (many people disagree). It makes resource management
difficult for the user and impossible for a library.

  Thanks,

     Matt

> Regards,
>
> Roland
> Am 17.02.21 um 18:51 schrieb Matthew Knepley:
>
> Jed, is it possible that this is an oversubscription penalty from bad
> OpenMP settings? <said by a person who knows less about OpenMP than
> cuneiform>
>
>   Thanks,
>
>      Matt
>
> On Wed, Feb 17, 2021 at 12:11 PM Roland Richter <roland.richter at ntnu.no>
> wrote:
>
>> My PetscScalar is complex double (i.e. even higher penalty), but my
>> matrix has a size of 8kk elements, so that should not an issue.
>> Regards,
>> Roland
>> ------------------------------
>> *Von:* Jed Brown <jed at jedbrown.org>
>> *Gesendet:* Mittwoch, 17. Februar 2021 17:49:49
>> *An:* Roland Richter; PETSc
>> *Betreff:* Re: [petsc-users] Explicit linking to OpenMP results in
>> performance drop and wrong results
>>
>> Roland Richter <roland.richter at ntnu.no> writes:
>>
>> > Hei,
>> >
>> > I replaced the linking line with
>> >
>> > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx  -march=native -fopenmp-simd
>> > -DMKL_LP64 -m64
>> > CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o
>> > bin/armadillo_with_PETSc
>> > -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib
>> > /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran
>> > -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64
>> > -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
>> > /opt/boost/lib/libboost_filesystem.so.1.72.0
>> > /opt/boost/lib/libboost_mpi.so.1.72.0
>> > /opt/boost/lib/libboost_program_options.so.1.72.0
>> > /opt/boost/lib/libboost_serialization.so.1.72.0
>> > /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so
>> > /opt/petsc_release/lib/libpetsc.so
>> > /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so
>> > /
>> >
>> > and now the results are correct. Nevertheless, when comparing the loop
>> > in line 26-28 in file test_scaling.cpp
>> >
>> > /#pragma omp parallel for//
>> > //    for(int i = 0; i < r_0 * r_1; ++i)//
>> > //        *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/
>> >
>> > the version without /#pragma omp parallel/ for is significantly faster
>> > (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there
>> > still such a big difference?
>>
>> Sounds like you're using a profile to attribute time? Each `omp parallel`
>> region incurs a cost ranging from about a microsecond to 10 or more
>> microseconds depending on architecture, number of threads, and OpenMP
>> implementation. Your loop (for double precision) operates at around 8
>> entries per clock cycle (depending on architecture) if the operands are in
>> cache so the loop size r_0 * r_1 should be at least 10000 just to pay off
>> the cost of `omp parallel`.
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210218/be9f1eee/attachment.html>

From jroman at dsic.upv.es  Thu Feb 18 06:25:32 2021
From: jroman at dsic.upv.es (Jose E. Roman)
Date: Thu, 18 Feb 2021 13:25:32 +0100
Subject: [petsc-users] using preconditioner with SLEPc
In-Reply-To: <CAEc0zDqNiT_vBSSB-MEjPiiTK+yG+-9VpHZTpsOaG-+fBHs9RQ@mail.gmail.com>
References: <CAEc0zDqtO6+dH+ih1mN8pxUZB7d_8PxohH7v0E8QiVKg+ahKBQ@mail.gmail.com>
	<CAMYG4G=k6smDyHorq=m0vRqhRUA3Qh3uaTiTSON172NQjZZ6Vw@mail.gmail.com>
	<DB9F8F9D-561B-4AC8-B771-F5FD04B73D25@dsic.upv.es>
	<CAEc0zDoJarRyBZKOCP_FjsqgVTEH44cyGe2RQJF8URfJPrgK6Q@mail.gmail.com>
	<7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev>
	<CAEc0zDrxX2LWbH8kgSV_mhdSaUyf6CVw0-LQhO=_YNUY62qFRA@mail.gmail.com>
	<B783A843-DC14-4959-8D49-59C0BDDD954E@joliv.et>
	<119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev>
	<CAEc0zDpCR212_cx93aF0iahFEAQEYo3C19F=FSA928z3q7hpLw@mail.gmail.com>
	<6EF7889D-DC17-46FC-82A5-9409C41E231D@petsc.dev>
	<46C744D7-4376-46B3-B5C4-211A4C8C2291@dsic.upv.es>
	<CAMYG4G=nN79akOKMXYS-YXesTj1pZq-br6C_4GA29Vm0zFAwGQ@mail.gmail.com>
	<80BCEEDC-4C1E-4512-AAF5-7B6E718C7D1D@dsic.upv.es>
	<BCA76786-A7C6-4282-A230-98ED85D653DC@dsic.upv.es>
	<CAEc0zDqNiT_vBSSB-MEjPiiTK+yG+-9VpHZTpsOaG-+fBHs9RQ@mail.gmail.com>
Message-ID: <CFE3DD3C-8C73-420B-BC85-E9A679A33833@dsic.upv.es>

Yes, it is an approximation of A-sigma*B, doing a different thing would be too disruptive.
I will merge into master what is currently in the merge request.
Jose


> El 18 feb 2021, a las 13:00, Florian Bruckner <e0425375 at gmail.com> escribi?:
> 
> Dear Jose,
> thanks for your work. I just looked over the code, but I didn't have time to implement our solver, yet. 
> If I understand the code correctly, it allows to set a precond-matrix which should approximate A-sigma*B.
> 
> I will try to get our code running in the next few weeks. From user perspective it would maybe simplify things if approximations for A as well as B are given, since this would hide the internal ST transformations.
> 
> best wishes
> Florian 
> 
> On Tue, Feb 16, 2021 at 8:54 PM Jose E. Roman <jroman at dsic.upv.es> wrote:
> Florian: I have created a MR https://gitlab.com/slepc/slepc/-/merge_requests/149
> Let me know if it fits your needs.
> 
> Jose
> 
> 
> > El 15 feb 2021, a las 18:44, Jose E. Roman <jroman at dsic.upv.es> escribi?:
> > 
> > 
> > 
> >> El 15 feb 2021, a las 14:53, Matthew Knepley <knepley at gmail.com> escribi?:
> >> 
> >> On Mon, Feb 15, 2021 at 7:27 AM Jose E. Roman <jroman at dsic.upv.es> wrote:
> >> I will think about the viability of adding an interface function to pass the preconditioner matrix.
> >> 
> >> Regarding the question about the B-orthogonality of computed vectors, in the symmetric solver the B-orthogonality is enforced during the computation, so you have guarantee that the computed vectors satisfy it. But if solved as non-symetric, the computed vectors may depart from B-orthogonality, unless the tolerance is very small.
> >> 
> >> Yes, the vectors I generate are not B-orthogonal.
> >> 
> >> Jose, do you think there is a way to reformulate what I am doing to use the symmetric solver, even if we only have the action of B?
> > 
> > Yes, you can do the following:
> > 
> >  ierr = EPSSetOperators(eps,S,NULL);CHKERRQ(ierr);   // S is your shell matrix A^{-1}*B
> >  ierr = EPSSetProblemType(eps,EPS_HEP);CHKERRQ(ierr);  // symmetric problem though S is not symmetric
> >  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
> >  ierr = EPSSetUp(eps);CHKERRQ(ierr);   // note explicitly calling setup here
> >  ierr = EPSGetBV(eps,&bv);CHKERRQ(ierr);
> >  ierr = BVSetMatrix(bv,B,PETSC_FALSE);CHKERRQ(ierr);    // replace solver's inner product
> >  ierr = EPSSolve(eps);CHKERRQ(ierr);
> > 
> > I have tried this with test1.c and it works. The computed eigenvectors should be B-orthogonal in this case.
> > 
> > Jose
> > 
> > 
> >> 
> >>  Thanks,
> >> 
> >>     Matt
> >> 
> >> Jose
> >> 
> >> 
> >>> El 14 feb 2021, a las 21:41, Barry Smith <bsmith at petsc.dev> escribi?:
> >>> 
> >>> 
> >>>  Florian,
> >>> 
> >>>   I'm sorry I don't know the answers; I can only speculate. There is a STGetShift(). 
> >>> 
> >>>   All I was saying is theoretically there could/should be such support in SLEPc.
> >>> 
> >>>  Barry
> >>> 
> >>> 
> >>>> On Feb 13, 2021, at 6:43 PM, Florian Bruckner <e0425375 at gmail.com> wrote:
> >>>> 
> >>>> Dear Barry,
> >>>> thank you for your clarification. What I wanted to say is that even if I could reset the KSP operators directly I would require to know which transformation ST applies in order to provide the preconditioning matrix for the correct operator.
> >>>> The more general solution would be that SLEPc provides the interface to pass the preconditioning matrix for A0 and ST applies the same transformations as for the operator.
> >>>> 
> >>>> If you write "SLEPc could provide an interface", do you mean someone should implement it, or should it already be possible and I am not using it correctly?
> >>>> I wrote a small standalone example based on ex9.py from slepc4py, where i tried to use an operator.
> >>>> 
> >>>> best wishes
> >>>> Florian
> >>>> 
> >>>> On Sat, Feb 13, 2021 at 7:15 PM Barry Smith <bsmith at petsc.dev> wrote:
> >>>> 
> >>>> 
> >>>>> On Feb 13, 2021, at 2:47 AM, Pierre Jolivet <pierre at joliv.et> wrote:
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>>> On 13 Feb 2021, at 7:25 AM, Florian Bruckner <e0425375 at gmail.com> wrote:
> >>>>>> 
> >>>>>> Dear Jose, Dear Barry,
> >>>>>> thanks again for your reply. One final question about the B0 orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but they are i*B0 orthogonal? or is there an issue with Matt's approach?
> >>>>>> For my problem I can show that eigenvalues fulfill an orthogonality relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This should be independent of the solving method, right?
> >>>>>> 
> >>>>>> Regarding Barry's advice this is what I first tried:
> >>>>>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
> >>>>>> st = es.getST()
> >>>>>> ksp = st.getKSP()
> >>>>>> ksp.setOperators(self.A0, self.P0)
> >>>>>> 
> >>>>>> But it seems that the provided P0 is not used. Furthermore the interface is maybe a bit confusing if ST performs some transformation. In this case P0 needs to approximate A0^{-1}*B0 and not A0, right?
> >>>>> 
> >>>>> No, you need to approximate (A0-sigma B0)^-1. If you have a null shift, which looks like it is the case, you end up with A0^-1.
> >>>> 
> >>>>  Just trying to provide more clarity with the terms.
> >>>> 
> >>>> If ST transforms the operator in the KSP to (A0-sigma B0) and you are providing the "sparse matrix from which the preconditioner is to be built" then you need to provide something that approximates (A0-sigma B0). Since the PC will use your matrix to construct a preconditioner that approximates the inverse of  (A0-sigma B0), you don't need to directly provide something that approximates (A0-sigma B0)^-1
> >>>> 
> >>>> Yes, I would think SLEPc could provide an interface where it manages "the matrix from which to construct the preconditioner" and transforms that matrix just like the true matrix. To do it by hand you simply need to know what A0 and B0 are and which sigma ST has selected and then you can construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and modB0 are your "sparser approximations".
> >>>> 
> >>>>  Barry
> >>>> 
> >>>> 
> >>>>> 
> >>>>>> Nevertheless I think it would be the best solution if one could provide P0 (approx A0) and SLEPc derives the preconditioner from this. Would this be hard to implement?
> >>>>> 
> >>>>> This is what Barry?s suggestion is implementing. Don?t know why it doesn?t work with your Python operator though.
> >>>>> 
> >>>>> Thanks,
> >>>>> Pierre
> >>>>> 
> >>>>>> best wishes
> >>>>>> Florian
> >>>>>> 
> >>>>>> 
> >>>>>> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith <bsmith at petsc.dev> wrote:
> >>>>>> 
> >>>>>> 
> >>>>>>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner <e0425375 at gmail.com> wrote:
> >>>>>>> 
> >>>>>>> Dear Jose, Dear Matt, 
> >>>>>>> 
> >>>>>>> I needed some time to think about your answers. 
> >>>>>>> If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner.
> >>>>>>> What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator).
> >>>>>>> Unfortunately this interface is not available, right?
> >>>>>> 
> >>>>>>   If SLEPc does not provide this directly it is still intended to be trivial to provide the "preconditioner matrix" (that is matrix from which the preconditioner is built). Just get the KSP from the ST object and use KSPSetOperators() to provide the "preconditioner matrix" . 
> >>>>>> 
> >>>>>>  Barry
> >>>>>> 
> >>>>>>> 
> >>>>>>> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly?
> >>>>>>> 
> >>>>>>> I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. 
> >>>>>>> 
> >>>>>>> Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. 
> >>>>>>> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell?
> >>>>>>> Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback?
> >>>>>>> 
> >>>>>>> thanks again,
> >>>>>>> and best wishes
> >>>>>>> Florian
> >>>>>>> 
> >>>>>>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman <jroman at dsic.upv.es> wrote:
> >>>>>>> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is
> >>>>>>> 
> >>>>>>>  (A0-sigma*B0)^{-1}*B0*v=theta*v    for sigma=0, that is
> >>>>>>> 
> >>>>>>>  A0^{-1}*B0*v=theta*v
> >>>>>>> 
> >>>>>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega.
> >>>>>>> 
> >>>>>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence.
> >>>>>>> 
> >>>>>>> Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface.
> >>>>>>> 
> >>>>>>> The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own.
> >>>>>>> 
> >>>>>>> As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python.
> >>>>>>> 
> >>>>>>> If you are using the git repo, I could add the relevant code.
> >>>>>>> 
> >>>>>>> Jose
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>>> El 8 feb 2021, a las 14:22, Matthew Knepley <knepley at gmail.com> escribi?:
> >>>>>>>> 
> >>>>>>>> On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner <e0425375 at gmail.com> wrote:
> >>>>>>>> Dear PETSc / SLEPc Users,
> >>>>>>>> 
> >>>>>>>> my question is very similar to the one posted here: 
> >>>>>>>> https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html
> >>>>>>>> 
> >>>>>>>> The eigensystem I would like to solve looks like:
> >>>>>>>> B0 v = 1/omega A0 v
> >>>>>>>> B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). 
> >>>>>>>> 
> >>>>>>>> I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this:
> >>>>>>>> 
> >>>>>>>>        es = SLEPc.EPS().create(comm=fd.COMM_WORLD)
> >>>>>>>>        st = es.getST()
> >>>>>>>>        ksp = st.getKSP()
> >>>>>>>>        ksp.setOperators(self.A0, self.P0)
> >>>>>>>> 
> >>>>>>>> Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). 
> >>>>>>>> By the way, should P0 be an approximation of A0 or does it have to include B0?
> >>>>>>>> 
> >>>>>>>> Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator?
> >>>>>>>> 
> >>>>>>>> Jose can correct me if I say something wrong.
> >>>>>>>> 
> >>>>>>>> When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and
> >>>>>>>> then handed that to EPS. You can see me do it here:
> >>>>>>>> 
> >>>>>>>>  https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123
> >>>>>>>> 
> >>>>>>>> I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way.
> >>>>>>>> 
> >>>>>>>>  Thanks,
> >>>>>>>> 
> >>>>>>>>     Matt
> >>>>>>>> 
> >>>>>>>> thanks for any advice
> >>>>>>>> best wishes
> >>>>>>>> Florian
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> -- 
> >>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> >>>>>>>> -- Norbert Wiener
> >>>>>>>> 
> >>>>>>>> https://www.cse.buffalo.edu/~knepley/
> >>>>>>> 
> >>>>>> 
> >>>>> 
> >>>> 
> >>>> <test.py>
> >>> 
> >> 
> >> 
> >> 
> >> -- 
> >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> >> -- Norbert Wiener
> >> 
> >> https://www.cse.buffalo.edu/~knepley/
> 


From jed at jedbrown.org  Thu Feb 18 10:01:18 2021
From: jed at jedbrown.org (Jed Brown)
Date: Thu, 18 Feb 2021 09:01:18 -0700
Subject: [petsc-users] RSE and Postdoc openings at CU Boulder
Message-ID: <87pn0x9xgx.fsf@jedbrown.org>

My research group has openings for a Research Software Engineer and a Postdoc. Details and application links below; feel free to email me with questions. 


## Research Software Engineer

CU Boulder?s PSAAP Multidisciplinary Simulation Center for Micromorphic Multiphysics Porous and Particulate Materials Simulations Within Exascale Computing Workflows has an opening for a *Research Software Engineer* to co-lead development of robust, extensible open source software for extreme-scale simulation of large-deformation composite poro-elasto-visco-plastic media across a broad range of regimes with experimental validation and coordination with micromorphic multiscale models.

Successful applicants will have strong written and verbal communication skills, and interest in working with an interdisciplinary team to apply the following to real-world problems:

* collaborative software development and devops (Git, continuous integration, etc.);
* maintainable, high-performance programming techniques for CPUs and GPUs;
* finite element and material-point discretizations;
* computational mechanics/inelasticity;
* parallel algebraic solvers such as PETSc; and
* scalable data-intensive computing.

This position can start immediately and is remote-friendly, especially during the pandemic.

https://jobs.colorado.edu/jobs/JobDetail/Research-Associate/28703


## Postdoc

We also have an immediate opening for a *Postdoc* to conduct research in collaboration with the DOE Exascale Computing Project?s co-design Center for Efficient Exascale Discretization (CEED) on the development of robust and efficient methods for high order/compatible PDE discretization and multilevel solvers, including deployment in open source libraries. The project is especially interested in strategies to provide performance portability on emerging architectures and novel parallelization techniques to improve time to solution in the strong scaling limit. The methods will be applied in a variety of applications areas including sustainable energy and geophysics.

Successful applicants will have strong written and verbal communication skills to collaborate with a distributed inter-disciplinary team and disseminate results via publications and presentations, as well as an interest in research and development of high-quality community software infrastructure in areas including, but not limited to:

* element-based PDE discretization;
* high-performance computing on emerging architectures, including CPU and GPUs;
* scalable algebraic solvers;
* applications in fluid and solid mechanics; and
* data-intensive PDE workflows.

This position can start immediately and is remote-friendly, especially during the pandemic.

https://jobs.colorado.edu/jobs/JobDetail/PostDoctoral-Associate/28691


The University of Colorado Boulder is committed to building a culturally diverse community of faculty, staff, and students dedicated to contributing to an inclusive campus environment. We are an Equal Opportunity employer. We offer a competitive salary and a comprehensive benefits package.

From heepark at sandia.gov  Thu Feb 18 12:35:20 2021
From: heepark at sandia.gov (Park, Heeho)
Date: Thu, 18 Feb 2021 18:35:20 +0000
Subject: [petsc-users] [EXTERNAL] Re:  insufficient virtual memory?
In-Reply-To: <263912AE-3062-43E3-BBF3-7B3E4703AB0C@petsc.dev>
References: <C345DFB6-E7F3-4A4A-967F-396E4EB7D657@sandia.gov>
	<263912AE-3062-43E3-BBF3-7B3E4703AB0C@petsc.dev>
Message-ID: <CCE48A3A-A218-42C9-AA9A-FAA92DC3C728@sandia.gov>

Thank you Barry. I will look further into it.

- Heeho Daniel Park

From: Barry Smith <bsmith at petsc.dev>
Date: Wednesday, February 17, 2021 at 6:57 PM
To: "Park, Heeho" <heepark at sandia.gov>
Cc: "petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
Subject: [EXTERNAL] Re: [petsc-users] insufficient virtual memory?


   PETSc gets almost all its memory using the C malloc system calls so it is unlikely that this Fortran error message comes from PETSc code. My guess is that you have some Fortran arrays declared somewhere in your code that are large and require memory that is not available.

  Barry


On Feb 17, 2021, at 7:23 PM, Park, Heeho via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:

Hi PETSc developers,

Have you seen this error message?

      forrtl: severe (41): insufficient virtual memory

We are running about 36 million degrees of freedom ( ~ 2.56 GB) and it is failing with the error message on our HPC systems.
Ironically, it runs on our laptop (super slow.)

      type: seqbaij
      rows=46251272, cols=46251272
      total: nonzeros=323046210, allocated nonzeros=323046210
      total number of mallocs used during MatSetValues calls=0
          block size is 1

Does anyone have experience encountering this problem?

Thanks,

Heeho Daniel Park

! ------------------------------------ !
Sandia National Laboratories
Org: 08844, R&D
Work: 505-844-1319
! ------------------------------------ !

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210218/6d71bf1d/attachment.html>

From bsmith at petsc.dev  Thu Feb 18 18:15:31 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 18 Feb 2021 18:15:31 -0600
Subject: [petsc-users] Explicit linking to OpenMP results in performance
 drop and wrong results
In-Reply-To: <CAMYG4Gkz3M-F9pJKNWj95TCTzConw5PDakdUiTNLk51EEfWQrQ@mail.gmail.com>
References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no>
	<874kiad69u.fsf@jedbrown.org>
	<3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no>
	<87y2fmbpw2.fsf@jedbrown.org>
	<641b1bcbfd2741d58cb8d21960a720ca@ntnu.no>
	<CAMYG4Gmej=YDgdPPb8P60-oViESJO32=VE6iAobG4qHuy7=L8g@mail.gmail.com>
	<50b0f197-f515-0f5b-8132-04ea5dbb6814@ntnu.no>
	<CAMYG4Gkz3M-F9pJKNWj95TCTzConw5PDakdUiTNLk51EEfWQrQ@mail.gmail.com>
Message-ID: <CA0E47E0-4467-4FDC-A93E-8A32C226B457@petsc.dev>


> On Feb 18, 2021, at 6:10 AM, Matthew Knepley <knepley at gmail.com> wrote:
> 
> On Thu, Feb 18, 2021 at 3:09 AM Roland Richter <roland.richter at ntnu.no <mailto:roland.richter at ntnu.no>> wrote:
> Hei,
> 
> that was the reason for increased run times. When removing #pragma omp parallel for, my loop took ~18 seconds. When changing it to #pragma omp parallel for num_threads(2) or #pragma omp parallel for num_threads(4) (on a i7-6700), the loop took ~16 s, but when increasing it to #pragma omp parallel for num_threads(8), the loop took 28 s.
> 
> 
> Editorial: This is a reason I think OpenMP is inappropriate as a  tool for parallel computing (many people disagree). It makes resource management
> difficult for the user and impossible for a library.

   It is possible to control these things properly with modern OpenMP APIs but, like MPI implementations, this can require some mucking around a beginner would not know about and the default settings can be terrible. MPI implementations are not better, their default bindings are generally horrible. 

  Barry

> 
>   Thanks,
> 
>      Matt
> Regards,
> 
> Roland
> 
> Am 17.02.21 um 18:51 schrieb Matthew Knepley:
>> Jed, is it possible that this is an oversubscription penalty from bad OpenMP settings? <said by a person who knows less about OpenMP than cuneiform>
>> 
>>   Thanks,
>> 
>>      Matt
>> 
>> On Wed, Feb 17, 2021 at 12:11 PM Roland Richter <roland.richter at ntnu.no <mailto:roland.richter at ntnu.no>> wrote:
>> My PetscScalar is complex double (i.e. even higher penalty), but my matrix has a size of 8kk elements, so that should not an issue.
>> Regards,
>> Roland
>> Von: Jed Brown <jed at jedbrown.org <mailto:jed at jedbrown.org>>
>> Gesendet: Mittwoch, 17. Februar 2021 17:49:49
>> An: Roland Richter; PETSc
>> Betreff: Re: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results
>>  
>> Roland Richter <roland.richter at ntnu.no <mailto:roland.richter at ntnu.no>> writes:
>> 
>> > Hei,
>> >
>> > I replaced the linking line with
>> >
>> > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx  -march=native -fopenmp-simd
>> > -DMKL_LP64 -m64
>> > CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o
>> > bin/armadillo_with_PETSc 
>> > -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib
>> > /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran 
>> > -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64
>> > -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
>> > /opt/boost/lib/libboost_filesystem.so.1.72.0
>> > /opt/boost/lib/libboost_mpi.so.1.72.0
>> > /opt/boost/lib/libboost_program_options.so.1.72.0
>> > /opt/boost/lib/libboost_serialization.so.1.72.0
>> > /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so
>> > /opt/petsc_release/lib/libpetsc.so
>> > /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so
>> > /
>> >
>> > and now the results are correct. Nevertheless, when comparing the loop
>> > in line 26-28 in file test_scaling.cpp
>> >
>> > /#pragma omp parallel for//
>> > //    for(int i = 0; i < r_0 * r_1; ++i)//
>> > //        *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/
>> >
>> > the version without /#pragma omp parallel/ for is significantly faster
>> > (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there
>> > still such a big difference?
>> 
>> Sounds like you're using a profile to attribute time? Each `omp parallel` region incurs a cost ranging from about a microsecond to 10 or more microseconds depending on architecture, number of threads, and OpenMP implementation. Your loop (for double precision) operates at around 8 entries per clock cycle (depending on architecture) if the operands are in cache so the loop size r_0 * r_1 should be at least 10000 just to pay off the cost of `omp parallel`.
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210218/240f5ad0/attachment.html>

From knepley at gmail.com  Thu Feb 18 20:03:21 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 18 Feb 2021 21:03:21 -0500
Subject: [petsc-users] Explicit linking to OpenMP results in performance
 drop and wrong results
In-Reply-To: <CA0E47E0-4467-4FDC-A93E-8A32C226B457@petsc.dev>
References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no>
	<874kiad69u.fsf@jedbrown.org>
	<3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no>
	<87y2fmbpw2.fsf@jedbrown.org>
	<641b1bcbfd2741d58cb8d21960a720ca@ntnu.no>
	<CAMYG4Gmej=YDgdPPb8P60-oViESJO32=VE6iAobG4qHuy7=L8g@mail.gmail.com>
	<50b0f197-f515-0f5b-8132-04ea5dbb6814@ntnu.no>
	<CAMYG4Gkz3M-F9pJKNWj95TCTzConw5PDakdUiTNLk51EEfWQrQ@mail.gmail.com>
	<CA0E47E0-4467-4FDC-A93E-8A32C226B457@petsc.dev>
Message-ID: <CAMYG4Gmq88HNzgCBW_+3e7W9LNzWf_5c2cR9SA+r7nG3kiuGEQ@mail.gmail.com>

On Thu, Feb 18, 2021 at 7:15 PM Barry Smith <bsmith at petsc.dev> wrote:

>
>
> On Feb 18, 2021, at 6:10 AM, Matthew Knepley <knepley at gmail.com> wrote:
>
> On Thu, Feb 18, 2021 at 3:09 AM Roland Richter <roland.richter at ntnu.no>
> wrote:
>
>> Hei,
>>
>> that was the reason for increased run times. When removing #pragma omp
>> parallel for, my loop took ~18 seconds. When changing it to #pragma omp
>> parallel for num_threads(2) or #pragma omp parallel for num_threads(4) (on
>> a i7-6700), the loop took ~16 s, but when increasing it to #pragma omp
>> parallel for num_threads(8), the loop took 28 s.
>>
>> Editorial: This is a reason I think OpenMP is inappropriate as a  tool
> for parallel computing (many people disagree). It makes resource management
> difficult for the user and impossible for a library.
>
>
>    It is possible to control these things properly with modern OpenMP APIs
> but, like MPI implementations, this can require some mucking around a
> beginner would not know about and the default settings can be terrible. MPI
> implementations are not better, their default bindings are generally
> horrible.
>

MPI allows the library to understand what resources are available and used.
Last time we looked at it, OpenMP does not have such
a context object that gets passed into the library (comm). The user could
construct one, but then the "usability" of OpenMP fades away.

  Matt


>   Barry
>
>
>   Thanks,
>
>      Matt
>
>> Regards,
>>
>> Roland
>> Am 17.02.21 um 18:51 schrieb Matthew Knepley:
>>
>> Jed, is it possible that this is an oversubscription penalty from bad
>> OpenMP settings? <said by a person who knows less about OpenMP than
>> cuneiform>
>>
>>   Thanks,
>>
>>      Matt
>>
>> On Wed, Feb 17, 2021 at 12:11 PM Roland Richter <roland.richter at ntnu.no>
>> wrote:
>>
>>> My PetscScalar is complex double (i.e. even higher penalty), but my
>>> matrix has a size of 8kk elements, so that should not an issue.
>>> Regards,
>>> Roland
>>> ------------------------------
>>> *Von:* Jed Brown <jed at jedbrown.org>
>>> *Gesendet:* Mittwoch, 17. Februar 2021 17:49:49
>>> *An:* Roland Richter; PETSc
>>> *Betreff:* Re: [petsc-users] Explicit linking to OpenMP results in
>>> performance drop and wrong results
>>>
>>> Roland Richter <roland.richter at ntnu.no> writes:
>>>
>>> > Hei,
>>> >
>>> > I replaced the linking line with
>>> >
>>> > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx  -march=native -fopenmp-simd
>>> > -DMKL_LP64 -m64
>>> > CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o
>>> > bin/armadillo_with_PETSc
>>> > -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib
>>> > /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran
>>> > -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64
>>> > -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
>>> > /opt/boost/lib/libboost_filesystem.so.1.72.0
>>> > /opt/boost/lib/libboost_mpi.so.1.72.0
>>> > /opt/boost/lib/libboost_program_options.so.1.72.0
>>> > /opt/boost/lib/libboost_serialization.so.1.72.0
>>> > /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so
>>> > /opt/petsc_release/lib/libpetsc.so
>>> > /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so
>>> > /
>>> >
>>> > and now the results are correct. Nevertheless, when comparing the loop
>>> > in line 26-28 in file test_scaling.cpp
>>> >
>>> > /#pragma omp parallel for//
>>> > //    for(int i = 0; i < r_0 * r_1; ++i)//
>>> > //        *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/
>>> >
>>> > the version without /#pragma omp parallel/ for is significantly faster
>>> > (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there
>>> > still such a big difference?
>>>
>>> Sounds like you're using a profile to attribute time? Each `omp
>>> parallel` region incurs a cost ranging from about a microsecond to 10 or
>>> more microseconds depending on architecture, number of threads, and OpenMP
>>> implementation. Your loop (for double precision) operates at around 8
>>> entries per clock cycle (depending on architecture) if the operands are in
>>> cache so the loop size r_0 * r_1 should be at least 10000 just to pay off
>>> the cost of `omp parallel`.
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210218/acc80a27/attachment-0001.html>

From fabien.vergnet at sorbonne-universite.fr  Mon Feb 22 05:28:44 2021
From: fabien.vergnet at sorbonne-universite.fr (Fabien Vergnet)
Date: Mon, 22 Feb 2021 12:28:44 +0100
Subject: [petsc-users] Get the vertices composing the cells of a DMPlex
 Submesh
Message-ID: <10AAF11E-FED2-4B5C-B417-966805DA9C90@sorbonne-universite.fr>

Dear PETSc community,

Thank you for your amazing work. I discovered PETSc recently and I need your help for a project.

As a training, I would like to assemble the Finite Element Matrix for the Poisson problem on a part of my mesh. So, I create a DMPlex from a .msh file and I create a Submesh with

DMPlexCreateSubmesh(dm, label, 1, PETSC_TRUE, &subdm);

In order to assemble my Finite Element Matrix, I need to iterate over the cells of the mesh (which are triangles) and identify the vertices composing each cell.

My question is the following : how can I get, for each cell, the vertices composing the cell ?

I have tried to uninterpolate the subdm with DMPlexUninterpolate (with the objective to get the vertices from DMPlexGetCone) but it does not seem to work for a Submesh since I get the following error:

----------

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Invalid argument
[0]PETSC ERROR: Not for partially interpolated meshes
[0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.14.4, Feb 03, 2021 
[0]PETSC ERROR: Configure options --prefix=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS= FFLAGS= CXXFLAGS= --with-cc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpicc --with-cxx=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpic++ --with-fc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openblas-0.3.13-6b3u6zc5j4hvauqh3ldcwnf7lm2o4vyl/lib/libopenblas.so --with-x=0 --with-clanguage=C --with-scalapack=0 --with-cuda=0 --with-metis=1 --with-metis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/metis-5.1.0-lm3k7dh2vslghqtqc6dvcpnc54bfpqq2 --with-hypre=1 --with-hypre-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hypre-2.20.0-4pyxhku65wb5lmh2fpflhmjmow2pbjg7 --with-parmetis=1 --with-parmetis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/parmetis-4.0.3-pfiam4ccxkyqpapejcdjtlyr6cyz7irc --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0 --with-gmp=0 --with-libpng=0 --with-giflib=0 --with-mpfr=0 --with-netcdf=0 --with-pnetcdf=0 --with-moab=0 --with-random123=0 --with-exodusii=0 --with-cgns=0 --with-memkind=0 --with-p4est=0 --with-saws=0 --with-yaml=0 --with-libjpeg=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/include --with-superlu_dist-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse=0 --with-ptscotch=0 --with-hdf5-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/include --with-hdf5-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/lib/libhdf5.so --with-hdf5=1 --with-zlib-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/include --with-zlib-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/lib/libz.so --with-zlib=1
[0]PETSC ERROR: #1 DMPlexUninterpolate() line 1514 in /tmp/vergnet/spack-stage/spack-stage-petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk/spack-src/src/dm/impls/plex/plexinterpolate.c
[0]PETSC ERROR: #2 assemble_mass() line 59 in /users/home/vergnet/codes/cilia/cilia/cpp/assembling.hpp

----------

Attached are a minimal working example, a mesh file and a makefile.

Any ideas or suggestions are more than welcome !

Regards,
Fabien


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210222/7723df09/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mwe.cpp
Type: application/applefile
Size: 67 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210222/7723df09/attachment.bin>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210222/7723df09/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mesh.msh
Type: application/applefile
Size: 68 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210222/7723df09/attachment-0001.bin>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210222/7723df09/attachment-0002.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: makefile
Type: application/applefile
Size: 68 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210222/7723df09/attachment-0002.bin>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210222/7723df09/attachment-0003.html>

From stefano.zampini at gmail.com  Mon Feb 22 05:48:59 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Mon, 22 Feb 2021 14:48:59 +0300
Subject: [petsc-users] Get the vertices composing the cells of a DMPlex
 Submesh
In-Reply-To: <10AAF11E-FED2-4B5C-B417-966805DA9C90@sorbonne-universite.fr>
References: <10AAF11E-FED2-4B5C-B417-966805DA9C90@sorbonne-universite.fr>
Message-ID: <CAGPUisjSixY4TB-skTs0FFNfajQ1g-mvPyEh2guYfWBXGF9M9w@mail.gmail.com>

The plex way is to use the transitive closure
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexGetTransitiveClosure.html
and filter out points you are not interested in.
See for example
https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plex.c#L828
and
https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plex.c#L833

Il giorno lun 22 feb 2021 alle ore 14:28 Fabien Vergnet <
fabien.vergnet at sorbonne-universite.fr> ha scritto:

> Dear PETSc community,
>
> Thank you for your amazing work. I discovered PETSc recently and I need
> your help for a project.
>
> As a training, I would like to assemble the Finite Element Matrix for the
> Poisson problem on a part of my mesh. So, I create a DMPlex from a .msh
> file and I create a Submesh with
>
> DMPlexCreateSubmesh(dm, label, 1, PETSC_TRUE, &subdm);
>
> In order to assemble my Finite Element Matrix, I need to iterate over the
> cells of the mesh (which are triangles) and identify the vertices
> composing each cell.
>
> My question is the following : how can I get, for each cell, the vertices
> composing the cell ?
>
> I have tried to uninterpolate the subdm with DMPlexUninterpolate (with
> the objective to get the vertices from DMPlexGetCone) but it does not
> seem to work for a Submesh since I get the following error:
>
> ----------
>
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: Not for partially interpolated meshes
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.14.4, Feb 03, 2021
> [0]PETSC ERROR: Configure options
> --prefix=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk
> --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0
> CFLAGS= FFLAGS= CXXFLAGS=
> --with-cc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpicc
> --with-cxx=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpic++
> --with-fc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpif90
> --with-precision=double --with-scalar-type=real --with-shared-libraries=1
> --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS=
> CXXOPTFLAGS=
> --with-blaslapack-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openblas-0.3.13-6b3u6zc5j4hvauqh3ldcwnf7lm2o4vyl/lib/libopenblas.so
> --with-x=0 --with-clanguage=C --with-scalapack=0 --with-cuda=0
> --with-metis=1
> --with-metis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/metis-5.1.0-lm3k7dh2vslghqtqc6dvcpnc54bfpqq2
> --with-hypre=1
> --with-hypre-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hypre-2.20.0-4pyxhku65wb5lmh2fpflhmjmow2pbjg7
> --with-parmetis=1
> --with-parmetis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/parmetis-4.0.3-pfiam4ccxkyqpapejcdjtlyr6cyz7irc
> --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0
> --with-gmp=0 --with-libpng=0 --with-giflib=0 --with-mpfr=0 --with-netcdf=0
> --with-pnetcdf=0 --with-moab=0 --with-random123=0 --with-exodusii=0
> --with-cgns=0 --with-memkind=0 --with-p4est=0 --with-saws=0 --with-yaml=0
> --with-libjpeg=0 --with-cxx-dialect=C++11
> --with-superlu_dist-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/include
> --with-superlu_dist-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/lib/libsuperlu_dist.a
> --with-superlu_dist=1 --with-suitesparse=0 --with-ptscotch=0
> --with-hdf5-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/include
> --with-hdf5-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/lib/libhdf5.so
> --with-hdf5=1
> --with-zlib-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/include
> --with-zlib-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/lib/libz.so
> --with-zlib=1
> [0]PETSC ERROR: #1 DMPlexUninterpolate() line 1514 in
> /tmp/vergnet/spack-stage/spack-stage-petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk/spack-src/src/dm/impls/plex/plexinterpolate.c
> [0]PETSC ERROR: #2 assemble_mass() line 59 in
> /users/home/vergnet/codes/cilia/cilia/cpp/assembling.hpp
>
> ----------
>
> Attached are a minimal working example, a mesh file and a makefile.
>
> Any ideas or suggestions are more than welcome !
>
> Regards,
> Fabien
>
>
>
>

-- 
Stefano
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210222/2bb4956a/attachment.html>

From fabien.vergnet at sorbonne-universite.fr  Mon Feb 22 06:09:12 2021
From: fabien.vergnet at sorbonne-universite.fr (Fabien Vergnet)
Date: Mon, 22 Feb 2021 13:09:12 +0100
Subject: [petsc-users] Get the vertices composing the cells of a DMPlex
 Submesh
In-Reply-To: <CAGPUisjSixY4TB-skTs0FFNfajQ1g-mvPyEh2guYfWBXGF9M9w@mail.gmail.com>
References: <10AAF11E-FED2-4B5C-B417-966805DA9C90@sorbonne-universite.fr>
	<CAGPUisjSixY4TB-skTs0FFNfajQ1g-mvPyEh2guYfWBXGF9M9w@mail.gmail.com>
Message-ID: <4496AC11-FCF5-4799-9F04-453F695AF04E@sorbonne-universite.fr>

Hi Stefano,

Thank you for your response.

Could you explain more what the output of DMPlexGetTransitiveClosure is ? For example for cell 0 I get the following array of size 14 for the closure:

0 0 370 -2 369 -2 368 -2 309 0 306 0 312 0 

but I do not understand the order of the points (center of the cell, middle of the edges, vertices ?). Also what the orientation means for each point ?

Regards,
Fabien

> Le 22 f?vr. 2021 ? 12:48, Stefano Zampini <stefano.zampini at gmail.com> a ?crit :
> 
> The plex way is to use the transitive closure https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexGetTransitiveClosure.html <https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexGetTransitiveClosure.html> and filter out points you are not interested in.
> See for example https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plex.c#L828 <https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plex.c#L828> and https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plex.c#L833 <https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plex.c#L833>
> Il giorno lun 22 feb 2021 alle ore 14:28 Fabien Vergnet <fabien.vergnet at sorbonne-universite.fr <mailto:fabien.vergnet at sorbonne-universite.fr>> ha scritto:
> Dear PETSc community,
> 
> Thank you for your amazing work. I discovered PETSc recently and I need your help for a project.
> 
> As a training, I would like to assemble the Finite Element Matrix for the Poisson problem on a part of my mesh. So, I create a DMPlex from a .msh file and I create a Submesh with
> 
> DMPlexCreateSubmesh(dm, label, 1, PETSC_TRUE, &subdm);
> 
> In order to assemble my Finite Element Matrix, I need to iterate over the cells of the mesh (which are triangles) and identify the vertices composing each cell.
> 
> My question is the following : how can I get, for each cell, the vertices composing the cell ?
> 
> I have tried to uninterpolate the subdm with DMPlexUninterpolate (with the objective to get the vertices from DMPlexGetCone) but it does not seem to work for a Submesh since I get the following error:
> 
> ----------
> 
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: Not for partially interpolated meshes
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html <https://www.mcs.anl.gov/petsc/documentation/faq.html> for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.14.4, Feb 03, 2021 
> [0]PETSC ERROR: Configure options --prefix=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS= FFLAGS= CXXFLAGS= --with-cc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpicc --with-cxx=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpic++ --with-fc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openblas-0.3.13-6b3u6zc5j4hvauqh3ldcwnf7lm2o4vyl/lib/libopenblas.so --with-x=0 --with-clanguage=C --with-scalapack=0 --with-cuda=0 --with-metis=1 --with-metis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/metis-5.1.0-lm3k7dh2vslghqtqc6dvcpnc54bfpqq2 --with-hypre=1 --with-hypre-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hypre-2.20.0-4pyxhku65wb5lmh2fpflhmjmow2pbjg7 --with-parmetis=1 --with-parmetis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/parmetis-4.0.3-pfiam4ccxkyqpapejcdjtlyr6cyz7irc --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0 --with-gmp=0 --with-libpng=0 --with-giflib=0 --with-mpfr=0 --with-netcdf=0 --with-pnetcdf=0 --with-moab=0 --with-random123=0 --with-exodusii=0 --with-cgns=0 --with-memkind=0 --with-p4est=0 --with-saws=0 --with-yaml=0 --with-libjpeg=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/include --with-superlu_dist-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse=0 --with-ptscotch=0 --with-hdf5-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/include --with-hdf5-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/lib/libhdf5.so --with-hdf5=1 --with-zlib-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/include --with-zlib-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/lib/libz.so --with-zlib=1
> [0]PETSC ERROR: #1 DMPlexUninterpolate() line 1514 in /tmp/vergnet/spack-stage/spack-stage-petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk/spack-src/src/dm/impls/plex/plexinterpolate.c
> [0]PETSC ERROR: #2 assemble_mass() line 59 in /users/home/vergnet/codes/cilia/cilia/cpp/assembling.hpp
> 
> ----------
> 
> Attached are a minimal working example, a mesh file and a makefile.
> 
> Any ideas or suggestions are more than welcome !
> 
> Regards,
> Fabien
> 
> 
> 
> 
> 
> -- 
> Stefano

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210222/15ab04cb/attachment-0001.html>

From stefano.zampini at gmail.com  Mon Feb 22 06:23:08 2021
From: stefano.zampini at gmail.com (Stefano Zampini)
Date: Mon, 22 Feb 2021 15:23:08 +0300
Subject: [petsc-users] Get the vertices composing the cells of a DMPlex
 Submesh
In-Reply-To: <4496AC11-FCF5-4799-9F04-453F695AF04E@sorbonne-universite.fr>
References: <10AAF11E-FED2-4B5C-B417-966805DA9C90@sorbonne-universite.fr>
	<CAGPUisjSixY4TB-skTs0FFNfajQ1g-mvPyEh2guYfWBXGF9M9w@mail.gmail.com>
	<4496AC11-FCF5-4799-9F04-453F695AF04E@sorbonne-universite.fr>
Message-ID: <CAGPUisjF6_eBUTk+5PfRaDSRcLV4+NbM8LLYeyJE0=DhO5fJ_Q@mail.gmail.com>

>From your output, it seems a triangle cell.

Reading  the documentation of DMPlexGetTransitiveClosure I have already
pointed you to: the output is interleaved between points (cell, edges and
vertices in this case) and their relative orientation wrt the point of
higher dimension.

Reading your output

0 0 -> cell number 0 (relative orientation is 0)

then the edges (all edges are points locally numbered from eStart to eEnd,
where DMPlexGetDepthStratum(dm,1,&eStart,&eEnd))

370 -2 369 -2 368 -2 -> cell 0 is made up by points 370, 369 and 368
(traversed in this specific order), and each edge must be traversed -2
(from second endpoint to first)

then the vertices (start and end of local per process numbering via
DMPlexGetDepthStratum(dm,0,&vStart,&vEnd))

309 0 306 0 312 0  -> vertices have no orientation.


Il giorno lun 22 feb 2021 alle ore 15:09 Fabien Vergnet <
fabien.vergnet at sorbonne-universite.fr> ha scritto:

> Hi Stefano,
>
> Thank you for your response.
>
> Could you explain more what the output of DMPlexGetTransitiveClosure is ?
> For example for cell 0 I get the following array of size 14 for the closure:
>
> 0 0 370 -2 369 -2 368 -2 309 0 306 0 312 0
>
> but I do not understand the order of the points (center of the cell,
> middle of the edges, vertices ?). Also what the orientation means for each
> point ?
>
> Regards,
> Fabien
>
> Le 22 f?vr. 2021 ? 12:48, Stefano Zampini <stefano.zampini at gmail.com> a
> ?crit :
>
> The plex way is to use the transitive closure
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexGetTransitiveClosure.html
>  and filter out points you are not interested in.
> See for example
> https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plex.c#L828
>  and
> https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plex.c#L833
>
> Il giorno lun 22 feb 2021 alle ore 14:28 Fabien Vergnet <
> fabien.vergnet at sorbonne-universite.fr> ha scritto:
>
>> Dear PETSc community,
>>
>> Thank you for your amazing work. I discovered PETSc recently and I need
>> your help for a project.
>>
>> As a training, I would like to assemble the Finite Element Matrix for the
>> Poisson problem on a part of my mesh. So, I create a DMPlex from a .msh
>> file and I create a Submesh with
>>
>> DMPlexCreateSubmesh(dm, label, 1, PETSC_TRUE, &subdm);
>>
>> In order to assemble my Finite Element Matrix, I need to iterate over the
>> cells of the mesh (which are triangles) and identify the vertices
>> composing each cell.
>>
>> My question is the following : how can I get, for each cell, the vertices
>> composing the cell ?
>>
>> I have tried to uninterpolate the subdm with DMPlexUninterpolate (with
>> the objective to get the vertices from DMPlexGetCone) but it does not
>> seem to work for a Submesh since I get the following error:
>>
>> ----------
>>
>> [0]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [0]PETSC ERROR: Invalid argument
>> [0]PETSC ERROR: Not for partially interpolated meshes
>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for
>> trouble shooting.
>> [0]PETSC ERROR: Petsc Release Version 3.14.4, Feb 03, 2021
>> [0]PETSC ERROR: Configure options
>> --prefix=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk
>> --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0
>> CFLAGS= FFLAGS= CXXFLAGS=
>> --with-cc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpicc
>> --with-cxx=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpic++
>> --with-fc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpif90
>> --with-precision=double --with-scalar-type=real --with-shared-libraries=1
>> --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS=
>> CXXOPTFLAGS=
>> --with-blaslapack-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openblas-0.3.13-6b3u6zc5j4hvauqh3ldcwnf7lm2o4vyl/lib/libopenblas.so
>> --with-x=0 --with-clanguage=C --with-scalapack=0 --with-cuda=0
>> --with-metis=1
>> --with-metis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/metis-5.1.0-lm3k7dh2vslghqtqc6dvcpnc54bfpqq2
>> --with-hypre=1
>> --with-hypre-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hypre-2.20.0-4pyxhku65wb5lmh2fpflhmjmow2pbjg7
>> --with-parmetis=1
>> --with-parmetis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/parmetis-4.0.3-pfiam4ccxkyqpapejcdjtlyr6cyz7irc
>> --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0
>> --with-gmp=0 --with-libpng=0 --with-giflib=0 --with-mpfr=0 --with-netcdf=0
>> --with-pnetcdf=0 --with-moab=0 --with-random123=0 --with-exodusii=0
>> --with-cgns=0 --with-memkind=0 --with-p4est=0 --with-saws=0 --with-yaml=0
>> --with-libjpeg=0 --with-cxx-dialect=C++11
>> --with-superlu_dist-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/include
>> --with-superlu_dist-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/lib/libsuperlu_dist.a
>> --with-superlu_dist=1 --with-suitesparse=0 --with-ptscotch=0
>> --with-hdf5-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/include
>> --with-hdf5-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/lib/libhdf5.so
>> --with-hdf5=1
>> --with-zlib-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/include
>> --with-zlib-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/lib/libz.so
>> --with-zlib=1
>> [0]PETSC ERROR: #1 DMPlexUninterpolate() line 1514 in
>> /tmp/vergnet/spack-stage/spack-stage-petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk/spack-src/src/dm/impls/plex/plexinterpolate.c
>> [0]PETSC ERROR: #2 assemble_mass() line 59 in
>> /users/home/vergnet/codes/cilia/cilia/cpp/assembling.hpp
>>
>> ----------
>>
>> Attached are a minimal working example, a mesh file and a makefile.
>>
>> Any ideas or suggestions are more than welcome !
>>
>> Regards,
>> Fabien
>>
>>
>>
>>
>
> --
> Stefano
>
>
>

-- 
Stefano
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210222/a2528dbf/attachment.html>

From knepley at gmail.com  Mon Feb 22 07:36:10 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 22 Feb 2021 08:36:10 -0500
Subject: [petsc-users] Get the vertices composing the cells of a DMPlex
 Submesh
In-Reply-To: <10AAF11E-FED2-4B5C-B417-966805DA9C90@sorbonne-universite.fr>
References: <10AAF11E-FED2-4B5C-B417-966805DA9C90@sorbonne-universite.fr>
Message-ID: <CAMYG4GnsGv-aXrNtZ317fQpEKdvjd0yHc7FkJFWYntV-HucoQQ@mail.gmail.com>

On Mon, Feb 22, 2021 at 6:28 AM Fabien Vergnet <
fabien.vergnet at sorbonne-universite.fr> wrote:

> Dear PETSc community,
>
> Thank you for your amazing work. I discovered PETSc recently and I need
> your help for a project.
>
> As a training, I would like to assemble the Finite Element Matrix for the
> Poisson problem on a part of my mesh. So, I create a DMPlex from a .msh
> file and I create a Submesh with
>
> DMPlexCreateSubmesh(dm, label, 1, PETSC_TRUE, &subdm);
>

I don't think you want this call. It is designed to pick out hypersurfaces.
If you just want part of a mesh, DMPlexFilter is better.


> In order to assemble my Finite Element Matrix, I need to iterate over the
> cells of the mesh (which are triangles) and identify the vertices
> composing each cell.
>
> My question is the following : how can I get, for each cell, the vertices
> composing the cell ?
>
> I have tried to uninterpolate the subdm with DMPlexUninterpolate (with
> the objective to get the vertices from DMPlexGetCone) but it does not
> seem to work for a Submesh since I get the following error:
>
> ----------
>
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: Not for partially interpolated meshes
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.14.4, Feb 03, 2021
> [0]PETSC ERROR: Configure options
> --prefix=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk
> --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0
> CFLAGS= FFLAGS= CXXFLAGS=
> --with-cc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpicc
> --with-cxx=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpic++
> --with-fc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpif90
> --with-precision=double --with-scalar-type=real --with-shared-libraries=1
> --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS=
> CXXOPTFLAGS=
> --with-blaslapack-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openblas-0.3.13-6b3u6zc5j4hvauqh3ldcwnf7lm2o4vyl/lib/libopenblas.so
> --with-x=0 --with-clanguage=C --with-scalapack=0 --with-cuda=0
> --with-metis=1
> --with-metis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/metis-5.1.0-lm3k7dh2vslghqtqc6dvcpnc54bfpqq2
> --with-hypre=1
> --with-hypre-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hypre-2.20.0-4pyxhku65wb5lmh2fpflhmjmow2pbjg7
> --with-parmetis=1
> --with-parmetis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/parmetis-4.0.3-pfiam4ccxkyqpapejcdjtlyr6cyz7irc
> --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0
> --with-gmp=0 --with-libpng=0 --with-giflib=0 --with-mpfr=0 --with-netcdf=0
> --with-pnetcdf=0 --with-moab=0 --with-random123=0 --with-exodusii=0
> --with-cgns=0 --with-memkind=0 --with-p4est=0 --with-saws=0 --with-yaml=0
> --with-libjpeg=0 --with-cxx-dialect=C++11
> --with-superlu_dist-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/include
> --with-superlu_dist-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/lib/libsuperlu_dist.a
> --with-superlu_dist=1 --with-suitesparse=0 --with-ptscotch=0
> --with-hdf5-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/include
> --with-hdf5-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/lib/libhdf5.so
> --with-hdf5=1
> --with-zlib-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/include
> --with-zlib-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/lib/libz.so
> --with-zlib=1
> [0]PETSC ERROR: #1 DMPlexUninterpolate() line 1514 in
> /tmp/vergnet/spack-stage/spack-stage-petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk/spack-src/src/dm/impls/plex/plexinterpolate.c
> [0]PETSC ERROR: #2 assemble_mass() line 59 in
> /users/home/vergnet/codes/cilia/cilia/cpp/assembling.hpp
>
> ----------
>
> Attached are a minimal working example, a mesh file and a makefile.
>

The code does not seem to be attached.

As Stefano says, you can use DMPlexGetTransitiveClosure() to get vertices,
but if you actually want values attached to the
vertices, it is easier to use DMPlexVecGetClosure().

  Thanks,

     Matt


> Any ideas or suggestions are more than welcome !
>
> Regards,
> Fabien
>
>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210222/108e09f7/attachment-0001.html>

From fabien.vergnet at sorbonne-universite.fr  Mon Feb 22 12:06:41 2021
From: fabien.vergnet at sorbonne-universite.fr (Fabien Vergnet)
Date: Mon, 22 Feb 2021 19:06:41 +0100
Subject: [petsc-users] Get the vertices composing the cells of a DMPlex
 Submesh
In-Reply-To: <CAMYG4GnsGv-aXrNtZ317fQpEKdvjd0yHc7FkJFWYntV-HucoQQ@mail.gmail.com>
References: <10AAF11E-FED2-4B5C-B417-966805DA9C90@sorbonne-universite.fr>
	<CAMYG4GnsGv-aXrNtZ317fQpEKdvjd0yHc7FkJFWYntV-HucoQQ@mail.gmail.com>
Message-ID: <D187EEE5-F130-424F-8C17-44F51334A4D4@sorbonne-universite.fr>


Hi Matthew and Stefano,

@Stefano, thank you for the explanation. DMPlexGetTransitiveClosure is exactly what I needed.

@Matthew, thank you for your advise on using DMPlexFilter, this is much better !

Regards,
Fabien

> Le 22 f?vr. 2021 ? 14:36, Matthew Knepley <knepley at gmail.com> a ?crit :
> 
> On Mon, Feb 22, 2021 at 6:28 AM Fabien Vergnet <fabien.vergnet at sorbonne-universite.fr <mailto:fabien.vergnet at sorbonne-universite.fr>> wrote:
> Dear PETSc community,
> 
> Thank you for your amazing work. I discovered PETSc recently and I need your help for a project.
> 
> As a training, I would like to assemble the Finite Element Matrix for the Poisson problem on a part of my mesh. So, I create a DMPlex from a .msh file and I create a Submesh with
> 
> DMPlexCreateSubmesh(dm, label, 1, PETSC_TRUE, &subdm);
> 
> I don't think you want this call. It is designed to pick out hypersurfaces. If you just want part of a mesh, DMPlexFilter is better.
>  
> In order to assemble my Finite Element Matrix, I need to iterate over the cells of the mesh (which are triangles) and identify the vertices composing each cell.
> 
> My question is the following : how can I get, for each cell, the vertices composing the cell ?
> 
> I have tried to uninterpolate the subdm with DMPlexUninterpolate (with the objective to get the vertices from DMPlexGetCone) but it does not seem to work for a Submesh since I get the following error:
> 
> ----------
> 
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: Not for partially interpolated meshes
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html <https://www.mcs.anl.gov/petsc/documentation/faq.html> for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.14.4, Feb 03, 2021 
> [0]PETSC ERROR: Configure options --prefix=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS= FFLAGS= CXXFLAGS= --with-cc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpicc --with-cxx=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpic++ --with-fc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openblas-0.3.13-6b3u6zc5j4hvauqh3ldcwnf7lm2o4vyl/lib/libopenblas.so --with-x=0 --with-clanguage=C --with-scalapack=0 --with-cuda=0 --with-metis=1 --with-metis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/metis-5.1.0-lm3k7dh2vslghqtqc6dvcpnc54bfpqq2 --with-hypre=1 --with-hypre-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hypre-2.20.0-4pyxhku65wb5lmh2fpflhmjmow2pbjg7 --with-parmetis=1 --with-parmetis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/parmetis-4.0.3-pfiam4ccxkyqpapejcdjtlyr6cyz7irc --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0 --with-gmp=0 --with-libpng=0 --with-giflib=0 --with-mpfr=0 --with-netcdf=0 --with-pnetcdf=0 --with-moab=0 --with-random123=0 --with-exodusii=0 --with-cgns=0 --with-memkind=0 --with-p4est=0 --with-saws=0 --with-yaml=0 --with-libjpeg=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/include --with-superlu_dist-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse=0 --with-ptscotch=0 --with-hdf5-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/include --with-hdf5-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/lib/libhdf5.so --with-hdf5=1 --with-zlib-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/include --with-zlib-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/lib/libz.so --with-zlib=1
> [0]PETSC ERROR: #1 DMPlexUninterpolate() line 1514 in /tmp/vergnet/spack-stage/spack-stage-petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk/spack-src/src/dm/impls/plex/plexinterpolate.c
> [0]PETSC ERROR: #2 assemble_mass() line 59 in /users/home/vergnet/codes/cilia/cilia/cpp/assembling.hpp
> 
> ----------
> 
> Attached are a minimal working example, a mesh file and a makefile.
> 
> The code does not seem to be attached.
> 
> As Stefano says, you can use DMPlexGetTransitiveClosure() to get vertices, but if you actually want values attached to the
> vertices, it is easier to use DMPlexVecGetClosure().
> 
>   Thanks,
> 
>      Matt
>  
> Any ideas or suggestions are more than welcome !
> 
> Regards,
> Fabien
> 
> 
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210222/24e12e3f/attachment.html>

From lisandro.verga.bega at gmail.com  Mon Feb 22 13:59:04 2021
From: lisandro.verga.bega at gmail.com (Lisandro Verga)
Date: Mon, 22 Feb 2021 11:59:04 -0800
Subject: [petsc-users] Example finite volume silver in PETSc
Message-ID: <CAB-=f8y7usPYG7MzxrvWxi-W5NSeXFJE=mSJBkMA6L7CMDdBNw@mail.gmail.com>

Dear PETSc Team,

I would like to ask you if there a finite volume solver build using the
PETSc data structure. I have found several manuscripts or presentations
that mention that but I cannot retrieve an example it.

Thank you.

Regards,
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210222/900f9770/attachment.html>

From jonathan.guyer at nist.gov  Mon Feb 22 14:09:13 2021
From: jonathan.guyer at nist.gov (Guyer, Jonathan E. Dr. (Fed))
Date: Mon, 22 Feb 2021 20:09:13 +0000
Subject: [petsc-users] Example finite volume silver in PETSc
In-Reply-To: <CAB-=f8y7usPYG7MzxrvWxi-W5NSeXFJE=mSJBkMA6L7CMDdBNw@mail.gmail.com>
References: <CAB-=f8y7usPYG7MzxrvWxi-W5NSeXFJE=mSJBkMA6L7CMDdBNw@mail.gmail.com>
Message-ID: <A2C5009B-B384-477B-87B0-551288C3D7B4@nist.gov>

[FiPy](https://www.ctcms.nist.gov/fipy) is a finite volume code that can use PETSc as one possible solver suite. See:

https://github.com/usnistgov/fipy/tree/master/fipy/solvers/petsc
https://github.com/usnistgov/fipy/blob/master/fipy/matrices/petscMatrix.py

Caveat: It?s not intended as an instructional example, if that?s what you?re looking for.

On Feb 22, 2021, at 2:59 PM, Lisandro Verga <lisandro.verga.bega at gmail.com> wrote:

Dear PETSc Team,

I would like to ask you if there a finite volume solver build using the PETSc data structure. I have found several manuscripts or presentations that mention that but I cannot retrieve an example it.

Thank you.

Regards,

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210222/d4464161/attachment-0001.html>

From elbueler at alaska.edu  Mon Feb 22 14:25:35 2021
From: elbueler at alaska.edu (Ed Bueler)
Date: Mon, 22 Feb 2021 11:25:35 -0900
Subject: [petsc-users] re Example finite volume silver in PETSc
Message-ID: <CAOHboJ_g61LTEWPGybfk+Ayw8T5wU53NgvE+Km_mwwLS-HHDTw@mail.gmail.com>

A very basic 2D FV example, a scalar advection solver, using PETSc DMDA, is
at
   https://github.com/bueler/p4pdes/blob/master/c/ch11/advect.c
and documented in Chapter 11 of my book (
https://my.siam.org/Store/Product/viewproduct/?ProductId=32850137).  This
example might be most useful to you if you are interested in implementing
flux limiters.

Ed

> Dear PETSc Team,
>
> I would like to ask you if there a finite volume solver build using the
> PETSc data structure. I have found several manuscripts or presentations
> that mention that but I cannot retrieve an example it.
>
> Thank you.
>
> Regards,

-- 
Ed Bueler
Dept of Mathematics and Statistics
University of Alaska Fairbanks
Fairbanks, AK 99775-6660
306C Chapman
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210222/d28dc261/attachment.html>

From thibault.bridelbertomeu at gmail.com  Tue Feb 23 01:37:05 2021
From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu)
Date: Tue, 23 Feb 2021 08:37:05 +0100
Subject: [petsc-users] DMPlex read partitionned Gmsh
Message-ID: <CAJ6QA5sKKHNR=qNMSHNxuGUhZSfkUXd_ysrMdLciOGeeFG72xw@mail.gmail.com>

Dear all,

I was wondering if there was a plan in motion to implement yet another
possibility for DMPlexCreateGmshFromFile: read a group of foo_*.msh
generated from a partition done directly in Gmsh ?

Have a great day,

Thibault B.-B.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210223/457a41c9/attachment.html>

From brardafrancesco at gmail.com  Tue Feb 23 02:51:27 2021
From: brardafrancesco at gmail.com (Francesco Brarda)
Date: Tue, 23 Feb 2021 09:51:27 +0100
Subject: [petsc-users] Caught signal number 11 SEGV
Message-ID: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com>

Hi!

I am very new to the PETSc world. I am working with a GitHub repo that uses PETSc together with Stan (a statistics open source software), here <https://discourse.mc-stan.org/t/using-petsc-library-together-with-stan/17550> you can find the discussion. 
It has been defined a functor <https://github.com/IvanYashchuk/stan-math-petsc/blob/stan-petsc/stan/math/prim/fun/stan_petsc_interface.hpp> to convert EigenVector to PetscVec and viceversa, both sequentially and in parallel. 
The file <https://github.com/IvanYashchuk/stan-math-petsc/blob/stan-petsc/stan/math/rev/functor/petsc_functor.hpp> using these functions does the conversions with the sequential setting. I changed to those using MPI, that is from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because I want to evaluate the scaling.
Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock optimize in the debug mode I get the error Caught signal number 11 SEGV. I therefore used the option -start_in_debugger and I get the following:

[2]PETSC ERROR: ------------------------------------------------------------------------
[2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[2]PETSC ERROR: likely location of problem given in stack below
[2]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[2]PETSC ERROR:       INSTEAD the line number of the start of the function
[2]PETSC ERROR:       is given.
[3]PETSC ERROR: ------------------------------------------------------------------------
[3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[3]PETSC ERROR: likely location of problem given in stack below
[3]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
[3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[3]PETSC ERROR:       INSTEAD the line number of the start of the function
[3]PETSC ERROR:       is given.
[3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in  unknown file (null)
  To prevent termination, change the error handler using PetscPushErrorHandler()
[2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in  unknown file (null)
  To prevent termination, change the error handler using PetscPushErrorHandler()

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 22939 RUNNING AT srvulx13
=   EXIT CODE: 134
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

I read the documentation regarding the PetscAbortErrorHandler, but I do not know where should I use it. How can I solve the problem? 
I hope I have been clear enough.
Attached you can find also my configure.log and make.log files.

Best,
Francesco


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210223/a1702e6c/attachment-0003.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.log
Type: application/octet-stream
Size: 2588258 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210223/a1702e6c/attachment-0002.obj>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210223/a1702e6c/attachment-0004.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make.log
Type: application/octet-stream
Size: 12090 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210223/a1702e6c/attachment-0003.obj>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210223/a1702e6c/attachment-0005.html>

From knepley at gmail.com  Tue Feb 23 04:49:53 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 23 Feb 2021 05:49:53 -0500
Subject: [petsc-users] DMPlex read partitionned Gmsh
In-Reply-To: <CAJ6QA5sKKHNR=qNMSHNxuGUhZSfkUXd_ysrMdLciOGeeFG72xw@mail.gmail.com>
References: <CAJ6QA5sKKHNR=qNMSHNxuGUhZSfkUXd_ysrMdLciOGeeFG72xw@mail.gmail.com>
Message-ID: <CAMYG4GmhWChhx=ubfB_S14JEFt3TN2s-Nwqd2PhamGYY=KZbdA@mail.gmail.com>

On Tue, Feb 23, 2021 at 2:37 AM Thibault Bridel-Bertomeu <
thibault.bridelbertomeu at gmail.com> wrote:

> Dear all,
>
> I was wondering if there was a plan in motion to implement yet another
> possibility for DMPlexCreateGmshFromFile: read a group of foo_*.msh
> generated from a partition done directly in Gmsh ?
>

What we have implemented now is a system that reads a mesh in parallel from
disk into a naive partition, then repartitions and redistributes.
We have a paper about this strategy: https://arxiv.org/abs/2004.08729 .
Right now it is only implemented in HDF5. This is mainly because:

1) Parallel block reads are easy in HDF5.

2) We use it for checkpointing as well as load, and it is flexible enough
for this

3) Label information can be stored in a scalable way

It is easy to convert from GMsh to HDF5 (it's a few lines of PETSc). The
GMsh format is not ideal for parallelism, and in fact the GMsh reader
was also using MED, which is an HDF5 format. We originally wrote an MED
reader, but the documentation and support for the library were
not up to snuff, so we went with a custom HDF5 format.

  Is this helpful?

     Matt


> Have a great day,
>
> Thibault B.-B.
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210223/9f954c42/attachment.html>

From knepley at gmail.com  Tue Feb 23 04:54:31 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 23 Feb 2021 05:54:31 -0500
Subject: [petsc-users] Caught signal number 11 SEGV
In-Reply-To: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com>
References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com>
Message-ID: <CAMYG4G=4ZvVC=BZknDrF6a08MOdD+4NAEhvEy8deJ7wAu=8fcg@mail.gmail.com>

On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda <brardafrancesco at gmail.com>
wrote:

> Hi!
>
> I am very new to the PETSc world. I am working with a GitHub repo that
> uses PETSc together with Stan (a statistics open source software), here
> <https://discourse.mc-stan.org/t/using-petsc-library-together-with-stan/17550> you
> can find the discussion.
> It has been defined a functor
> <https://github.com/IvanYashchuk/stan-math-petsc/blob/stan-petsc/stan/math/prim/fun/stan_petsc_interface.hpp> to
> convert EigenVector to PetscVec and viceversa, both sequentially and in
> parallel.
> The file
> <https://github.com/IvanYashchuk/stan-math-petsc/blob/stan-petsc/stan/math/rev/functor/petsc_functor.hpp> using
> these functions does the conversions with the sequential setting. I changed
> to those using MPI, that is from EigenVectorToPetscVecSeq
> to EigenVectorToPetscVecMPI and so on because I want to evaluate the
> scaling.
> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock
> optimize in the debug mode I get the error Caught signal number 11 SEGV.
> I therefore used the option -start_in_debugger and I get the following:
>

For some reason, the -start_in_debuggger option is not being seen. Are you
showing all the output? Once the debugger is attached,
you run the program (conr) and then when you hit the SEGV you get a stack
trace (where).

  THanks,

    Matt


> [2]PETSC ERROR:
> ------------------------------------------------------------------------
> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
> Violation, probably memory access out of range
> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [2]PETSC ERROR: or see
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac
> OS X to find memory corruption errors
> [2]PETSC ERROR: likely location of problem given in stack below
> [2]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are
> not available,
> [2]PETSC ERROR:       INSTEAD the line number of the start of the function
> [2]PETSC ERROR:       is given.
> [3]PETSC ERROR:
> ------------------------------------------------------------------------
> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
> Violation, probably memory access out of range
> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [3]PETSC ERROR: or see
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac
> OS X to find memory corruption errors
> [3]PETSC ERROR: likely location of problem given in stack below
> [3]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are
> not available,
> [3]PETSC ERROR:       INSTEAD the line number of the start of the function
> [3]PETSC ERROR:       is given.
> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line
> 0 in  unknown file (null)
>   To prevent termination, change the error handler
> using PetscPushErrorHandler()
> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line
> 0 in  unknown file (null)
>   To prevent termination, change the error handler
> using PetscPushErrorHandler()
>
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 22939 RUNNING AT srvulx13
> =   EXIT CODE: 134
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
>
> I read the documentation regarding the PetscAbortErrorHandler, but I do
> not know where should I use it. How can I solve the problem?
> I hope I have been clear enough.
> Attached you can find also my configure.log and make.log files.
>
> Best,
> Francesco
>
>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210223/54d20e01/attachment.html>

From anton.glazkov at chch.ox.ac.uk  Tue Feb 23 05:32:27 2021
From: anton.glazkov at chch.ox.ac.uk (Anton Glazkov)
Date: Tue, 23 Feb 2021 11:32:27 +0000
Subject: [petsc-users] 64bit indices and revolve compiler warnings
Message-ID: <LO2P265MB174422125A52EC5D9BB61DA5E9809@LO2P265MB1744.GBRP265.PROD.OUTLOOK.COM>

Good morning,

I have been trying to compile PETSc with 64bit indices and revolve. It compiles ok but gives out warnings of the kind:

{PETSCDIR PATH REMOVED}/src/ts/trajectory/impls/memory/trajmemory.c:1479:130: warning: incompatible pointer types passing 'PetscInt *' (aka 'long long *') to parameter of type 'int *' [-Wincompatible-pointer-types]
    whattodo = revolve_action(&tjsch->rctx->check,&tjsch->rctx->capo,&tjsch->rctx->fine,tjsch->rctx->snaps_in,&tjsch->rctx->info,&tjsch->rctx->where); /* must return 1 or 3 or 4*/
                                                                                                                                 ^~~~~~~~~~~~~~~~~~~
{PETSCDIR PATH REMOVED}/lib/include/revolve_c.h:14:49: note: passing argument to parameter here
int  revolve_action(int*,int*,int*,int,int*,int*);

Is revolve incompatible with 64bit indices by design?

Best wishes,
Anton

PS the compile line is this:
./configure ?prefix={PREFIX REMOVED}  --with-cc=cc --with-cxx=CC --with-fc=ftn --with-debugging=0 --with-clib-autodetect=0 --with-cxxlib-autodetect=0 --with-fortranlib-autodetect=0 --COPTFLAGS=-g -O3 --CXXOPTFLAGS=-g -O3 --FOPTFLAGS=-g -O3 --with-64-bit-indices --with-scalar-type=complex --download-hypre-shared --download-moab-shared --download-superlu_dist-shared --download-revolve=1 --with-hdf5-dir=/opt/cray/pe/hdf5-parallel/1.12.0.2/CRAYCLANG/9.1
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210223/c7f19a2d/attachment-0001.html>

From elena.travaglia at edu.unito.it  Tue Feb 23 10:49:01 2021
From: elena.travaglia at edu.unito.it (Elena Travaglia)
Date: Tue, 23 Feb 2021 17:49:01 +0100
Subject: [petsc-users] Preconditioner for LSC
Message-ID: <CA+-JvF53qcMb1CTZGCVtZcZ0HL8GJ0p5O00=2+_WUhh4ccUpGw@mail.gmail.com>

Dear PETSc users,

we would like to compare our preconditioner for the Schur complement
of a Stokes system, with the LSC preconditioner already implemented in
PETSc. Following the example in the PETSc manual, we've tried
   -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type ml
but this is not working (properly) on our problem.

On the other hand we think we have a good preconditioner for A10*A01,
so  we'd like to try
   -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type shell
but we cannot figure out how to attach our apply() routine to
the pc object of fieldsplit_1_lsc.

Can this be done in the current interface?
Or perhaps, should we call KSPGetOperators on the fieldsplit_1 solver
and attach to its Sp operator a "LSC_Lp" of type MATSHELL with our routine
attached to the MATOP_SOLVE of the shell matrix?

Thanks in advance,

Elena and Matteo

-- 
------------------------


Indirizzo istituzionale di posta elettronica 
degli studenti e dei laureati dell'Universit? degli Studi di TorinoOfficial?
University of Turin?email address?for students and graduates?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210223/ea2c8360/attachment.html>

From balay at mcs.anl.gov  Tue Feb 23 11:19:25 2021
From: balay at mcs.anl.gov (Satish Balay)
Date: Tue, 23 Feb 2021 11:19:25 -0600
Subject: [petsc-users] headsup: switch git default branch from 'master' to
 'main'
Message-ID: <55996c7c-a274-5ebb-bba7-e06ac4c3b83a@mcs.anl.gov>

All,

This is a heads-up, we are to switch the default branch in petsc git
repo from 'master' to 'main'

[Will plan to do the switch on friday the 26th]

We've previously switched 'maint' branch to 'release' before 3.14
release - and this change (to 'main') is the next step in this direction.

Satish


From dalcinl at gmail.com  Tue Feb 23 13:52:21 2021
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Tue, 23 Feb 2021 22:52:21 +0300
Subject: [petsc-users] headsup: switch git default branch from 'master'
 to 'main'
In-Reply-To: <55996c7c-a274-5ebb-bba7-e06ac4c3b83a@mcs.anl.gov>
References: <55996c7c-a274-5ebb-bba7-e06ac4c3b83a@mcs.anl.gov>
Message-ID: <CAEcYPwDNqC7M6u3_JuGy8dp+R7G0W0EDen_SG8+JodcTJLr2pA@mail.gmail.com>

May the force be with you, Satish.

On Tue, 23 Feb 2021 at 20:19, Satish Balay via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> All,
>
> This is a heads-up, we are to switch the default branch in petsc git
> repo from 'master' to 'main'
>
> [Will plan to do the switch on friday the 26th]
>
> We've previously switched 'maint' branch to 'release' before 3.14
> release - and this change (to 'main') is the next step in this direction.
>
> Satish
>
>

-- 
Lisandro Dalcin
============
Senior Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210223/1ecc4064/attachment.html>

From brardafrancesco at gmail.com  Tue Feb 23 14:55:19 2021
From: brardafrancesco at gmail.com (Francesco Brarda)
Date: Tue, 23 Feb 2021 21:55:19 +0100
Subject: [petsc-users] Caught signal number 11 SEGV
In-Reply-To: <CAMYG4G=4ZvVC=BZknDrF6a08MOdD+4NAEhvEy8deJ7wAu=8fcg@mail.gmail.com>
References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com>
	<CAMYG4G=4ZvVC=BZknDrF6a08MOdD+4NAEhvEy8deJ7wAu=8fcg@mail.gmail.com>
Message-ID: <9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com>

Thank you for the quick response. 
Sorry, you are right. Here is the complete output:

fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger
PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 on display :0.0 on machine srvulx13
PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 on display :0.0 on machine srvulx13
xterm: Xt error: Can't open display: :0.0
xterm: DISPLAY is not set
xterm: Xt error: Can't open display: :0.0
xterm: DISPLAY is not set
method = optimize
  optimize
    algorithm = lbfgs (Default)
      lbfgs
method = optimize
  optimize
    algorithm = lbfgs (Default)
      lbfgs
        init_alpha = 0.001 (Default)
        tol_obj = 9.9999999999999998e-13 (Default)
        tol_rel_obj = 10000 (Default)
        tol_grad = 1e-08 (Default)
        init_alpha = 0.001 (Default)
        tol_obj = 9.9999999999999998e-13 (Default)
        tol_rel_obj = 10000 (Default)
        tol_grad = 1e-08 (Default)
        tol_rel_grad = 10000000 (Default)
        tol_param = 1e-08 (Default)
        history_size = 5 (Default)
        tol_rel_grad = 10000000 (Default)
        tol_param = 1e-08 (Default)
        history_size = 5 (Default)
    iter = 2000 (Default)
    iter = 2000 (Default)
    save_iterations = 0 (Default)
id = 0 (Default)
data    save_iterations = 0 (Default)
id = 0 (Default)
data
  file =  (Default)

  file =  (Default)
init = 2 (Default)
random
  seed = 3585768430 (Default)
init = 2 (Default)
random
  seed = 3585768430 (Default)
output
  file = output.csv (Default)
output
  file = output.csv (Default)
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  diagnostic_file =  (Default)
  refresh = 100 (Default)


Initial log joint probability = -731.444
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
[1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in src/cmdstan/main.cpp  
  To prevent termination, change the error handler using PetscPushErrorHandler()

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 47804 RUNNING AT srvulx13
=   EXIT CODE: 134
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions


The code inside main.cpp is the following:

#include <cmdstan/command.hpp>
#include <stan/services/error_codes.hpp>

#include <petsc.h>

int main(int argc, char* argv[]) {

  PetscErrorCode ierr;
  ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr);

  try {
    ierr = cmdstan::command(argc, argv);CHKERRQ(ierr);
  } catch (const std::exception& e) {
    std::cout << e.what() << std::endl;
    ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr);
  }

  ierr = PetscFinalize();CHKERRQ(ierr);
  return ierr;
}

I highlighted the line 12. Although I read the page where the command PetscPushErrorHandler is explained and the example provided (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should effectively use the command.
Should I change the entire try/catch with PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); ?

Best,
Francesco


> Il giorno 23 feb 2021, alle ore 11:54, Matthew Knepley <knepley at gmail.com> ha scritto:
> 
> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda <brardafrancesco at gmail.com> wrote:
> Hi!
> 
> I am very new to the PETSc world. I am working with a GitHub repo that uses PETSc together with Stan (a statistics open source software), here you can find the discussion. 
> It has been defined a functor to convert EigenVector to PetscVec and viceversa, both sequentially and in parallel. 
> The file using these functions does the conversions with the sequential setting. I changed to those using MPI, that is from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because I want to evaluate the scaling.
> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock optimize in the debug mode I get the error Caught signal number 11 SEGV. I therefore used the option -start_in_debugger and I get the following:
> 
> For some reason, the -start_in_debuggger option is not being seen. Are you showing all the output? Once the debugger is attached,
> you run the program (conr) and then when you hit the SEGV you get a stack trace (where).
> 
>   THanks,
> 
>     Matt
>  
> [2]PETSC ERROR: ------------------------------------------------------------------------
> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [2]PETSC ERROR: likely location of problem given in stack below
> [2]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [2]PETSC ERROR:       INSTEAD the line number of the start of the function
> [2]PETSC ERROR:       is given.
> [3]PETSC ERROR: ------------------------------------------------------------------------
> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [3]PETSC ERROR: likely location of problem given in stack below
> [3]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [3]PETSC ERROR:       INSTEAD the line number of the start of the function
> [3]PETSC ERROR:       is given.
> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in  unknown file (null)
>   To prevent termination, change the error handler using PetscPushErrorHandler()
> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in  unknown file (null)
>   To prevent termination, change the error handler using PetscPushErrorHandler()
> 
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 22939 RUNNING AT srvulx13
> =   EXIT CODE: 134
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
> 
> I read the documentation regarding the PetscAbortErrorHandler, but I do not know where should I use it. How can I solve the problem? 
> I hope I have been clear enough.
> Attached you can find also my configure.log and make.log files.
> 
> Best,
> Francesco
> 
> 
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210223/dec8dbeb/attachment.html>

From knepley at gmail.com  Tue Feb 23 14:59:59 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 23 Feb 2021 15:59:59 -0500
Subject: [petsc-users] Caught signal number 11 SEGV
In-Reply-To: <9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com>
References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com>
	<CAMYG4G=4ZvVC=BZknDrF6a08MOdD+4NAEhvEy8deJ7wAu=8fcg@mail.gmail.com>
	<9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com>
Message-ID: <CAMYG4GksaXHF=B+a9TP5GgEPn658prq6Y=f-gJQg_phBDBVxAw@mail.gmail.com>

On Tue, Feb 23, 2021 at 3:55 PM Francesco Brarda <brardafrancesco at gmail.com>
wrote:

> Thank you for the quick response.
> Sorry, you are right. Here is the complete output:
>
> fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n
> 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger
> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803
> on display :0.0 on machine srvulx13
> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804
> on display :0.0 on machine srvulx13
> xterm: Xt error: Can't open display: :0.0
> xterm: DISPLAY is not set
> xterm: Xt error: Can't open display: :0.0
> xterm: DISPLAY is not set
>

Do you have an Xserver running? If not, you can use

  -start_in_debugger noxterm -debugger_nodes 3

and try to get a stack trace from one node.

  Thanks,

    Matt


> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)
>         tol_rel_grad = 10000000 (Default)
>         tol_param = 1e-08 (Default)
>         history_size = 5 (Default)
>         tol_rel_grad = 10000000 (Default)
>         tol_param = 1e-08 (Default)
>         history_size = 5 (Default)
>     iter = 2000 (Default)
>     iter = 2000 (Default)
>     save_iterations = 0 (Default)
> id = 0 (Default)
> data    save_iterations = 0 (Default)
> id = 0 (Default)
> data
>   file =  (Default)
>
>   file =  (Default)
> init = 2 (Default)
> random
>   seed = 3585768430 (Default)
> init = 2 (Default)
> random
>   seed = 3585768430 (Default)
> output
>   file = output.csv (Default)
> output
>   file = output.csv (Default)
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
>
>
> Initial log joint probability = -731.444
>     Iter      log prob        ||dx||      ||grad||       alpha
>   alpha0  # evals  Notes
> [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12
> in src/cmdstan/main.cpp
>   To prevent termination, change the error handler
> using PetscPushErrorHandler()
>
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 47804 RUNNING AT srvulx13
> =   EXIT CODE: 134
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
>
>
>
>
>
> The code inside main.cpp is the following:
>
> #include <cmdstan/command.hpp>
> #include <stan/services/error_codes.hpp>
>
> #include <petsc.h>
>
> int main(int argc, char* argv[]) {
>
>   PetscErrorCode ierr;
>   ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr);
>
>   try {
>     ierr = cmdstan::command(argc, argv);CHKERRQ(ierr);
>   } catch (const std::exception& e) {
>     std::cout << e.what() << std::endl;
>     ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr);
>   }
>
>   ierr = PetscFinalize();CHKERRQ(ierr);
>   return ierr;
> }
>
> I highlighted the line 12. Although I read the page where the command
> PetscPushErrorHandler is explained and the example provided
> (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should
> effectively use the command.
> Should I change the entire try/catch with PetscPushErrorHandler(
> PetscIgnoreErrorHandler,NULL); ?
>
> Best,
> Francesco
>
>
> Il giorno 23 feb 2021, alle ore 11:54, Matthew Knepley <knepley at gmail.com>
> ha scritto:
>
> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda <
> brardafrancesco at gmail.com> wrote:
> Hi!
>
> I am very new to the PETSc world. I am working with a GitHub repo that
> uses PETSc together with Stan (a statistics open source software), here you
> can find the discussion.
> It has been defined a functor to convert EigenVector to PetscVec and
> viceversa, both sequentially and in parallel.
> The file using these functions does the conversions with the sequential
> setting. I changed to those using MPI, that is
> from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because
> I want to evaluate the scaling.
> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock
> optimize in the debug mode I get the error Caught signal number 11 SEGV. I
> therefore used the option -start_in_debugger and I get the following:
>
> For some reason, the -start_in_debuggger option is not being seen. Are you
> showing all the output? Once the debugger is attached,
> you run the program (conr) and then when you hit the SEGV you get a stack
> trace (where).
>
>   THanks,
>
>     Matt
>
> [2]PETSC ERROR:
> ------------------------------------------------------------------------
> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [2]PETSC ERROR: or see
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
> X to find memory corruption errors
> [2]PETSC ERROR: likely location of problem given in stack below
> [2]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [2]PETSC ERROR:       INSTEAD the line number of the start of the function
> [2]PETSC ERROR:       is given.
> [3]PETSC ERROR:
> ------------------------------------------------------------------------
> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [3]PETSC ERROR: or see
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
> X to find memory corruption errors
> [3]PETSC ERROR: likely location of problem given in stack below
> [3]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [3]PETSC ERROR:       INSTEAD the line number of the start of the function
> [3]PETSC ERROR:       is given.
> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in
>  unknown file (null)
>   To prevent termination, change the error handler using
> PetscPushErrorHandler()
> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in
>  unknown file (null)
>   To prevent termination, change the error handler using
> PetscPushErrorHandler()
>
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 22939 RUNNING AT srvulx13
> =   EXIT CODE: 134
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
>
> I read the documentation regarding the PetscAbortErrorHandler, but I do
> not know where should I use it. How can I solve the problem?
> I hope I have been clear enough.
> Attached you can find also my configure.log and make.log files.
>
> Best,
> Francesco
>
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210223/21529869/attachment-0001.html>

From brardafrancesco at gmail.com  Tue Feb 23 15:46:49 2021
From: brardafrancesco at gmail.com (Francesco Brarda)
Date: Tue, 23 Feb 2021 22:46:49 +0100
Subject: [petsc-users] Caught signal number 11 SEGV
In-Reply-To: <CAMYG4GksaXHF=B+a9TP5GgEPn658prq6Y=f-gJQg_phBDBVxAw@mail.gmail.com>
References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com>
	<CAMYG4G=4ZvVC=BZknDrF6a08MOdD+4NAEhvEy8deJ7wAu=8fcg@mail.gmail.com>
	<9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com>
	<CAMYG4GksaXHF=B+a9TP5GgEPn658prq6Y=f-gJQg_phBDBVxAw@mail.gmail.com>
Message-ID: <5F08B032-8D8C-482D-82D2-12BC293A097D@gmail.com>

Using the command you suggested I got 

fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm -debugger_nodes 3
** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated as of version 3.14 and will be removed in a future release. Please use the option -debugger_ranks instead. (Silence this warning with -options_suppress_deprecated_warnings)
method = optimize
  optimize
    algorithm = lbfgs (Default)
      lbfgs
method = optimize
  optimize
    algorithm = lbfgs (Default)
      lbfgs
        init_alpha = 0.001 (Default)
        tol_obj = 9.9999999999999998e-13 (Default)
        init_alpha = 0.001 (Default)
        tol_obj = 9.9999999999999998e-13 (Default)
        tol_rel_obj = 10000 (Default)
        tol_grad = 1e-08 (Default)        tol_rel_obj = 10000 (Default)
        tol_grad = 1e-08 (Default)
        tol_rel_grad = 10000000 (Default)

        tol_rel_grad = 10000000 (Default)
        tol_param = 1e-08 (Default)        tol_param = 1e-08 (Default)
        history_size = 5 (Default)
    iter = 2000 (Default)

        history_size = 5 (Default)
    iter = 2000 (Default)
    save_iterations = 0 (Default)
id = 0 (Default)
data
    save_iterations = 0 (Default)
id = 0 (Default)
data
  file =  (Default)
  file =  (Default)
init = 2 (Default)
random
  seed = 3623621468 (Default)
output
  file = output.csv (Default)init = 2 (Default)
random
  seed = 3623621468 (Default)
output
  file = output.csv (Default)

  diagnostic_file =  (Default)
  refresh = 100 (Default)

  diagnostic_file =  (Default)
  refresh = 100 (Default)

Initial log joint probability = -195.984
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
      10      -0.97101    0.00292919       1.65855       0.001       0.001       46  LS failed, Hessian reset 
      12     -0.483952      0.001316       1.18542       0.001       0.001       77  LS failed, Hessian reset 
      13     -0.477916     0.0118542      0.163518        0.01       0.001      106  LS failed, Hessian reset 
[1]PETSC ERROR: #1 main() line 12 in src/cmdstan/main.cpp
[1]PETSC ERROR: PETSc Option Table entries:
[1]PETSC ERROR: -debugger_nodes 3
[1]PETSC ERROR: -start_in_debugger noxterm
[1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov?????
 
And then it does not go further. With the -debugger_ranks suggested, the output is the same. What do you think, please?
I am using a cluster (one node, dual-socket system with twelve-core-CPUs), but when I do the ssh I do not use the -X flag, if that's what you mean.

Thank you,
Francesco


> Il giorno 23 feb 2021, alle ore 21:59, Matthew Knepley <knepley at gmail.com> ha scritto:
> 
> On Tue, Feb 23, 2021 at 3:55 PM Francesco Brarda <brardafrancesco at gmail.com> wrote:
> Thank you for the quick response. 
> Sorry, you are right. Here is the complete output:
> 
> fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger
> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 on display :0.0 on machine srvulx13
> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 on display :0.0 on machine srvulx13
> xterm: Xt error: Can't open display: :0.0
> xterm: DISPLAY is not set
> xterm: Xt error: Can't open display: :0.0
> xterm: DISPLAY is not set
> 
> Do you have an Xserver running? If not, you can use
> 
>   -start_in_debugger noxterm -debugger_nodes 3
> 
> and try to get a stack trace from one node.
> 
>   Thanks,
> 
>     Matt
>  
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)
>         tol_rel_grad = 10000000 (Default)
>         tol_param = 1e-08 (Default)
>         history_size = 5 (Default)
>         tol_rel_grad = 10000000 (Default)
>         tol_param = 1e-08 (Default)
>         history_size = 5 (Default)
>     iter = 2000 (Default)
>     iter = 2000 (Default)
>     save_iterations = 0 (Default)
> id = 0 (Default)
> data    save_iterations = 0 (Default)
> id = 0 (Default)
> data
>   file =  (Default)
> 
>   file =  (Default)
> init = 2 (Default)
> random
>   seed = 3585768430 (Default)
> init = 2 (Default)
> random
>   seed = 3585768430 (Default)
> output
>   file = output.csv (Default)
> output
>   file = output.csv (Default)
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
> 
> 
> Initial log joint probability = -731.444
>     Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
> [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in src/cmdstan/main.cpp  
>   To prevent termination, change the error handler using PetscPushErrorHandler()
> 
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 47804 RUNNING AT srvulx13
> =   EXIT CODE: 134
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
> 
> 
> 
> 
> 
> The code inside main.cpp is the following:
> 
> #include <cmdstan/command.hpp>
> #include <stan/services/error_codes.hpp>
> 
> #include <petsc.h>
> 
> int main(int argc, char* argv[]) {
> 
>   PetscErrorCode ierr;
>   ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr);
> 
>   try {
>     ierr = cmdstan::command(argc, argv);CHKERRQ(ierr);
>   } catch (const std::exception& e) {
>     std::cout << e.what() << std::endl;
>     ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr);
>   }
> 
>   ierr = PetscFinalize();CHKERRQ(ierr);
>   return ierr;
> }
> 
> I highlighted the line 12. Although I read the page where the command PetscPushErrorHandler is explained and the example provided (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should effectively use the command.
> Should I change the entire try/catch with PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); ?
> 
> Best,
> Francesco
> 
> 
>> Il giorno 23 feb 2021, alle ore 11:54, Matthew Knepley <knepley at gmail.com> ha scritto:
>> 
>> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda <brardafrancesco at gmail.com> wrote:
>> Hi!
>> 
>> I am very new to the PETSc world. I am working with a GitHub repo that uses PETSc together with Stan (a statistics open source software), here you can find the discussion. 
>> It has been defined a functor to convert EigenVector to PetscVec and viceversa, both sequentially and in parallel. 
>> The file using these functions does the conversions with the sequential setting. I changed to those using MPI, that is from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because I want to evaluate the scaling.
>> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock optimize in the debug mode I get the error Caught signal number 11 SEGV. I therefore used the option -start_in_debugger and I get the following:
>> 
>> For some reason, the -start_in_debuggger option is not being seen. Are you showing all the output? Once the debugger is attached,
>> you run the program (conr) and then when you hit the SEGV you get a stack trace (where).
>> 
>>   THanks,
>> 
>>     Matt
>>  
>> [2]PETSC ERROR: ------------------------------------------------------------------------
>> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [2]PETSC ERROR: likely location of problem given in stack below
>> [2]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [2]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [2]PETSC ERROR:       is given.
>> [3]PETSC ERROR: ------------------------------------------------------------------------
>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> [3]PETSC ERROR: likely location of problem given in stack below
>> [3]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [3]PETSC ERROR:       INSTEAD the line number of the start of the function
>> [3]PETSC ERROR:       is given.
>> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in  unknown file (null)
>>   To prevent termination, change the error handler using PetscPushErrorHandler()
>> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in  unknown file (null)
>>   To prevent termination, change the error handler using PetscPushErrorHandler()
>> 
>> ===================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   PID 22939 RUNNING AT srvulx13
>> =   EXIT CODE: 134
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
>> This typically refers to a problem with your application.
>> Please see the FAQ page for debugging suggestions
>> 
>> I read the documentation regarding the PetscAbortErrorHandler, but I do not know where should I use it. How can I solve the problem? 
>> I hope I have been clear enough.
>> Attached you can find also my configure.log and make.log files.
>> 
>> Best,
>> Francesco
>> 
>> 
>> 
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210223/f4566dfa/attachment.html>

From balay at mcs.anl.gov  Tue Feb 23 15:49:23 2021
From: balay at mcs.anl.gov (Satish Balay)
Date: Tue, 23 Feb 2021 15:49:23 -0600
Subject: [petsc-users] Caught signal number 11 SEGV
In-Reply-To: <5F08B032-8D8C-482D-82D2-12BC293A097D@gmail.com>
References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com>
	<CAMYG4G=4ZvVC=BZknDrF6a08MOdD+4NAEhvEy8deJ7wAu=8fcg@mail.gmail.com>
	<9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com>
	<CAMYG4GksaXHF=B+a9TP5GgEPn658prq6Y=f-gJQg_phBDBVxAw@mail.gmail.com>
	<5F08B032-8D8C-482D-82D2-12BC293A097D@gmail.com>
Message-ID: <cfc4e417-8c6-176-a91e-15e58ab8cf50@mcs.anl.gov>

This run is with '-n 2' - so -debugger_nodes value should be either 0 or 1

Satish

On Tue, 23 Feb 2021, Francesco Brarda wrote:

> Using the command you suggested I got 
> 
> fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm -debugger_nodes 3
> ** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated as of version 3.14 and will be removed in a future release. Please use the option -debugger_ranks instead. (Silence this warning with -options_suppress_deprecated_warnings)
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)        tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)
>         tol_rel_grad = 10000000 (Default)
> 
>         tol_rel_grad = 10000000 (Default)
>         tol_param = 1e-08 (Default)        tol_param = 1e-08 (Default)
>         history_size = 5 (Default)
>     iter = 2000 (Default)
> 
>         history_size = 5 (Default)
>     iter = 2000 (Default)
>     save_iterations = 0 (Default)
> id = 0 (Default)
> data
>     save_iterations = 0 (Default)
> id = 0 (Default)
> data
>   file =  (Default)
>   file =  (Default)
> init = 2 (Default)
> random
>   seed = 3623621468 (Default)
> output
>   file = output.csv (Default)init = 2 (Default)
> random
>   seed = 3623621468 (Default)
> output
>   file = output.csv (Default)
> 
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
> 
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
> 
> Initial log joint probability = -195.984
>     Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
>       10      -0.97101    0.00292919       1.65855       0.001       0.001       46  LS failed, Hessian reset 
>       12     -0.483952      0.001316       1.18542       0.001       0.001       77  LS failed, Hessian reset 
>       13     -0.477916     0.0118542      0.163518        0.01       0.001      106  LS failed, Hessian reset 
> [1]PETSC ERROR: #1 main() line 12 in src/cmdstan/main.cpp
> [1]PETSC ERROR: PETSc Option Table entries:
> [1]PETSC ERROR: -debugger_nodes 3
> [1]PETSC ERROR: -start_in_debugger noxterm
> [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov?????
>  
> And then it does not go further. With the -debugger_ranks suggested, the output is the same. What do you think, please?
> I am using a cluster (one node, dual-socket system with twelve-core-CPUs), but when I do the ssh I do not use the -X flag, if that's what you mean.
> 
> Thank you,
> Francesco
> 
> 
> > Il giorno 23 feb 2021, alle ore 21:59, Matthew Knepley <knepley at gmail.com> ha scritto:
> > 
> > On Tue, Feb 23, 2021 at 3:55 PM Francesco Brarda <brardafrancesco at gmail.com> wrote:
> > Thank you for the quick response. 
> > Sorry, you are right. Here is the complete output:
> > 
> > fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger
> > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 on display :0.0 on machine srvulx13
> > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 on display :0.0 on machine srvulx13
> > xterm: Xt error: Can't open display: :0.0
> > xterm: DISPLAY is not set
> > xterm: Xt error: Can't open display: :0.0
> > xterm: DISPLAY is not set
> > 
> > Do you have an Xserver running? If not, you can use
> > 
> >   -start_in_debugger noxterm -debugger_nodes 3
> > 
> > and try to get a stack trace from one node.
> > 
> >   Thanks,
> > 
> >     Matt
> >  
> > method = optimize
> >   optimize
> >     algorithm = lbfgs (Default)
> >       lbfgs
> > method = optimize
> >   optimize
> >     algorithm = lbfgs (Default)
> >       lbfgs
> >         init_alpha = 0.001 (Default)
> >         tol_obj = 9.9999999999999998e-13 (Default)
> >         tol_rel_obj = 10000 (Default)
> >         tol_grad = 1e-08 (Default)
> >         init_alpha = 0.001 (Default)
> >         tol_obj = 9.9999999999999998e-13 (Default)
> >         tol_rel_obj = 10000 (Default)
> >         tol_grad = 1e-08 (Default)
> >         tol_rel_grad = 10000000 (Default)
> >         tol_param = 1e-08 (Default)
> >         history_size = 5 (Default)
> >         tol_rel_grad = 10000000 (Default)
> >         tol_param = 1e-08 (Default)
> >         history_size = 5 (Default)
> >     iter = 2000 (Default)
> >     iter = 2000 (Default)
> >     save_iterations = 0 (Default)
> > id = 0 (Default)
> > data    save_iterations = 0 (Default)
> > id = 0 (Default)
> > data
> >   file =  (Default)
> > 
> >   file =  (Default)
> > init = 2 (Default)
> > random
> >   seed = 3585768430 (Default)
> > init = 2 (Default)
> > random
> >   seed = 3585768430 (Default)
> > output
> >   file = output.csv (Default)
> > output
> >   file = output.csv (Default)
> >   diagnostic_file =  (Default)
> >   refresh = 100 (Default)
> >   diagnostic_file =  (Default)
> >   refresh = 100 (Default)
> > 
> > 
> > Initial log joint probability = -731.444
> >     Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
> > [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in src/cmdstan/main.cpp  
> >   To prevent termination, change the error handler using PetscPushErrorHandler()
> > 
> > ===================================================================================
> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> > =   PID 47804 RUNNING AT srvulx13
> > =   EXIT CODE: 134
> > =   CLEANING UP REMAINING PROCESSES
> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> > ===================================================================================
> > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> > This typically refers to a problem with your application.
> > Please see the FAQ page for debugging suggestions
> > 
> > 
> > 
> > 
> > 
> > The code inside main.cpp is the following:
> > 
> > #include <cmdstan/command.hpp>
> > #include <stan/services/error_codes.hpp>
> > 
> > #include <petsc.h>
> > 
> > int main(int argc, char* argv[]) {
> > 
> >   PetscErrorCode ierr;
> >   ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr);
> > 
> >   try {
> >     ierr = cmdstan::command(argc, argv);CHKERRQ(ierr);
> >   } catch (const std::exception& e) {
> >     std::cout << e.what() << std::endl;
> >     ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr);
> >   }
> > 
> >   ierr = PetscFinalize();CHKERRQ(ierr);
> >   return ierr;
> > }
> > 
> > I highlighted the line 12. Although I read the page where the command PetscPushErrorHandler is explained and the example provided (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should effectively use the command.
> > Should I change the entire try/catch with PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); ?
> > 
> > Best,
> > Francesco
> > 
> > 
> >> Il giorno 23 feb 2021, alle ore 11:54, Matthew Knepley <knepley at gmail.com> ha scritto:
> >> 
> >> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda <brardafrancesco at gmail.com> wrote:
> >> Hi!
> >> 
> >> I am very new to the PETSc world. I am working with a GitHub repo that uses PETSc together with Stan (a statistics open source software), here you can find the discussion. 
> >> It has been defined a functor to convert EigenVector to PetscVec and viceversa, both sequentially and in parallel. 
> >> The file using these functions does the conversions with the sequential setting. I changed to those using MPI, that is from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because I want to evaluate the scaling.
> >> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock optimize in the debug mode I get the error Caught signal number 11 SEGV. I therefore used the option -start_in_debugger and I get the following:
> >> 
> >> For some reason, the -start_in_debuggger option is not being seen. Are you showing all the output? Once the debugger is attached,
> >> you run the program (conr) and then when you hit the SEGV you get a stack trace (where).
> >> 
> >>   THanks,
> >> 
> >>     Matt
> >>  
> >> [2]PETSC ERROR: ------------------------------------------------------------------------
> >> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> >> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> >> [2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> >> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> >> [2]PETSC ERROR: likely location of problem given in stack below
> >> [2]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> >> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> >> [2]PETSC ERROR:       INSTEAD the line number of the start of the function
> >> [2]PETSC ERROR:       is given.
> >> [3]PETSC ERROR: ------------------------------------------------------------------------
> >> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> >> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> >> [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> >> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> >> [3]PETSC ERROR: likely location of problem given in stack below
> >> [3]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> >> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> >> [3]PETSC ERROR:       INSTEAD the line number of the start of the function
> >> [3]PETSC ERROR:       is given.
> >> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in  unknown file (null)
> >>   To prevent termination, change the error handler using PetscPushErrorHandler()
> >> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in  unknown file (null)
> >>   To prevent termination, change the error handler using PetscPushErrorHandler()
> >> 
> >> ===================================================================================
> >> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> >> =   PID 22939 RUNNING AT srvulx13
> >> =   EXIT CODE: 134
> >> =   CLEANING UP REMAINING PROCESSES
> >> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> >> ===================================================================================
> >> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> >> This typically refers to a problem with your application.
> >> Please see the FAQ page for debugging suggestions
> >> 
> >> I read the documentation regarding the PetscAbortErrorHandler, but I do not know where should I use it. How can I solve the problem? 
> >> I hope I have been clear enough.
> >> Attached you can find also my configure.log and make.log files.
> >> 
> >> Best,
> >> Francesco
> >> 
> >> 
> >> 
> >> 
> >> 
> >> -- 
> >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> >> -- Norbert Wiener
> >> 
> >> https://www.cse.buffalo.edu/~knepley/
> > 
> > 
> > 
> > -- 
> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > -- Norbert Wiener
> > 
> > https://www.cse.buffalo.edu/~knepley/
> 
> 

From junchao.zhang at gmail.com  Tue Feb 23 20:31:46 2021
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Tue, 23 Feb 2021 20:31:46 -0600
Subject: [petsc-users] Caught signal number 11 SEGV
In-Reply-To: <9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com>
References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com>
	<CAMYG4G=4ZvVC=BZknDrF6a08MOdD+4NAEhvEy8deJ7wAu=8fcg@mail.gmail.com>
	<9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com>
Message-ID: <CA+MQGp-fwoyuHK451ZtMLgkAiW3-4MtHfWoaDdBheQ8WUAs8zQ@mail.gmail.com>

Francesco,
   If you want to debug the code, why don't you build petsc and your code
in debug mode so that you can see the full error stack?
   My favorite debugger is ddt. If not available, then I will try
https://github.com/Azrael3000/tmpi. No need to set DISPLAY.  But the
constraint is you have to use OpenMPI.

--Junchao Zhang


On Tue, Feb 23, 2021 at 2:55 PM Francesco Brarda <brardafrancesco at gmail.com>
wrote:

> Thank you for the quick response.
> Sorry, you are right. Here is the complete output:
>
> fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n
> 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger
> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803
> on display :0.0 on machine srvulx13
> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804
> on display :0.0 on machine srvulx13
> xterm: Xt error: Can't open display: :0.0
> xterm: DISPLAY is not set
> xterm: Xt error: Can't open display: :0.0
> xterm: DISPLAY is not set
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)
>         tol_rel_grad = 10000000 (Default)
>         tol_param = 1e-08 (Default)
>         history_size = 5 (Default)
>         tol_rel_grad = 10000000 (Default)
>         tol_param = 1e-08 (Default)
>         history_size = 5 (Default)
>     iter = 2000 (Default)
>     iter = 2000 (Default)
>     save_iterations = 0 (Default)
> id = 0 (Default)
> data    save_iterations = 0 (Default)
> id = 0 (Default)
> data
>   file =  (Default)
>
>   file =  (Default)
> init = 2 (Default)
> random
>   seed = 3585768430 (Default)
> init = 2 (Default)
> random
>   seed = 3585768430 (Default)
> output
>   file = output.csv (Default)
> output
>   file = output.csv (Default)
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
>
>
> Initial log joint probability = -731.444
>     Iter      log prob        ||dx||      ||grad||       alpha
>   alpha0  # evals  Notes
> [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12
> in src/cmdstan/main.cpp
>   To prevent termination, change the error handler
> using PetscPushErrorHandler()
>
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 47804 RUNNING AT srvulx13
> =   EXIT CODE: 134
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
>
>
>
>
>
> The code inside main.cpp is the following:
>
> #include <cmdstan/command.hpp>
> #include <stan/services/error_codes.hpp>
>
> #include <petsc.h>
>
> int main(int argc, char* argv[]) {
>
>   PetscErrorCode ierr;
>   ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr);
>
>   try {
>     ierr = cmdstan::command(argc, argv);CHKERRQ(ierr);
>   } catch (const std::exception& e) {
>     std::cout << e.what() << std::endl;
>     ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr);
>   }
>
>   ierr = PetscFinalize();CHKERRQ(ierr);
>   return ierr;
> }
>
> I highlighted the line 12. Although I read the page where the command
> PetscPushErrorHandler is explained and the example provided
> (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should
> effectively use the command.
> Should I change the entire try/catch with PetscPushErrorHandler(
> PetscIgnoreErrorHandler,NULL); ?
>
> Best,
> Francesco
>
>
> Il giorno 23 feb 2021, alle ore 11:54, Matthew Knepley <knepley at gmail.com>
> ha scritto:
>
> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda <
> brardafrancesco at gmail.com> wrote:
> Hi!
>
> I am very new to the PETSc world. I am working with a GitHub repo that
> uses PETSc together with Stan (a statistics open source software), here you
> can find the discussion.
> It has been defined a functor to convert EigenVector to PetscVec and
> viceversa, both sequentially and in parallel.
> The file using these functions does the conversions with the sequential
> setting. I changed to those using MPI, that is
> from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because
> I want to evaluate the scaling.
> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock
> optimize in the debug mode I get the error Caught signal number 11 SEGV. I
> therefore used the option -start_in_debugger and I get the following:
>
> For some reason, the -start_in_debuggger option is not being seen. Are you
> showing all the output? Once the debugger is attached,
> you run the program (conr) and then when you hit the SEGV you get a stack
> trace (where).
>
>   THanks,
>
>     Matt
>
> [2]PETSC ERROR:
> ------------------------------------------------------------------------
> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [2]PETSC ERROR: or see
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
> X to find memory corruption errors
> [2]PETSC ERROR: likely location of problem given in stack below
> [2]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [2]PETSC ERROR:       INSTEAD the line number of the start of the function
> [2]PETSC ERROR:       is given.
> [3]PETSC ERROR:
> ------------------------------------------------------------------------
> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [3]PETSC ERROR: or see
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
> X to find memory corruption errors
> [3]PETSC ERROR: likely location of problem given in stack below
> [3]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [3]PETSC ERROR:       INSTEAD the line number of the start of the function
> [3]PETSC ERROR:       is given.
> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in
>  unknown file (null)
>   To prevent termination, change the error handler using
> PetscPushErrorHandler()
> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in
>  unknown file (null)
>   To prevent termination, change the error handler using
> PetscPushErrorHandler()
>
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 22939 RUNNING AT srvulx13
> =   EXIT CODE: 134
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
>
> I read the documentation regarding the PetscAbortErrorHandler, but I do
> not know where should I use it. How can I solve the problem?
> I hope I have been clear enough.
> Attached you can find also my configure.log and make.log files.
>
> Best,
> Francesco
>
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210223/0d1d4bc4/attachment.html>

From junchao.zhang at gmail.com  Tue Feb 23 21:45:29 2021
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Tue, 23 Feb 2021 21:45:29 -0600
Subject: [petsc-users] 64bit indices and revolve compiler warnings
In-Reply-To: <LO2P265MB174422125A52EC5D9BB61DA5E9809@LO2P265MB1744.GBRP265.PROD.OUTLOOK.COM>
References: <LO2P265MB174422125A52EC5D9BB61DA5E9809@LO2P265MB1744.GBRP265.PROD.OUTLOOK.COM>
Message-ID: <CA+MQGp_fYPqLWgyJ+Z1abOtpH=LVmpPBC5jer7=V3KDW5GPRZQ@mail.gmail.com>

SInce the parameter is an integer pointer, we must convert 64-bit indices
to int.  Cc Mr. Hong, who wrote that line.
--Junchao Zhang


On Tue, Feb 23, 2021 at 5:36 AM Anton Glazkov <anton.glazkov at chch.ox.ac.uk>
wrote:

> Good morning,
>
>
>
> I have been trying to compile PETSc with 64bit indices and revolve. It
> compiles ok but gives out warnings of the kind:
>
>
>
> {PETSCDIR PATH REMOVED}/src/ts/trajectory/impls/memory/trajmemory.c:1479:130:
> warning: incompatible pointer types passing 'PetscInt *' (aka 'long long
> *') to parameter of type 'int *' [-Wincompatible-pointer-types]
>
>     whattodo =
> revolve_action(&tjsch->rctx->check,&tjsch->rctx->capo,&tjsch->rctx->fine,tjsch->rctx->snaps_in,&tjsch->rctx->info,&tjsch->rctx->where);
> /* must return 1 or 3 or 4*/
>
>
>                                                      ^~~~~~~~~~~~~~~~~~~
>
> {PETSCDIR PATH REMOVED}/lib/include/revolve_c.h:14:49: note: passing
> argument to parameter here
>
> int  revolve_action(int*,int*,int*,int,int*,int*);
>
>
>
> Is revolve incompatible with 64bit indices by design?
>
>
>
> Best wishes,
>
> Anton
>
>
>
> PS the compile line is this:
>
> ./configure ?prefix={PREFIX REMOVED}  --with-cc=cc --with-cxx=CC
> --with-fc=ftn --with-debugging=0 --with-clib-autodetect=0
> --with-cxxlib-autodetect=0 --with-fortranlib-autodetect=0 --COPTFLAGS=-g
> -O3 --CXXOPTFLAGS=-g -O3 --FOPTFLAGS=-g -O3 --with-64-bit-indices
> --with-scalar-type=complex --download-hypre-shared --download-moab-shared
> --download-superlu_dist-shared --download-revolve=1
> --with-hdf5-dir=/opt/cray/pe/hdf5-parallel/1.12.0.2/CRAYCLANG/9.1
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210223/acb6d0a0/attachment.html>

From jed at jedbrown.org  Tue Feb 23 23:03:54 2021
From: jed at jedbrown.org (Jed Brown)
Date: Tue, 23 Feb 2021 22:03:54 -0700
Subject: [petsc-users] Preconditioner for LSC
In-Reply-To: <CA+-JvF53qcMb1CTZGCVtZcZ0HL8GJ0p5O00=2+_WUhh4ccUpGw@mail.gmail.com>
References: <CA+-JvF53qcMb1CTZGCVtZcZ0HL8GJ0p5O00=2+_WUhh4ccUpGw@mail.gmail.com>
Message-ID: <87sg5m9hvp.fsf@jedbrown.org>

If you've already attached a MatShell, you could presumably use -fieldsplit_1_lsc_pc_type mat to just call its MatMult.

The sense I've gotten when I wrote PCLSC and was experimenting with these methods is that the main selling point of LSC (for most discretizations) is that it's more algebraic than the cheaper PCD methods.

Elena Travaglia <elena.travaglia at edu.unito.it> writes:

> Dear PETSc users,
>
> we would like to compare our preconditioner for the Schur complement
> of a Stokes system, with the LSC preconditioner already implemented in
> PETSc. Following the example in the PETSc manual, we've tried
>    -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type ml
> but this is not working (properly) on our problem.
>
> On the other hand we think we have a good preconditioner for A10*A01,
> so  we'd like to try
>    -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type shell
> but we cannot figure out how to attach our apply() routine to
> the pc object of fieldsplit_1_lsc.
>
> Can this be done in the current interface?
> Or perhaps, should we call KSPGetOperators on the fieldsplit_1 solver
> and attach to its Sp operator a "LSC_Lp" of type MATSHELL with our routine
> attached to the MATOP_SOLVE of the shell matrix?
>
> Thanks in advance,
>
> Elena and Matteo
>
> -- 
> ------------------------
>
>
>
> Indirizzo istituzionale di posta elettronica 
> degli studenti e dei laureati dell'Universit? degli Studi di TorinoOfficial?
> University of Turin?email address?for students and graduates?

From hongzhang at anl.gov  Tue Feb 23 23:12:27 2021
From: hongzhang at anl.gov (Zhang, Hong)
Date: Wed, 24 Feb 2021 05:12:27 +0000
Subject: [petsc-users] 64bit indices and revolve compiler warnings
In-Reply-To: <LO2P265MB174422125A52EC5D9BB61DA5E9809@LO2P265MB1744.GBRP265.PROD.OUTLOOK.COM>
References: <LO2P265MB174422125A52EC5D9BB61DA5E9809@LO2P265MB1744.GBRP265.PROD.OUTLOOK.COM>
Message-ID: <6A7A2FA3-755D-463B-8538-F74EAC828C1A@anl.gov>


On Feb 23, 2021, at 5:32 AM, Anton Glazkov <anton.glazkov at chch.ox.ac.uk<mailto:anton.glazkov at chch.ox.ac.uk>> wrote:

Good morning,

I have been trying to compile PETSc with 64bit indices and revolve. It compiles ok but gives out warnings of the kind:

{PETSCDIR PATH REMOVED}/src/ts/trajectory/impls/memory/trajmemory.c:1479:130: warning: incompatible pointer types passing 'PetscInt *' (aka 'long long *') to parameter of type 'int *' [-Wincompatible-pointer-types]
    whattodo = revolve_action(&tjsch->rctx->check,&tjsch->rctx->capo,&tjsch->rctx->fine,tjsch->rctx->snaps_in,&tjsch->rctx->info,&tjsch->rctx->where); /* must return 1 or 3 or 4*/
                                                                                                                                 ^~~~~~~~~~~~~~~~~~~
{PETSCDIR PATH REMOVED}/lib/include/revolve_c.h:14:49: note: passing argument to parameter here
int  revolve_action(int*,int*,int*,int,int*,int*);

Is revolve incompatible with 64bit indices by design?

Yes, Revolve uses int32 only. But we can fix the warnings by downcasting. You can check out this MR:

https://gitlab.com/petsc/petsc/-/merge_requests/3654

Thanks,
Hong (Mr.)


Best wishes,
Anton

PS the compile line is this:
./configure ?prefix={PREFIX REMOVED}  --with-cc=cc --with-cxx=CC --with-fc=ftn --with-debugging=0 --with-clib-autodetect=0 --with-cxxlib-autodetect=0 --with-fortranlib-autodetect=0 --COPTFLAGS=-g -O3 --CXXOPTFLAGS=-g -O3 --FOPTFLAGS=-g -O3 --with-64-bit-indices --with-scalar-type=complex --download-hypre-shared --download-moab-shared --download-superlu_dist-shared --download-revolve=1 --with-hdf5-dir=/opt/cray/pe/hdf5-parallel/1.12.0.2/CRAYCLANG/9.1

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210224/ade97e84/attachment.html>

From bsmith at petsc.dev  Wed Feb 24 00:14:04 2021
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 24 Feb 2021 00:14:04 -0600
Subject: [petsc-users] Caught signal number 11 SEGV
In-Reply-To: <5F08B032-8D8C-482D-82D2-12BC293A097D@gmail.com>
References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com>
	<CAMYG4G=4ZvVC=BZknDrF6a08MOdD+4NAEhvEy8deJ7wAu=8fcg@mail.gmail.com>
	<9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com>
	<CAMYG4GksaXHF=B+a9TP5GgEPn658prq6Y=f-gJQg_phBDBVxAw@mail.gmail.com>
	<5F08B032-8D8C-482D-82D2-12BC293A097D@gmail.com>
Message-ID: <00FCB539-D690-4311-8821-DF81DA408FE7@petsc.dev>


start_in_debugger noxterm -debugger_nodes 3

Use -start_in_debugger noxterm -debugger_nodes 0

when not opening windows for each debugger it is best to have the first rank associated with the tty as the debugger node


> On Feb 23, 2021, at 3:46 PM, Francesco Brarda <brardafrancesco at gmail.com> wrote:
> 
> Using the command you suggested I got 
> 
> fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm -debugger_nodes 3
> ** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated as of version 3.14 and will be removed in a future release. Please use the option -debugger_ranks instead. (Silence this warning with -options_suppress_deprecated_warnings)
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)        tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)
>         tol_rel_grad = 10000000 (Default)
> 
>         tol_rel_grad = 10000000 (Default)
>         tol_param = 1e-08 (Default)        tol_param = 1e-08 (Default)
>         history_size = 5 (Default)
>     iter = 2000 (Default)
> 
>         history_size = 5 (Default)
>     iter = 2000 (Default)
>     save_iterations = 0 (Default)
> id = 0 (Default)
> data
>     save_iterations = 0 (Default)
> id = 0 (Default)
> data
>   file =  (Default)
>   file =  (Default)
> init = 2 (Default)
> random
>   seed = 3623621468 (Default)
> output
>   file = output.csv (Default)init = 2 (Default)
> random
>   seed = 3623621468 (Default)
> output
>   file = output.csv (Default)
> 
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
> 
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
> 
> Initial log joint probability = -195.984
>     Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
>       10      -0.97101    0.00292919       1.65855       0.001       0.001       46  LS failed, Hessian reset 
>       12     -0.483952      0.001316       1.18542       0.001       0.001       77  LS failed, Hessian reset 
>       13     -0.477916     0.0118542      0.163518        0.01       0.001      106  LS failed, Hessian reset 
> [1]PETSC ERROR: #1 main() line 12 in src/cmdstan/main.cpp
> [1]PETSC ERROR: PETSc Option Table entries:
> [1]PETSC ERROR: -debugger_nodes 3
> [1]PETSC ERROR: -start_in_debugger noxterm
> [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov <mailto:petsc-maint at mcs.anl.gov>?????
>  
> And then it does not go further. With the -debugger_ranks suggested, the output is the same. What do you think, please?
> I am using a cluster (one node, dual-socket system with twelve-core-CPUs), but when I do the ssh I do not use the -X flag, if that's what you mean.
> 
> Thank you,
> Francesco
> 
> 
>> Il giorno 23 feb 2021, alle ore 21:59, Matthew Knepley <knepley at gmail.com <mailto:knepley at gmail.com>> ha scritto:
>> 
>> On Tue, Feb 23, 2021 at 3:55 PM Francesco Brarda <brardafrancesco at gmail.com <mailto:brardafrancesco at gmail.com>> wrote:
>> Thank you for the quick response. 
>> Sorry, you are right. Here is the complete output:
>> 
>> fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger
>> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 on display :0.0 on machine srvulx13
>> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 on display :0.0 on machine srvulx13
>> xterm: Xt error: Can't open display: :0.0
>> xterm: DISPLAY is not set
>> xterm: Xt error: Can't open display: :0.0
>> xterm: DISPLAY is not set
>> 
>> Do you have an Xserver running? If not, you can use
>> 
>>   -start_in_debugger noxterm -debugger_nodes 3
>> 
>> and try to get a stack trace from one node.
>> 
>>   Thanks,
>> 
>>     Matt
>>  
>> method = optimize
>>   optimize
>>     algorithm = lbfgs (Default)
>>       lbfgs
>> method = optimize
>>   optimize
>>     algorithm = lbfgs (Default)
>>       lbfgs
>>         init_alpha = 0.001 (Default)
>>         tol_obj = 9.9999999999999998e-13 (Default)
>>         tol_rel_obj = 10000 (Default)
>>         tol_grad = 1e-08 (Default)
>>         init_alpha = 0.001 (Default)
>>         tol_obj = 9.9999999999999998e-13 (Default)
>>         tol_rel_obj = 10000 (Default)
>>         tol_grad = 1e-08 (Default)
>>         tol_rel_grad = 10000000 (Default)
>>         tol_param = 1e-08 (Default)
>>         history_size = 5 (Default)
>>         tol_rel_grad = 10000000 (Default)
>>         tol_param = 1e-08 (Default)
>>         history_size = 5 (Default)
>>     iter = 2000 (Default)
>>     iter = 2000 (Default)
>>     save_iterations = 0 (Default)
>> id = 0 (Default)
>> data    save_iterations = 0 (Default)
>> id = 0 (Default)
>> data
>>   file =  (Default)
>> 
>>   file =  (Default)
>> init = 2 (Default)
>> random
>>   seed = 3585768430 (Default)
>> init = 2 (Default)
>> random
>>   seed = 3585768430 (Default)
>> output
>>   file = output.csv (Default)
>> output
>>   file = output.csv (Default)
>>   diagnostic_file =  (Default)
>>   refresh = 100 (Default)
>>   diagnostic_file =  (Default)
>>   refresh = 100 (Default)
>> 
>> 
>> Initial log joint probability = -731.444
>>     Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
>> [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in src/cmdstan/main.cpp  
>>   To prevent termination, change the error handler using PetscPushErrorHandler()
>> 
>> ===================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   PID 47804 RUNNING AT srvulx13
>> =   EXIT CODE: 134
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
>> This typically refers to a problem with your application.
>> Please see the FAQ page for debugging suggestions
>> 
>> 
>> 
>> 
>> 
>> The code inside main.cpp is the following:
>> 
>> #include <cmdstan/command.hpp>
>> #include <stan/services/error_codes.hpp>
>> 
>> #include <petsc.h>
>> 
>> int main(int argc, char* argv[]) {
>> 
>>   PetscErrorCode ierr;
>>   ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr);
>> 
>>   try {
>>     ierr = cmdstan::command(argc, argv);CHKERRQ(ierr);
>>   } catch (const std::exception& e) {
>>     std::cout << e.what() << std::endl;
>>     ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr);
>>   }
>> 
>>   ierr = PetscFinalize();CHKERRQ(ierr);
>>   return ierr;
>> }
>> 
>> I highlighted the line 12. Although I read the page where the command PetscPushErrorHandler is explained and the example provided (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should effectively use the command.
>> Should I change the entire try/catch with PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); ?
>> 
>> Best,
>> Francesco
>> 
>> 
>>> Il giorno 23 feb 2021, alle ore 11:54, Matthew Knepley <knepley at gmail.com> ha scritto:
>>> 
>>> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda <brardafrancesco at gmail.com> wrote:
>>> Hi!
>>> 
>>> I am very new to the PETSc world. I am working with a GitHub repo that uses PETSc together with Stan (a statistics open source software), here you can find the discussion. 
>>> It has been defined a functor to convert EigenVector to PetscVec and viceversa, both sequentially and in parallel. 
>>> The file using these functions does the conversions with the sequential setting. I changed to those using MPI, that is from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because I want to evaluate the scaling.
>>> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock optimize in the debug mode I get the error Caught signal number 11 SEGV. I therefore used the option -start_in_debugger and I get the following:
>>> 
>>> For some reason, the -start_in_debuggger option is not being seen. Are you showing all the output? Once the debugger is attached,
>>> you run the program (conr) and then when you hit the SEGV you get a stack trace (where).
>>> 
>>>   THanks,
>>> 
>>>     Matt
>>>  
>>> [2]PETSC ERROR: ------------------------------------------------------------------------
>>> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>> [2]PETSC ERROR: likely location of problem given in stack below
>>> [2]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>>> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>> [2]PETSC ERROR:       INSTEAD the line number of the start of the function
>>> [2]PETSC ERROR:       is given.
>>> [3]PETSC ERROR: ------------------------------------------------------------------------
>>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>> [3]PETSC ERROR: likely location of problem given in stack below
>>> [3]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>>> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>> [3]PETSC ERROR:       INSTEAD the line number of the start of the function
>>> [3]PETSC ERROR:       is given.
>>> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in  unknown file (null)
>>>   To prevent termination, change the error handler using PetscPushErrorHandler()
>>> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in  unknown file (null)
>>>   To prevent termination, change the error handler using PetscPushErrorHandler()
>>> 
>>> ===================================================================================
>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> =   PID 22939 RUNNING AT srvulx13
>>> =   EXIT CODE: 134
>>> =   CLEANING UP REMAINING PROCESSES
>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
>>> This typically refers to a problem with your application.
>>> Please see the FAQ page for debugging suggestions
>>> 
>>> I read the documentation regarding the PetscAbortErrorHandler, but I do not know where should I use it. How can I solve the problem? 
>>> I hope I have been clear enough.
>>> Attached you can find also my configure.log and make.log files.
>>> 
>>> Best,
>>> Francesco
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener
>>> 
>>> https://www.cse.buffalo.edu/~knepley/
>> 
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210224/25e40de2/attachment-0001.html>

From brardafrancesco at gmail.com  Wed Feb 24 03:39:43 2021
From: brardafrancesco at gmail.com (Francesco Brarda)
Date: Wed, 24 Feb 2021 10:39:43 +0100
Subject: [petsc-users] Caught signal number 11 SEGV
In-Reply-To: <00FCB539-D690-4311-8821-DF81DA408FE7@petsc.dev>
References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com>
	<CAMYG4G=4ZvVC=BZknDrF6a08MOdD+4NAEhvEy8deJ7wAu=8fcg@mail.gmail.com>
	<9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com>
	<CAMYG4GksaXHF=B+a9TP5GgEPn658prq6Y=f-gJQg_phBDBVxAw@mail.gmail.com>
	<5F08B032-8D8C-482D-82D2-12BC293A097D@gmail.com>
	<00FCB539-D690-4311-8821-DF81DA408FE7@petsc.dev>
Message-ID: <FE530C11-D0A8-42CD-ACD3-C9A979EF81C7@gmail.com>

I have never used gdb. 
Using 0 as you suggested I got this output:

$PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm -debugger_nodes 0
** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated as of version 3.14 and will be removed in a future release. Please use the option -debugger_ranks instead. (Silence this warning with -options_suppress_deprecated_warnings)
PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 3903 on srvulx13
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from examples/rosenbrock/rosenbrock...done.
Attaching to program: /home/fbrarda/cmdstan-petsc/examples/rosenbrock/rosenbrock, process 3903
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
/home/fbrarda/cmdstan-petsc/3903: No such file or directory.
(gdb) method = optimize
  optimize
    algorithm = lbfgs (Default)
      lbfgs
method = optimize
  optimize
    algorithm = lbfgs (Default)
      lbfgs
        init_alpha = 0.001 (Default)        init_alpha = 0.001 (Default)
        tol_obj = 9.9999999999999998e-13 (Default)
        tol_rel_obj = 10000 (Default)
        tol_grad = 1e-08 (Default)
        tol_rel_grad = 10000000 (Default)
        tol_obj = 9.9999999999999998e-13 (Default)
        tol_rel_obj = 10000 (Default)

        tol_param = 1e-08 (Default)
        history_size = 5 (Default)
        tol_grad = 1e-08 (Default)
    iter = 2000 (Default)
    save_iterations = 0 (Default)        tol_rel_grad = 10000000 (Default)
        tol_param = 1e-08 (Default)
        history_size = 5 (Default)
id = 0 (Default)
data
  file =  (Default)
init = 2 (Default)

    iter = 2000 (Default)
    save_iterations = 0 (Default)
random
  seed = 3666155654 (Default)
output
  file = output.csv (Default)
  diagnostic_file =  (Default)id = 0 (Default)
data
  file =  (Default)
init = 2 (Default)

  refresh = 100 (Default)

random
  seed = 3666155654 (Default)
output
  file = output.csv (Default)
  diagnostic_file =  (Default)
  refresh = 100 (Default)

Initial log joint probability = -158.559
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
      12     -0.253535   0.000284499     0.0383658       0.001       0.001       46  LS failed, Hessian reset 
      13     -0.253535   0.000284499     0.0383658       6.528       0.001      111  LS failed, Hessian reset 
Optimization terminated with error: 
  Line search failed to achieve a sufficient decrease, no more progress can be made
[0]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in src/cmdstan/main.cpp  
  To prevent termination, change the error handler using PetscPushErrorHandler()

Using only 1 process the code works. 

Francesco

> Il giorno 24 feb 2021, alle ore 07:14, Barry Smith <bsmith at petsc.dev> ha scritto:
> 
> 
> start_in_debugger noxterm -debugger_nodes 3
> 
> Use -start_in_debugger noxterm -debugger_nodes 0
> 
> when not opening windows for each debugger it is best to have the first rank associated with the tty as the debugger node
> 
> 
> 
> 
> 
> 
>> On Feb 23, 2021, at 3:46 PM, Francesco Brarda <brardafrancesco at gmail.com> wrote:
>> 
>> Using the command you suggested I got 
>> 
>> fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm -debugger_nodes 3
>> ** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated as of version 3.14 and will be removed in a future release. Please use the option -debugger_ranks instead. (Silence this warning with -options_suppress_deprecated_warnings)
>> method = optimize
>>   optimize
>>     algorithm = lbfgs (Default)
>>       lbfgs
>> method = optimize
>>   optimize
>>     algorithm = lbfgs (Default)
>>       lbfgs
>>         init_alpha = 0.001 (Default)
>>         tol_obj = 9.9999999999999998e-13 (Default)
>>         init_alpha = 0.001 (Default)
>>         tol_obj = 9.9999999999999998e-13 (Default)
>>         tol_rel_obj = 10000 (Default)
>>         tol_grad = 1e-08 (Default)        tol_rel_obj = 10000 (Default)
>>         tol_grad = 1e-08 (Default)
>>         tol_rel_grad = 10000000 (Default)
>> 
>>         tol_rel_grad = 10000000 (Default)
>>         tol_param = 1e-08 (Default)        tol_param = 1e-08 (Default)
>>         history_size = 5 (Default)
>>     iter = 2000 (Default)
>> 
>>         history_size = 5 (Default)
>>     iter = 2000 (Default)
>>     save_iterations = 0 (Default)
>> id = 0 (Default)
>> data
>>     save_iterations = 0 (Default)
>> id = 0 (Default)
>> data
>>   file =  (Default)
>>   file =  (Default)
>> init = 2 (Default)
>> random
>>   seed = 3623621468 (Default)
>> output
>>   file = output.csv (Default)init = 2 (Default)
>> random
>>   seed = 3623621468 (Default)
>> output
>>   file = output.csv (Default)
>> 
>>   diagnostic_file =  (Default)
>>   refresh = 100 (Default)
>> 
>>   diagnostic_file =  (Default)
>>   refresh = 100 (Default)
>> 
>> Initial log joint probability = -195.984
>>     Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
>>       10      -0.97101    0.00292919       1.65855       0.001       0.001       46  LS failed, Hessian reset 
>>       12     -0.483952      0.001316       1.18542       0.001       0.001       77  LS failed, Hessian reset 
>>       13     -0.477916     0.0118542      0.163518        0.01       0.001      106  LS failed, Hessian reset 
>> [1]PETSC ERROR: #1 main() line 12 in src/cmdstan/main.cpp
>> [1]PETSC ERROR: PETSc Option Table entries:
>> [1]PETSC ERROR: -debugger_nodes 3
>> [1]PETSC ERROR: -start_in_debugger noxterm
>> [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov?????
>>  
>> And then it does not go further. With the -debugger_ranks suggested, the output is the same. What do you think, please?
>> I am using a cluster (one node, dual-socket system with twelve-core-CPUs), but when I do the ssh I do not use the -X flag, if that's what you mean.
>> 
>> Thank you,
>> Francesco
>> 
>> 
>>> Il giorno 23 feb 2021, alle ore 21:59, Matthew Knepley <knepley at gmail.com> ha scritto:
>>> 
>>> On Tue, Feb 23, 2021 at 3:55 PM Francesco Brarda <brardafrancesco at gmail.com> wrote:
>>> Thank you for the quick response. 
>>> Sorry, you are right. Here is the complete output:
>>> 
>>> fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger
>>> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 on display :0.0 on machine srvulx13
>>> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 on display :0.0 on machine srvulx13
>>> xterm: Xt error: Can't open display: :0.0
>>> xterm: DISPLAY is not set
>>> xterm: Xt error: Can't open display: :0.0
>>> xterm: DISPLAY is not set
>>> 
>>> Do you have an Xserver running? If not, you can use
>>> 
>>>   -start_in_debugger noxterm -debugger_nodes 3
>>> 
>>> and try to get a stack trace from one node.
>>> 
>>>   Thanks,
>>> 
>>>     Matt
>>>  
>>> method = optimize
>>>   optimize
>>>     algorithm = lbfgs (Default)
>>>       lbfgs
>>> method = optimize
>>>   optimize
>>>     algorithm = lbfgs (Default)
>>>       lbfgs
>>>         init_alpha = 0.001 (Default)
>>>         tol_obj = 9.9999999999999998e-13 (Default)
>>>         tol_rel_obj = 10000 (Default)
>>>         tol_grad = 1e-08 (Default)
>>>         init_alpha = 0.001 (Default)
>>>         tol_obj = 9.9999999999999998e-13 (Default)
>>>         tol_rel_obj = 10000 (Default)
>>>         tol_grad = 1e-08 (Default)
>>>         tol_rel_grad = 10000000 (Default)
>>>         tol_param = 1e-08 (Default)
>>>         history_size = 5 (Default)
>>>         tol_rel_grad = 10000000 (Default)
>>>         tol_param = 1e-08 (Default)
>>>         history_size = 5 (Default)
>>>     iter = 2000 (Default)
>>>     iter = 2000 (Default)
>>>     save_iterations = 0 (Default)
>>> id = 0 (Default)
>>> data    save_iterations = 0 (Default)
>>> id = 0 (Default)
>>> data
>>>   file =  (Default)
>>> 
>>>   file =  (Default)
>>> init = 2 (Default)
>>> random
>>>   seed = 3585768430 (Default)
>>> init = 2 (Default)
>>> random
>>>   seed = 3585768430 (Default)
>>> output
>>>   file = output.csv (Default)
>>> output
>>>   file = output.csv (Default)
>>>   diagnostic_file =  (Default)
>>>   refresh = 100 (Default)
>>>   diagnostic_file =  (Default)
>>>   refresh = 100 (Default)
>>> 
>>> 
>>> Initial log joint probability = -731.444
>>>     Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
>>> [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in src/cmdstan/main.cpp  
>>>   To prevent termination, change the error handler using PetscPushErrorHandler()
>>> 
>>> ===================================================================================
>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> =   PID 47804 RUNNING AT srvulx13
>>> =   EXIT CODE: 134
>>> =   CLEANING UP REMAINING PROCESSES
>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
>>> This typically refers to a problem with your application.
>>> Please see the FAQ page for debugging suggestions
>>> 
>>> 
>>> 
>>> 
>>> 
>>> The code inside main.cpp is the following:
>>> 
>>> #include <cmdstan/command.hpp>
>>> #include <stan/services/error_codes.hpp>
>>> 
>>> #include <petsc.h>
>>> 
>>> int main(int argc, char* argv[]) {
>>> 
>>>   PetscErrorCode ierr;
>>>   ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr);
>>> 
>>>   try {
>>>     ierr = cmdstan::command(argc, argv);CHKERRQ(ierr);
>>>   } catch (const std::exception& e) {
>>>     std::cout << e.what() << std::endl;
>>>     ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr);
>>>   }
>>> 
>>>   ierr = PetscFinalize();CHKERRQ(ierr);
>>>   return ierr;
>>> }
>>> 
>>> I highlighted the line 12. Although I read the page where the command PetscPushErrorHandler is explained and the example provided (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should effectively use the command.
>>> Should I change the entire try/catch with PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); ?
>>> 
>>> Best,
>>> Francesco
>>> 
>>> 
>>>> Il giorno 23 feb 2021, alle ore 11:54, Matthew Knepley <knepley at gmail.com> ha scritto:
>>>> 
>>>> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda <brardafrancesco at gmail.com> wrote:
>>>> Hi!
>>>> 
>>>> I am very new to the PETSc world. I am working with a GitHub repo that uses PETSc together with Stan (a statistics open source software), here you can find the discussion. 
>>>> It has been defined a functor to convert EigenVector to PetscVec and viceversa, both sequentially and in parallel. 
>>>> The file using these functions does the conversions with the sequential setting. I changed to those using MPI, that is from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because I want to evaluate the scaling.
>>>> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock optimize in the debug mode I get the error Caught signal number 11 SEGV. I therefore used the option -start_in_debugger and I get the following:
>>>> 
>>>> For some reason, the -start_in_debuggger option is not being seen. Are you showing all the output? Once the debugger is attached,
>>>> you run the program (conr) and then when you hit the SEGV you get a stack trace (where).
>>>> 
>>>>   THanks,
>>>> 
>>>>     Matt
>>>>  
>>>> [2]PETSC ERROR: ------------------------------------------------------------------------
>>>> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>>> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>> [2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>> [2]PETSC ERROR: likely location of problem given in stack below
>>>> [2]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>>>> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>>> [2]PETSC ERROR:       INSTEAD the line number of the start of the function
>>>> [2]PETSC ERROR:       is given.
>>>> [3]PETSC ERROR: ------------------------------------------------------------------------
>>>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>> [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>> [3]PETSC ERROR: likely location of problem given in stack below
>>>> [3]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
>>>> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>>> [3]PETSC ERROR:       INSTEAD the line number of the start of the function
>>>> [3]PETSC ERROR:       is given.
>>>> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in  unknown file (null)
>>>>   To prevent termination, change the error handler using PetscPushErrorHandler()
>>>> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in  unknown file (null)
>>>>   To prevent termination, change the error handler using PetscPushErrorHandler()
>>>> 
>>>> ===================================================================================
>>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>> =   PID 22939 RUNNING AT srvulx13
>>>> =   EXIT CODE: 134
>>>> =   CLEANING UP REMAINING PROCESSES
>>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>> ===================================================================================
>>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
>>>> This typically refers to a problem with your application.
>>>> Please see the FAQ page for debugging suggestions
>>>> 
>>>> I read the documentation regarding the PetscAbortErrorHandler, but I do not know where should I use it. How can I solve the problem? 
>>>> I hope I have been clear enough.
>>>> Attached you can find also my configure.log and make.log files.
>>>> 
>>>> Best,
>>>> Francesco
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>> -- Norbert Wiener
>>>> 
>>>> https://www.cse.buffalo.edu/~knepley/
>>> 
>>> 
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener
>>> 
>>> https://www.cse.buffalo.edu/~knepley/
>> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210224/7a67d216/attachment-0001.html>

From elena.travaglia at edu.unito.it  Wed Feb 24 04:51:48 2021
From: elena.travaglia at edu.unito.it (Elena Travaglia)
Date: Wed, 24 Feb 2021 11:51:48 +0100
Subject: [petsc-users] Preconditioner for LSC
In-Reply-To: <87sg5m9hvp.fsf@jedbrown.org>
References: <CA+-JvF53qcMb1CTZGCVtZcZ0HL8GJ0p5O00=2+_WUhh4ccUpGw@mail.gmail.com>
	<87sg5m9hvp.fsf@jedbrown.org>
Message-ID: <CA+-JvF50gyuieTN862GgkUwQ3kSvRZ1KkvD_ri46C+kSQPW=KQ@mail.gmail.com>

Thank you for the replay.

Now I have set the precondition on the command line with
"-fieldsplit_1_lsc_pc_type" , but is it also possible to set it from within
the code instead?
What is the equivalent code to obtain the effect of
"-fieldsplit_1_lsc_pc_type mat" ?

Elena

Il giorno mer 24 feb 2021 alle ore 06:04 Jed Brown <jed at jedbrown.org> ha
scritto:

> If you've already attached a MatShell, you could presumably use
> -fieldsplit_1_lsc_pc_type mat to just call its MatMult.
>
> The sense I've gotten when I wrote PCLSC and was experimenting with these
> methods is that the main selling point of LSC (for most discretizations) is
> that it's more algebraic than the cheaper PCD methods.
>
> Elena Travaglia <elena.travaglia at edu.unito.it> writes:
>
> > Dear PETSc users,
> >
> > we would like to compare our preconditioner for the Schur complement
> > of a Stokes system, with the LSC preconditioner already implemented in
> > PETSc. Following the example in the PETSc manual, we've tried
> >    -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type ml
> > but this is not working (properly) on our problem.
> >
> > On the other hand we think we have a good preconditioner for A10*A01,
> > so  we'd like to try
> >    -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type shell
> > but we cannot figure out how to attach our apply() routine to
> > the pc object of fieldsplit_1_lsc.
> >
> > Can this be done in the current interface?
> > Or perhaps, should we call KSPGetOperators on the fieldsplit_1 solver
> > and attach to its Sp operator a "LSC_Lp" of type MATSHELL with our
> routine
> > attached to the MATOP_SOLVE of the shell matrix?
> >
> > Thanks in advance,
> >
> > Elena and Matteo
> >
> > --
> > ------------------------
> >
> >
> >
> > Indirizzo istituzionale di posta elettronica
> > degli studenti e dei laureati dell'Universit? degli Studi di
> TorinoOfficial
> > University of Turin email address for students and graduates
>

-- 
------------------------


Indirizzo istituzionale di posta elettronica 
degli studenti e dei laureati dell'Universit? degli Studi di TorinoOfficial?
University of Turin?email address?for students and graduates?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210224/cee9a6ce/attachment.html>

From knepley at gmail.com  Wed Feb 24 06:20:31 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 24 Feb 2021 07:20:31 -0500
Subject: [petsc-users] Caught signal number 11 SEGV
In-Reply-To: <FE530C11-D0A8-42CD-ACD3-C9A979EF81C7@gmail.com>
References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com>
	<CAMYG4G=4ZvVC=BZknDrF6a08MOdD+4NAEhvEy8deJ7wAu=8fcg@mail.gmail.com>
	<9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com>
	<CAMYG4GksaXHF=B+a9TP5GgEPn658prq6Y=f-gJQg_phBDBVxAw@mail.gmail.com>
	<5F08B032-8D8C-482D-82D2-12BC293A097D@gmail.com>
	<00FCB539-D690-4311-8821-DF81DA408FE7@petsc.dev>
	<FE530C11-D0A8-42CD-ACD3-C9A979EF81C7@gmail.com>
Message-ID: <CAMYG4Gmq2rx9VNXDm3MOViAhDP1PZ9WiFf_4BMDhrH1oFphjJw@mail.gmail.com>

You have shut off the ability to use the debugger on your machine:

Attaching to program:
/home/fbrarda/cmdstan-petsc/examples/rosenbrock/rosenbrock, process 3903
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.

You need to edit  /proc/sys/kernel/yama/ptrace_scope and allow debugging on
your box.

  Thanks,

     Matt

On Wed, Feb 24, 2021 at 4:39 AM Francesco Brarda <brardafrancesco at gmail.com>
wrote:

> I have never used gdb.
> Using 0 as you suggested I got this output:
>
> $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2
> examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm
> -debugger_nodes 0
> ** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated
> as of version 3.14 and will be removed in a future release. Please use the
> option -debugger_ranks instead. (Silence this warning
> with -options_suppress_deprecated_warnings)
> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 3903
> on srvulx13
> GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
> Copyright (C) 2014 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <
> http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from examples/rosenbrock/rosenbrock...done.
> Attaching to program:
> /home/fbrarda/cmdstan-petsc/examples/rosenbrock/rosenbrock, process 3903
> Could not attach to process.  If your uid matches the uid of the target
> process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
> again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
> ptrace: Operation not permitted.
> /home/fbrarda/cmdstan-petsc/3903: No such file or directory.
> (gdb) method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
>         init_alpha = 0.001 (Default)        init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)
>         tol_rel_grad = 10000000 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         tol_rel_obj = 10000 (Default)
>
>         tol_param = 1e-08 (Default)
>         history_size = 5 (Default)
>         tol_grad = 1e-08 (Default)
>     iter = 2000 (Default)
>     save_iterations = 0 (Default)        tol_rel_grad = 10000000 (Default)
>         tol_param = 1e-08 (Default)
>         history_size = 5 (Default)
> id = 0 (Default)
> data
>   file =  (Default)
> init = 2 (Default)
>
>     iter = 2000 (Default)
>     save_iterations = 0 (Default)
> random
>   seed = 3666155654 (Default)
> output
>   file = output.csv (Default)
>   diagnostic_file =  (Default)id = 0 (Default)
> data
>   file =  (Default)
> init = 2 (Default)
>
>   refresh = 100 (Default)
>
> random
>   seed = 3666155654 (Default)
> output
>   file = output.csv (Default)
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
>
> Initial log joint probability = -158.559
>     Iter      log prob        ||dx||      ||grad||       alpha
>   alpha0  # evals  Notes
>       12     -0.253535   0.000284499     0.0383658       0.001
>   0.001       46  LS failed, Hessian reset
>       13     -0.253535   0.000284499     0.0383658       6.528
>   0.001      111  LS failed, Hessian reset
> Optimization terminated with error:
>   Line search failed to achieve a sufficient decrease, no more
> progress can be made
> [0]PETSC ERROR: PetscAbortErrorHandler: main() line 12
> in src/cmdstan/main.cpp
>   To prevent termination, change the error handler
> using PetscPushErrorHandler()
>
> Using only 1 process the code works.
>
> Francesco
>
> Il giorno 24 feb 2021, alle ore 07:14, Barry Smith <bsmith at petsc.dev> ha
> scritto:
>
>
> start_in_debugger noxterm -debugger_nodes 3
>
> Use -start_in_debugger noxterm -debugger_nodes 0
>
> when not opening windows for each debugger it is best to have the first
> rank associated with the tty as the debugger node
>
>
>
>
>
>
> On Feb 23, 2021, at 3:46 PM, Francesco Brarda <brardafrancesco at gmail.com>
> wrote:
>
> Using the command you suggested I got
>
> fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2
> examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm
> -debugger_nodes 3
> ** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated
> as of version 3.14 and will be removed in a future release. Please use the
> option -debugger_ranks instead. (Silence this warning with
> -options_suppress_deprecated_warnings)
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)        tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)
>         tol_rel_grad = 10000000 (Default)
>
>         tol_rel_grad = 10000000 (Default)
>         tol_param = 1e-08 (Default)        tol_param = 1e-08 (Default)
>         history_size = 5 (Default)
>     iter = 2000 (Default)
>
>         history_size = 5 (Default)
>     iter = 2000 (Default)
>     save_iterations = 0 (Default)
> id = 0 (Default)
> data
>     save_iterations = 0 (Default)
> id = 0 (Default)
> data
>   file =  (Default)
>   file =  (Default)
> init = 2 (Default)
> random
>   seed = 3623621468 (Default)
> output
>   file = output.csv (Default)init = 2 (Default)
> random
>   seed = 3623621468 (Default)
> output
>   file = output.csv (Default)
>
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
>
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
>
> Initial log joint probability = -195.984
>     Iter      log prob        ||dx||      ||grad||       alpha      alpha0
>  # evals  Notes
>       10      -0.97101    0.00292919       1.65855       0.001       0.001
>       46  LS failed, Hessian reset
>       12     -0.483952      0.001316       1.18542       0.001       0.001
>       77  LS failed, Hessian reset
>       13     -0.477916     0.0118542      0.163518        0.01       0.001
>      106  LS failed, Hessian reset
> [1]PETSC ERROR: #1 main() line 12 in src/cmdstan/main.cpp
> [1]PETSC ERROR: PETSc Option Table entries:
> [1]PETSC ERROR: -debugger_nodes 3
> [1]PETSC ERROR: -start_in_debugger noxterm
> [1]PETSC ERROR: ----------------End of Error Message -------send entire
> error message to petsc-maint at mcs.anl.gov?????
>
> And then it does not go further. With the -debugger_ranks suggested, the
> output is the same. What do you think, please?
> I am using a cluster (one node, dual-socket system with twelve-core-CPUs),
> but when I do the ssh I do not use the -X flag, if that's what you mean.
>
> Thank you,
> Francesco
>
>
> Il giorno 23 feb 2021, alle ore 21:59, Matthew Knepley <knepley at gmail.com>
> ha scritto:
>
> On Tue, Feb 23, 2021 at 3:55 PM Francesco Brarda <
> brardafrancesco at gmail.com> wrote:
> Thank you for the quick response.
> Sorry, you are right. Here is the complete output:
>
> fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2
> examples/rosenbrock/rosenbrock optimize -start_in_debugger
> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 on
> display :0.0 on machine srvulx13
> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 on
> display :0.0 on machine srvulx13
> xterm: Xt error: Can't open display: :0.0
> xterm: DISPLAY is not set
> xterm: Xt error: Can't open display: :0.0
> xterm: DISPLAY is not set
>
> Do you have an Xserver running? If not, you can use
>
>   -start_in_debugger noxterm -debugger_nodes 3
>
> and try to get a stack trace from one node.
>
>   Thanks,
>
>     Matt
>
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)
>         tol_rel_grad = 10000000 (Default)
>         tol_param = 1e-08 (Default)
>         history_size = 5 (Default)
>         tol_rel_grad = 10000000 (Default)
>         tol_param = 1e-08 (Default)
>         history_size = 5 (Default)
>     iter = 2000 (Default)
>     iter = 2000 (Default)
>     save_iterations = 0 (Default)
> id = 0 (Default)
> data    save_iterations = 0 (Default)
> id = 0 (Default)
> data
>   file =  (Default)
>
>   file =  (Default)
> init = 2 (Default)
> random
>   seed = 3585768430 (Default)
> init = 2 (Default)
> random
>   seed = 3585768430 (Default)
> output
>   file = output.csv (Default)
> output
>   file = output.csv (Default)
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
>
>
> Initial log joint probability = -731.444
>     Iter      log prob        ||dx||      ||grad||       alpha      alpha0
>  # evals  Notes
> [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in
> src/cmdstan/main.cpp
>   To prevent termination, change the error handler using
> PetscPushErrorHandler()
>
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 47804 RUNNING AT srvulx13
> =   EXIT CODE: 134
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
>
>
>
>
>
> The code inside main.cpp is the following:
>
> #include <cmdstan/command.hpp>
> #include <stan/services/error_codes.hpp>
>
> #include <petsc.h>
>
> int main(int argc, char* argv[]) {
>
>   PetscErrorCode ierr;
>   ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr);
>
>   try {
>     ierr = cmdstan::command(argc, argv);CHKERRQ(ierr);
>   } catch (const std::exception& e) {
>     std::cout << e.what() << std::endl;
>     ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr);
>   }
>
>   ierr = PetscFinalize();CHKERRQ(ierr);
>   return ierr;
> }
>
> I highlighted the line 12. Although I read the page where the command
> PetscPushErrorHandler is explained and the example
> provided (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should
> effectively use the command.
> Should I change the entire try/catch with
> PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); ?
>
> Best,
> Francesco
>
>
> Il giorno 23 feb 2021, alle ore 11:54, Matthew Knepley <knepley at gmail.com>
> ha scritto:
>
> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda <
> brardafrancesco at gmail.com> wrote:
> Hi!
>
> I am very new to the PETSc world. I am working with a GitHub repo that
> uses PETSc together with Stan (a statistics open source software), here you
> can find the discussion.
> It has been defined a functor to convert EigenVector to PetscVec and
> viceversa, both sequentially and in parallel.
> The file using these functions does the conversions with the sequential
> setting. I changed to those using MPI, that is
> from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because
> I want to evaluate the scaling.
> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock
> optimize in the debug mode I get the error Caught signal number 11 SEGV. I
> therefore used the option -start_in_debugger and I get the following:
>
> For some reason, the -start_in_debuggger option is not being seen. Are you
> showing all the output? Once the debugger is attached,
> you run the program (conr) and then when you hit the SEGV you get a stack
> trace (where).
>
>   THanks,
>
>     Matt
>
> [2]PETSC ERROR:
> ------------------------------------------------------------------------
> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [2]PETSC ERROR: or see
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
> X to find memory corruption errors
> [2]PETSC ERROR: likely location of problem given in stack below
> [2]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [2]PETSC ERROR:       INSTEAD the line number of the start of the function
> [2]PETSC ERROR:       is given.
> [3]PETSC ERROR:
> ------------------------------------------------------------------------
> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [3]PETSC ERROR: or see
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
> X to find memory corruption errors
> [3]PETSC ERROR: likely location of problem given in stack below
> [3]PETSC ERROR: ---------------------  Stack Frames
> ------------------------------------
> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [3]PETSC ERROR:       INSTEAD the line number of the start of the function
> [3]PETSC ERROR:       is given.
> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in
>  unknown file (null)
>   To prevent termination, change the error handler using
> PetscPushErrorHandler()
> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in
>  unknown file (null)
>   To prevent termination, change the error handler using
> PetscPushErrorHandler()
>
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 22939 RUNNING AT srvulx13
> =   EXIT CODE: 134
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
>
> I read the documentation regarding the PetscAbortErrorHandler, but I do
> not know where should I use it. How can I solve the problem?
> I hope I have been clear enough.
> Attached you can find also my configure.log and make.log files.
>
> Best,
> Francesco
>
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
>
>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210224/1855cc10/attachment-0001.html>

From jed at jedbrown.org  Wed Feb 24 12:31:33 2021
From: jed at jedbrown.org (Jed Brown)
Date: Wed, 24 Feb 2021 11:31:33 -0700
Subject: [petsc-users] Preconditioner for LSC
In-Reply-To: <CA+-JvF50gyuieTN862GgkUwQ3kSvRZ1KkvD_ri46C+kSQPW=KQ@mail.gmail.com>
References: <CA+-JvF53qcMb1CTZGCVtZcZ0HL8GJ0p5O00=2+_WUhh4ccUpGw@mail.gmail.com>
	<87sg5m9hvp.fsf@jedbrown.org>
	<CA+-JvF50gyuieTN862GgkUwQ3kSvRZ1KkvD_ri46C+kSQPW=KQ@mail.gmail.com>
Message-ID: <87k0qx9v22.fsf@jedbrown.org>

We should add a PCLSCGetKSP() interface to pull out that solver. It'll be somewhat hard to use because the PC needs to be set up before that part would exist. (This is a recurring interface challenge for nested solvers and we don't have a good solution. Options are convenient.)

Elena Travaglia <elena.travaglia at edu.unito.it> writes:

> Thank you for the replay.
>
> Now I have set the precondition on the command line with
> "-fieldsplit_1_lsc_pc_type" , but is it also possible to set it from within
> the code instead?
> What is the equivalent code to obtain the effect of
> "-fieldsplit_1_lsc_pc_type mat" ?
>
> Elena
>
> Il giorno mer 24 feb 2021 alle ore 06:04 Jed Brown <jed at jedbrown.org> ha
> scritto:
>
>> If you've already attached a MatShell, you could presumably use
>> -fieldsplit_1_lsc_pc_type mat to just call its MatMult.
>>
>> The sense I've gotten when I wrote PCLSC and was experimenting with these
>> methods is that the main selling point of LSC (for most discretizations) is
>> that it's more algebraic than the cheaper PCD methods.
>>
>> Elena Travaglia <elena.travaglia at edu.unito.it> writes:
>>
>> > Dear PETSc users,
>> >
>> > we would like to compare our preconditioner for the Schur complement
>> > of a Stokes system, with the LSC preconditioner already implemented in
>> > PETSc. Following the example in the PETSc manual, we've tried
>> >    -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type ml
>> > but this is not working (properly) on our problem.
>> >
>> > On the other hand we think we have a good preconditioner for A10*A01,
>> > so  we'd like to try
>> >    -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type shell
>> > but we cannot figure out how to attach our apply() routine to
>> > the pc object of fieldsplit_1_lsc.
>> >
>> > Can this be done in the current interface?
>> > Or perhaps, should we call KSPGetOperators on the fieldsplit_1 solver
>> > and attach to its Sp operator a "LSC_Lp" of type MATSHELL with our
>> routine
>> > attached to the MATOP_SOLVE of the shell matrix?
>> >
>> > Thanks in advance,
>> >
>> > Elena and Matteo
>> >
>> > --
>> > ------------------------
>> >
>> >
>> >
>> > Indirizzo istituzionale di posta elettronica
>> > degli studenti e dei laureati dell'Universit? degli Studi di
>> TorinoOfficial
>> > University of Turin email address for students and graduates
>>
>
> -- 
> ------------------------
>
>
>
> Indirizzo istituzionale di posta elettronica 
> degli studenti e dei laureati dell'Universit? degli Studi di TorinoOfficial?
> University of Turin?email address?for students and graduates?

From smithc11 at rpi.edu  Thu Feb 25 15:57:47 2021
From: smithc11 at rpi.edu (Cameron Smith)
Date: Thu, 25 Feb 2021 16:57:47 -0500
Subject: [petsc-users] creation of parallel dmplex from a partitioned
 mesh
In-Reply-To: <d73f6020-2cd2-4bcd-875e-0ff07e5657e8@rpi.edu>
References: <1953567c-6c7f-30fb-13e6-ad7017263a92@rpi.edu>
	<CAMYG4GmzdqU1dDj4KHhMq2oftEwND5KMVWx39tbdNktp8EAhUw@mail.gmail.com>
	<62654977-bdbc-9cd7-5a70-e9fb4951310a@rpi.edu>
	<3fcf90b7-3abd-1345-bd90-d7d7272816d9@rpi.edu>
	<87mu2jg57a.fsf@jedbrown.org>
	<CAMYG4GnZQerLO62MiJs=9Yyu3pyZP14S1XRvouvmQ25_NCAQvA@mail.gmail.com>
	<d73f6020-2cd2-4bcd-875e-0ff07e5657e8@rpi.edu>
Message-ID: <5e245665-61c6-3a48-9b3e-97b38f69829e@rpi.edu>

Hello,

Bringing this thread back from the dead...

We made progress with creation of a distributed dmplex that matches our 
source mesh partition and are in need of help writing values into a 
vector created from the dmplex object.

As discussed previously, we have created a DMPlex instance using:

DMPlexCreateFromCellListPetsc(...)
DMGetPointSF(...)
PetscSFSetGraph(...)

which gives us a distribution of mesh vertices and elements in the DM 
object that matches the element-based partition of our unstructured mesh.

We then mark mesh vertices on the geometric model boundary using 
DMSetLabelValue(...) and a map from our mesh vertices to dmplex points 
(created during dmplex definition of vtx coordinates).

Following this, we create a section for vertices:

>     DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd);                                                                                                                                                                                                                                 
>     PetscSectionCreate(PetscObjectComm((PetscObject) dm), &s);
>     PetscSectionSetNumFields(s, 1);
>     PetscSectionSetFieldComponents(s, 0, 1);
>     PetscSectionSetChart(s, vStart, vEnd);
>     for(PetscInt v = vStart; v < vEnd; ++v) {
>         PetscSectionSetDof(s, v, 1);
>         PetscSectionSetFieldDof(s, v, 0, 1);
>     }
>     PetscSectionSetUp(s);
>     DMSetLocalSection(dm, s);
>     PetscSectionDestroy(&s);
>     DMGetGlobalSection(dm,&s); //update the global section

We then try to write values into a local Vec for the on-process vertices 
(roots and leaves in sf terms) and hit an ordering problem. 
Specifically, we make the following sequence of calls:

DMGetLocalVector(dm,&bloc);
VecGetArrayWrite(bloc, &bwrite);
//for loop to write values to bwrite
VecRestoreArrayWrite(bloc, &bwrite);
DMLocalToGlobal(dm,bloc,INSERT_VALUES,b);
DMRestoreLocalVector(dm,&bloc);

Visualizing Vec 'b' in paraview, and the
original mesh, tells us that the dmplex topology and geometry (the 
vertex coordinates) are correct but that the order we write values is 
wrong (not total garbage... but clearly shifted).

Is there anything obviously wrong in our described approach?  I suspect 
the section creation is wrong and/or we don't understand the order of 
entries in the array returned by VecGetArrayWrite.

Please let us know if other info is needed.  We are happy to share the 
relevant source code.

Thank-you,
Cameron


On 8/25/20 8:34 AM, Cameron Smith wrote:
> On 8/24/20 4:57 PM, Matthew Knepley wrote:
>> On Mon, Aug 24, 2020 at 4:27 PM Jed Brown <jed at jedbrown.org 
>> <mailto:jed at jedbrown.org>> wrote:
>>
>> ??? Cameron Smith <smithc11 at rpi.edu <mailto:smithc11 at rpi.edu>> writes:
>>
>> ???? > We made some progress with star forest creation but still have
>> ??? work to do.
>> ???? >
>> ???? > We revisited DMPlexCreateFromCellListParallelPetsc(...) and got it
>> ???? > working by sequentially partitioning the vertex coordinates across
>> ???? > processes to satisfy the 'vertexCoords' argument. Specifically,
>> ??? rank 0
>> ???? > has the coordinates for vertices with global id 0:N/P-1, rank 1 
>> has
>> ???? > N/P:2*(N/P)-1, and so on (N is the total number of global
>> ??? vertices and P
>> ???? > is the number of processes).
>> ???? >
>> ???? > The consequences of the sequential partition of vertex
>> ??? coordinates in
>> ???? > subsequent solver operations is not clear.? Does it make process i
>> ???? > responsible for computations and communications associated with
>> ??? global
>> ???? > vertices i*(N/P):(i+1)*(N/P)-1 ?? We assumed it does and wanted
>> ??? to confirm.
>>
>> ??? Yeah, in the sense that the corners would be owned by the rank you
>> ??? place them on.
>>
>> ??? But many methods, especially high-order, perform assembly via
>> ??? non-overlapping partition of elements, in which case the
>> ??? "computations" happen where the elements are (with any required
>> ??? vertex data for the closure of those elements being sent to the rank
>> ??? handling the element).
>>
>> ??? Note that a typical pattern would be to create a parallel DMPlex
>> ??? with a naive distribution, then repartition/distribute it.
>>
>>
>> As Jed says, CreateParallel() just makes the most naive partition of 
>> vertices because we have no other information. Once
>> the mesh is made, you call DMPlexDistribute() again to reduce the edge 
>> cut.
>>
>> ?? Thanks,
>>
>> ?? ? ?Matt
>>
> 
> 
> Thank you.
> 
> This is being used for PIC code with low order 2d elements whose mesh is 
> partitioned to minimize communications during particle operations.? This 
> partition will not be ideal for the field solve using petsc so we're 
> exploring alternatives that will require minimal data movement between 
> the two partitions.? Towards that, we'll keep pursuing the SF creation.
> 
> -Cameron
> 

From lisandro.verga.bega at gmail.com  Fri Feb 26 02:56:55 2021
From: lisandro.verga.bega at gmail.com (Lisandro Verga)
Date: Fri, 26 Feb 2021 00:56:55 -0800
Subject: [petsc-users] re Example finite volume silver in PETSc
In-Reply-To: <CAOHboJ_g61LTEWPGybfk+Ayw8T5wU53NgvE+Km_mwwLS-HHDTw@mail.gmail.com>
References: <CAOHboJ_g61LTEWPGybfk+Ayw8T5wU53NgvE+Km_mwwLS-HHDTw@mail.gmail.com>
Message-ID: <CAB-=f8yL5nVSnj5PCycLQzcGyjZ9d-Z05g5bmJAURVGiWRYEWw@mail.gmail.com>

Dear All,

Thank you.

Best Regards,

On Monday, February 22, 2021, Ed Bueler <elbueler at alaska.edu> wrote:

> A very basic 2D FV example, a scalar advection solver, using PETSc DMDA,
> is at
>    https://github.com/bueler/p4pdes/blob/master/c/ch11/advect.c
> and documented in Chapter 11 of my book (https://my.siam.org/Store/
> Product/viewproduct/?ProductId=32850137).  This example might be most
> useful to you if you are interested in implementing flux limiters.
>
> Ed
>
> > Dear PETSc Team,
> >
> > I would like to ask you if there a finite volume solver build using the
> > PETSc data structure. I have found several manuscripts or presentations
> > that mention that but I cannot retrieve an example it.
> >
> > Thank you.
> >
> > Regards,
>
> --
> Ed Bueler
> Dept of Mathematics and Statistics
> University of Alaska Fairbanks
> Fairbanks, AK 99775-6660
> 306C Chapman
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210226/0409e661/attachment.html>

From knepley at gmail.com  Fri Feb 26 07:32:37 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 26 Feb 2021 08:32:37 -0500
Subject: [petsc-users] creation of parallel dmplex from a partitioned
 mesh
In-Reply-To: <5e245665-61c6-3a48-9b3e-97b38f69829e@rpi.edu>
References: <1953567c-6c7f-30fb-13e6-ad7017263a92@rpi.edu>
	<CAMYG4GmzdqU1dDj4KHhMq2oftEwND5KMVWx39tbdNktp8EAhUw@mail.gmail.com>
	<62654977-bdbc-9cd7-5a70-e9fb4951310a@rpi.edu>
	<3fcf90b7-3abd-1345-bd90-d7d7272816d9@rpi.edu>
	<87mu2jg57a.fsf@jedbrown.org>
	<CAMYG4GnZQerLO62MiJs=9Yyu3pyZP14S1XRvouvmQ25_NCAQvA@mail.gmail.com>
	<d73f6020-2cd2-4bcd-875e-0ff07e5657e8@rpi.edu>
	<5e245665-61c6-3a48-9b3e-97b38f69829e@rpi.edu>
Message-ID: <CAMYG4GmtqZOtUySvL_av2Qw34X5oOve3oj3S6qcv0CcAy7WWTg@mail.gmail.com>

On Thu, Feb 25, 2021 at 4:57 PM Cameron Smith <smithc11 at rpi.edu> wrote:

> Hello,
>
> Bringing this thread back from the dead...
>
> We made progress with creation of a distributed dmplex that matches our
> source mesh partition and are in need of help writing values into a
> vector created from the dmplex object.
>
> As discussed previously, we have created a DMPlex instance using:
>
> DMPlexCreateFromCellListPetsc(...)
> DMGetPointSF(...)
> PetscSFSetGraph(...)
>
> which gives us a distribution of mesh vertices and elements in the DM
> object that matches the element-based partition of our unstructured mesh.
>
> We then mark mesh vertices on the geometric model boundary using
> DMSetLabelValue(...) and a map from our mesh vertices to dmplex points
> (created during dmplex definition of vtx coordinates).
>
> Following this, we create a section for vertices:
>
> >     DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd);
>
>
>
> >     PetscSectionCreate(PetscObjectComm((PetscObject) dm), &s);
> >     PetscSectionSetNumFields(s, 1);
> >     PetscSectionSetFieldComponents(s, 0, 1);
> >     PetscSectionSetChart(s, vStart, vEnd);
> >     for(PetscInt v = vStart; v < vEnd; ++v) {
> >         PetscSectionSetDof(s, v, 1);
> >         PetscSectionSetFieldDof(s, v, 0, 1);
> >     }
> >     PetscSectionSetUp(s);
> >     DMSetLocalSection(dm, s);
> >     PetscSectionDestroy(&s);
> >     DMGetGlobalSection(dm,&s); //update the global section
>
> We then try to write values into a local Vec for the on-process vertices
> (roots and leaves in sf terms) and hit an ordering problem.
> Specifically, we make the following sequence of calls:
>
> DMGetLocalVector(dm,&bloc);
> VecGetArrayWrite(bloc, &bwrite);
> //for loop to write values to bwrite
> VecRestoreArrayWrite(bloc, &bwrite);
> DMLocalToGlobal(dm,bloc,INSERT_VALUES,b);
> DMRestoreLocalVector(dm,&bloc);
>

There is an easy way to get diagnostics here. For the local vector

  DMGetLocalSection(dm, &s);
  PetscSectionGetOffset(s, v, &off);

will give you the offset into the array you got from VecGetArrayWrite()
for that vertex. You can get this wrapped up using DMPlexPointLocalWrite()
which is what I tend to use for this type of stuff.

For the global vector

  DMGetGlobalSection(dm, &gs);
  PetscSectionGetOffset(gs, v, &off);

will give you the offset into the portion of the global array that is
stored in this process.
If you do not own the values for this vertex, the number is negative, and
it is actually
-(i+1) if the index i is the valid one on the owning process.


> Visualizing Vec 'b' in paraview, and the
> original mesh, tells us that the dmplex topology and geometry (the
> vertex coordinates) are correct but that the order we write values is
> wrong (not total garbage... but clearly shifted).
>

We do not make any guarantees that global orders match local orders.
However, by default
we number up global unknowns in rank order, leaving out the dofs that we
not owned.

Does this make sense?

  Thanks,

     Matt


> Is there anything obviously wrong in our described approach?  I suspect
> the section creation is wrong and/or we don't understand the order of
> entries in the array returned by VecGetArrayWrite.
>
> Please let us know if other info is needed.  We are happy to share the
> relevant source code.
>
> Thank-you,
> Cameron
>
>
> On 8/25/20 8:34 AM, Cameron Smith wrote:
> > On 8/24/20 4:57 PM, Matthew Knepley wrote:
> >> On Mon, Aug 24, 2020 at 4:27 PM Jed Brown <jed at jedbrown.org
> >> <mailto:jed at jedbrown.org>> wrote:
> >>
> >>     Cameron Smith <smithc11 at rpi.edu <mailto:smithc11 at rpi.edu>> writes:
> >>
> >>      > We made some progress with star forest creation but still have
> >>     work to do.
> >>      >
> >>      > We revisited DMPlexCreateFromCellListParallelPetsc(...) and got
> it
> >>      > working by sequentially partitioning the vertex coordinates
> across
> >>      > processes to satisfy the 'vertexCoords' argument. Specifically,
> >>     rank 0
> >>      > has the coordinates for vertices with global id 0:N/P-1, rank 1
> >> has
> >>      > N/P:2*(N/P)-1, and so on (N is the total number of global
> >>     vertices and P
> >>      > is the number of processes).
> >>      >
> >>      > The consequences of the sequential partition of vertex
> >>     coordinates in
> >>      > subsequent solver operations is not clear.  Does it make process
> i
> >>      > responsible for computations and communications associated with
> >>     global
> >>      > vertices i*(N/P):(i+1)*(N/P)-1 ?  We assumed it does and wanted
> >>     to confirm.
> >>
> >>     Yeah, in the sense that the corners would be owned by the rank you
> >>     place them on.
> >>
> >>     But many methods, especially high-order, perform assembly via
> >>     non-overlapping partition of elements, in which case the
> >>     "computations" happen where the elements are (with any required
> >>     vertex data for the closure of those elements being sent to the rank
> >>     handling the element).
> >>
> >>     Note that a typical pattern would be to create a parallel DMPlex
> >>     with a naive distribution, then repartition/distribute it.
> >>
> >>
> >> As Jed says, CreateParallel() just makes the most naive partition of
> >> vertices because we have no other information. Once
> >> the mesh is made, you call DMPlexDistribute() again to reduce the edge
> >> cut.
> >>
> >>    Thanks,
> >>
> >>       Matt
> >>
> >
> >
> > Thank you.
> >
> > This is being used for PIC code with low order 2d elements whose mesh is
> > partitioned to minimize communications during particle operations.  This
> > partition will not be ideal for the field solve using petsc so we're
> > exploring alternatives that will require minimal data movement between
> > the two partitions.  Towards that, we'll keep pursuing the SF creation.
> >
> > -Cameron
> >
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20210226/614ebab6/attachment-0001.html>

From patrick.sanan at gmail.com  Fri Feb 26 11:01:56 2021
From: patrick.sanan at gmail.com (Patrick Sanan)
Date: Fri, 26 Feb 2021 18:01:56 +0100
Subject: [petsc-users] [petsc-dev] headsup: switch git default branch
 from 'master' to 'main'
In-Reply-To: <55996c7c-a274-5ebb-bba7-e06ac4c3b83a@mcs.anl.gov>
References: <55996c7c-a274-5ebb-bba7-e06ac4c3b83a@mcs.anl.gov>
Message-ID: <E263BDC1-B8B0-48AC-B1EC-24DED5C7597E@gmail.com>

The answers to these were probably stated already, but the reminder might be useful to others, as well.

What will happen to "master" after today? Will it be deleted immediately or at some planned time? If not immediately deleted, will it be updated to match main?

> Am 23.02.2021 um 18:19 schrieb Satish Balay via petsc-dev <petsc-dev at mcs.anl.gov>:
> 
> All,
> 
> This is a heads-up, we are to switch the default branch in petsc git
> repo from 'master' to 'main'
> 
> [Will plan to do the switch on friday the 26th]
> 
> We've previously switched 'maint' branch to 'release' before 3.14
> release - and this change (to 'main') is the next step in this direction.
> 
> Satish
> 


From balay at mcs.anl.gov  Fri Feb 26 11:06:39 2021
From: balay at mcs.anl.gov (Satish Balay)
Date: Fri, 26 Feb 2021 11:06:39 -0600
Subject: [petsc-users] [petsc-dev] headsup: switch git default branch
 from 'master' to 'main'
In-Reply-To: <E263BDC1-B8B0-48AC-B1EC-24DED5C7597E@gmail.com>
References: <55996c7c-a274-5ebb-bba7-e06ac4c3b83a@mcs.anl.gov>
	<E263BDC1-B8B0-48AC-B1EC-24DED5C7597E@gmail.com>
Message-ID: <5e90c038-76e8-bd34-ad93-1a2bf8b2a4a@mcs.anl.gov>

I plan to delete 'master' immediately - so that folk don't assume it still exits and work with it [assuming its the latest, creating MRs against it etc..]

Satish

On Fri, 26 Feb 2021, Patrick Sanan wrote:

> The answers to these were probably stated already, but the reminder might be useful to others, as well.
> 
> What will happen to "master" after today? Will it be deleted immediately or at some planned time? If not immediately deleted, will it be updated to match main?
> 
> > Am 23.02.2021 um 18:19 schrieb Satish Balay via petsc-dev <petsc-dev at mcs.anl.gov>:
> > 
> > All,
> > 
> > This is a heads-up, we are to switch the default branch in petsc git
> > repo from 'master' to 'main'
> > 
> > [Will plan to do the switch on friday the 26th]
> > 
> > We've previously switched 'maint' branch to 'release' before 3.14
> > release - and this change (to 'main') is the next step in this direction.
> > 
> > Satish
> > 
> 


From balay at mcs.anl.gov  Fri Feb 26 15:50:40 2021
From: balay at mcs.anl.gov (Satish Balay)
Date: Fri, 26 Feb 2021 15:50:40 -0600
Subject: [petsc-users] [petsc-dev] headsup: switch git default branch
 from 'master' to 'main'
In-Reply-To: <55996c7c-a274-5ebb-bba7-e06ac4c3b83a@mcs.anl.gov>
References: <55996c7c-a274-5ebb-bba7-e06ac4c3b83a@mcs.anl.gov>
Message-ID: <4834e30-b876-d6b5-9b8a-1a2396efef7@mcs.anl.gov>

Update:

the switch (at gitlab.com/petsc/petsc) is done.

Please delete your local copy of 'master' branch and start using 'main' branch.

Satish

On Tue, 23 Feb 2021, Satish Balay via petsc-dev wrote:

> All,
> 
> This is a heads-up, we are to switch the default branch in petsc git
> repo from 'master' to 'main'
> 
> [Will plan to do the switch on friday the 26th]
> 
> We've previously switched 'maint' branch to 'release' before 3.14
> release - and this change (to 'main') is the next step in this direction.
> 
> Satish
>