From wence at gmx.li Mon Feb 1 08:38:55 2021 From: wence at gmx.li (Lawrence Mitchell) Date: Mon, 1 Feb 2021 14:38:55 +0000 Subject: [petsc-users] Faculty positions at Durham Message-ID: Dear PETSc-ites, The Durham CS department is presently hiring. We have open positions at all levels (Assistant, Associate, and Full Professor) across a broad range of applied computer science, and we'd like to make at least one hire in Scientific Computing (the group currently has interests in geometric machine learning, finite element methods, preconditioning, and high performance algorithms: see https://duscicomp.github.io for more details). Assistant Prof: https://www.jobs.ac.uk/job/CDN843/assistant-professor-in-computer-science-comp21-51 Closing date 22nd February Associate Prof: https://www.jobs.ac.uk/job/CDN924/associate-professor-in-computer-science-comp21-57 Closing date 8th March Prof: https://www.jobs.ac.uk/job/CDN929/professor-in-computer-science-comp21-60 Closing date 22nd March If you have any queries or would like to know more, please get in touch with me. Thanks, Lawrence From uphoff at geophysik.uni-muenchen.de Tue Feb 2 05:06:16 2021 From: uphoff at geophysik.uni-muenchen.de (Carsten Uphoff) Date: Tue, 2 Feb 2021 12:06:16 +0100 Subject: [petsc-users] Discontinuous Galerkin and BDDC Message-ID: Hi everyone, I'm interested in testing a BDDC preconditioner for Poisson and Elasticity equations using the symmetric interior penalty Galerkin method. However, I wonder how one would apply PCBDDC for discontinuous Galerkin. The major problem is that for DG you cannot write the bilinear form as a sum of local bilinear forms which only involve degrees of freedom of the respective local subdomain. In particular, coupling terms at the interface of two subdomains, such as [[u]], require DOFs from two subdomains. Therefore, it is not straightforward to write the operator A as sum of local operators in the form A = sum_{i=1}^N R_i A_i R_i^T where A_i are local operators and R_i is the local-to-global map. In the literature, I found two possible solutions: - Double degrees of freedom at the subdomain interface [1] - Split the bilinear form a_h in the two parts a_{h,D} and a_{h,C}, where the first leads to an easy-to-invert operator that is discontinuous across the subdomain interface, and the second is continuous across the subdomain interface. As a_{h,C} is continuous, one may write the bilinear form as sum of local bilinear forms only involving the local degrees of freedom [2] The first approach [1] seems unattractive as you double the DOFs in the Schur complement. For [2] I think one might be able to apply PCBDDC on A_{h,C} and apply A_{h,D}^{-1} as an additive correction, cf. (2.23) in [2]. Questions: - Is there any straightforward way to apply PCBDDC for DG which I am missing? - Does it make sense to apply PCBDDC on A_{h,C}? Could I combine an additive correction with PCBDDC using PCCOMPOSITE, e.g.? - Does anyone already test PCBDDC for DG? I appreciate your help and I'm looking forward for your comments! Best regards, Carsten [1] Dryja and Galvis and Sarkis, Numer. Math. 
131:737-770, 2015, doi:10.1007/s00211-015-0705-x [2] Brenner and Park and Sung, ETNA 46:190-214, 2017, http://etna.mcs.kent.edu/vol.46.2017/pp190-214.dir/pp190-214.pdf From mfadams at lbl.gov Tue Feb 2 14:17:52 2021 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 2 Feb 2021 15:17:52 -0500 Subject: [petsc-users] Fortran initialization and XXXDestroy Message-ID: Satish, a few years ago you helped us transition the XGC Fortran code from v3.7.7 and we seemed to have regressed. As I recall we removed the initialization of Mats (for example) in XGC. PETSc seems to initialize them with -2 in Fortran (Albert, cc'ed, verified this today) and I recall that from our previous conversation. As I look at the code now Fortran MatDestroy just goes straight to C, which would explain our crashes when we MatDestroy an uninitialized (-2) Mat. What is the correct way to delete with initializing Fortran objects? Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From rlmackie862 at gmail.com Tue Feb 2 14:27:51 2021 From: rlmackie862 at gmail.com (Randall Mackie) Date: Tue, 2 Feb 2021 12:27:51 -0800 Subject: [petsc-users] Fortran initialization and XXXDestroy In-Reply-To: References: Message-ID: <61624598-BC78-4FEF-A423-88D672CE7788@gmail.com> Hi Mark, I don?t know what the XGC code is, but the way I do this in my Fortran code is that I initialize all objects I later want to destroy, for example: mat11=PETSC_NULL_MAT vec1=PETSC_NULL_VEC etc Then I check and destroy like: if (mat11 /= PETSC_NULL_MAT) call MatDestroy(mat11, ierr) etc. Hope this helps, Randy > On Feb 2, 2021, at 12:17 PM, Mark Adams wrote: > > Satish, a few years ago you helped us transition the XGC Fortran code from v3.7.7 and we seemed to have regressed. > > As I recall we removed the initialization of Mats (for example) in XGC. PETSc seems to initialize them with -2 in Fortran (Albert, cc'ed, verified this today) and I recall that from our previous conversation. As I look at the code now Fortran MatDestroy just goes straight to C, which would explain our crashes when we MatDestroy an uninitialized (-2) Mat. > > What is the correct way to delete with initializing Fortran objects? > > Thanks, > Mark > > From mfadams at lbl.gov Tue Feb 2 14:34:44 2021 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 2 Feb 2021 15:34:44 -0500 Subject: [petsc-users] Fortran initialization and XXXDestroy In-Reply-To: <61624598-BC78-4FEF-A423-88D672CE7788@gmail.com> References: <61624598-BC78-4FEF-A423-88D672CE7788@gmail.com> Message-ID: Thanks Randy, that makes sense. Mark On Tue, Feb 2, 2021 at 3:27 PM Randall Mackie wrote: > Hi Mark, > > I don?t know what the XGC code is, but the way I do this in my Fortran > code is that I initialize all objects I later want to destroy, for example: > > mat11=PETSC_NULL_MAT > vec1=PETSC_NULL_VEC > > etc > > Then I check and destroy like: > > if (mat11 /= PETSC_NULL_MAT) call MatDestroy(mat11, ierr) > > etc. > > Hope this helps, > > Randy > > > > On Feb 2, 2021, at 12:17 PM, Mark Adams wrote: > > > > Satish, a few years ago you helped us transition the XGC Fortran code > from v3.7.7 and we seemed to have regressed. > > > > As I recall we removed the initialization of Mats (for example) in XGC. > PETSc seems to initialize them with -2 in Fortran (Albert, cc'ed, verified > this today) and I recall that from our previous conversation. As I look at > the code now Fortran MatDestroy just goes straight to C, which would > explain our crashes when we MatDestroy an uninitialized (-2) Mat. 
> > > > What is the correct way to delete with initializing Fortran objects? > > > > Thanks, > > Mark > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Feb 2 20:37:00 2021 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 2 Feb 2021 20:37:00 -0600 Subject: [petsc-users] Fortran initialization and XXXDestroy In-Reply-To: References: <61624598-BC78-4FEF-A423-88D672CE7788@gmail.com> Message-ID: <3D664B94-B4B5-4824-A65A-61DA0126A570@petsc.dev> I cannot remember why I selected -2 as the initial value for PETSc objects in Fortran. Probably because it would ensure a dramatic crash if you used an object without initializing it from Fortran. It could be changing config/BuildSystem/config/compilersFortran.py: self.addDefine('FORTRAN_TYPE_INITIALIZE', ' = -2') to 0 would mean that if you called destroy on the object and never created it everything would be fine; so you would not need to use any special code to check. Would that be a better model? Barry > On Feb 2, 2021, at 2:34 PM, Mark Adams wrote: > > Thanks Randy, that makes sense. > Mark > > On Tue, Feb 2, 2021 at 3:27 PM Randall Mackie > wrote: > Hi Mark, > > I don?t know what the XGC code is, but the way I do this in my Fortran code is that I initialize all objects I later want to destroy, for example: > > mat11=PETSC_NULL_MAT > vec1=PETSC_NULL_VEC > > etc > > Then I check and destroy like: > > if (mat11 /= PETSC_NULL_MAT) call MatDestroy(mat11, ierr) > > etc. > > Hope this helps, > > Randy > > > > On Feb 2, 2021, at 12:17 PM, Mark Adams > wrote: > > > > Satish, a few years ago you helped us transition the XGC Fortran code from v3.7.7 and we seemed to have regressed. > > > > As I recall we removed the initialization of Mats (for example) in XGC. PETSc seems to initialize them with -2 in Fortran (Albert, cc'ed, verified this today) and I recall that from our previous conversation. As I look at the code now Fortran MatDestroy just goes straight to C, which would explain our crashes when we MatDestroy an uninitialized (-2) Mat. > > > > What is the correct way to delete with initializing Fortran objects? > > > > Thanks, > > Mark > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Feb 2 22:31:16 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 2 Feb 2021 22:31:16 -0600 Subject: [petsc-users] Fortran initialization and XXXDestroy In-Reply-To: <3D664B94-B4B5-4824-A65A-61DA0126A570@petsc.dev> References: <61624598-BC78-4FEF-A423-88D672CE7788@gmail.com> <3D664B94-B4B5-4824-A65A-61DA0126A570@petsc.dev> Message-ID: I think the current code is to support: if (var not initialized): crate(var). And there might be some mapping of 0 to NULL or some other state (hence the choice of -2) Satish On Tue, 2 Feb 2021, Barry Smith wrote: > > I cannot remember why I selected -2 as the initial value for PETSc objects in Fortran. Probably because it would ensure a dramatic crash if you used > an object without initializing it from Fortran. > > It could be changing > > config/BuildSystem/config/compilersFortran.py: self.addDefine('FORTRAN_TYPE_INITIALIZE', ' = -2') > > to 0 would mean that if you called destroy on the object and never created it everything would be fine; so you would not need to use any special code to check. > > Would that be a better model? > > Barry > > > > > On Feb 2, 2021, at 2:34 PM, Mark Adams wrote: > > > > Thanks Randy, that makes sense. 
> > Mark > > > > On Tue, Feb 2, 2021 at 3:27 PM Randall Mackie > wrote: > > Hi Mark, > > > > I don?t know what the XGC code is, but the way I do this in my Fortran code is that I initialize all objects I later want to destroy, for example: > > > > mat11=PETSC_NULL_MAT > > vec1=PETSC_NULL_VEC > > > > etc > > > > Then I check and destroy like: > > > > if (mat11 /= PETSC_NULL_MAT) call MatDestroy(mat11, ierr) > > > > etc. > > > > Hope this helps, > > > > Randy > > > > > > > On Feb 2, 2021, at 12:17 PM, Mark Adams > wrote: > > > > > > Satish, a few years ago you helped us transition the XGC Fortran code from v3.7.7 and we seemed to have regressed. > > > > > > As I recall we removed the initialization of Mats (for example) in XGC. PETSc seems to initialize them with -2 in Fortran (Albert, cc'ed, verified this today) and I recall that from our previous conversation. As I look at the code now Fortran MatDestroy just goes straight to C, which would explain our crashes when we MatDestroy an uninitialized (-2) Mat. > > > > > > What is the correct way to delete with initializing Fortran objects? > > > > > > Thanks, > > > Mark > > > > > > > > > > From stefano.zampini at gmail.com Wed Feb 3 02:48:44 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Wed, 3 Feb 2021 11:48:44 +0300 Subject: [petsc-users] Discontinuous Galerkin and BDDC In-Reply-To: References: Message-ID: Il giorno mar 2 feb 2021 alle ore 14:06 Carsten Uphoff < uphoff at geophysik.uni-muenchen.de> ha scritto: > Hi everyone, > > I'm interested in testing a BDDC preconditioner for Poisson and > Elasticity equations using the symmetric interior penalty Galerkin > method. However, I wonder how one would apply PCBDDC for discontinuous > Galerkin. > > The major problem is that for DG you cannot write the bilinear form as a > sum of local bilinear forms which only involve degrees of freedom of the > respective local subdomain. In particular, coupling terms at the > interface of two subdomains, such as [[u]], require DOFs from two > subdomains. Therefore, it is not straightforward to write the operator A > as sum of local operators in the form > A = sum_{i=1}^N R_i A_i R_i^T > where A_i are local operators and R_i is the local-to-global map. > > In the literature, I found two possible solutions: > - Double degrees of freedom at the subdomain interface [1] > - Split the bilinear form a_h in the two parts a_{h,D} and a_{h,C}, > where the first leads to an easy-to-invert operator that is > discontinuous across the subdomain interface, and the second is > continuous across the subdomain interface. As a_{h,C} is continuous, one > may write the bilinear form as sum of local bilinear forms only > involving the local degrees of freedom [2] > > The first approach [1] seems unattractive as you double the DOFs in the > Schur complement. For [2] I think one might be able to apply PCBDDC on > A_{h,C} and apply A_{h,D}^{-1} as an additive correction, cf. (2.23) in > [2]. > > Questions: > - Is there any straightforward way to apply PCBDDC for DG which I am > missing? > I don't think so. I know Lawrence gave it some thoughts but never heard about a final solution about how to represent subdomain DG matrices via a MATIS. > > - Does it make sense to apply PCBDDC on A_{h,C}? Could I combine an > additive correction with PCBDDC using PCCOMPOSITE, e.g.? > You either use PCComposite or write a small PCSHELL that implements PCApply as additive combination > - Does anyone already test PCBDDC for DG? 
> Not that I know > I appreciate your help and I'm looking forward for your comments! > > Best regards, > Carsten > > [1] Dryja and Galvis and Sarkis, Numer. Math. 131:737-770, 2015, > doi:10.1007/s00211-015-0705-x > [2] Brenner and Park and Sung, ETNA 46:190-214, 2017, > http://etna.mcs.kent.edu/vol.46.2017/pp190-214.dir/pp190-214.pdf > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Feb 3 06:07:44 2021 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 3 Feb 2021 07:07:44 -0500 Subject: [petsc-users] Fortran initialization and XXXDestroy In-Reply-To: References: <61624598-BC78-4FEF-A423-88D672CE7788@gmail.com> <3D664B94-B4B5-4824-A65A-61DA0126A570@petsc.dev> Message-ID: Perhaps a custom Fortran interface that checks for FORTRAN_TYPE_INITIALIZE. That was the first thing that I looked for. On Tue, Feb 2, 2021 at 11:31 PM Satish Balay wrote: > I think the current code is to support: > > if (var not initialized): crate(var). > > And there might be some mapping of 0 to NULL or some other state (hence > the choice of -2) > > Satish > > On Tue, 2 Feb 2021, Barry Smith wrote: > > > > > I cannot remember why I selected -2 as the initial value for PETSc > objects in Fortran. Probably because it would ensure a dramatic crash if > you used > > an object without initializing it from Fortran. > > > > It could be changing > > > > config/BuildSystem/config/compilersFortran.py: > self.addDefine('FORTRAN_TYPE_INITIALIZE', ' = -2') > > > > to 0 would mean that if you called destroy on the object and never > created it everything would be fine; so you would not need to use any > special code to check. > > > > Would that be a better model? > > > > Barry > > > > > > > > > On Feb 2, 2021, at 2:34 PM, Mark Adams wrote: > > > > > > Thanks Randy, that makes sense. > > > Mark > > > > > > On Tue, Feb 2, 2021 at 3:27 PM Randall Mackie > wrote: > > > Hi Mark, > > > > > > I don?t know what the XGC code is, but the way I do this in my Fortran > code is that I initialize all objects I later want to destroy, for example: > > > > > > mat11=PETSC_NULL_MAT > > > vec1=PETSC_NULL_VEC > > > > > > etc > > > > > > Then I check and destroy like: > > > > > > if (mat11 /= PETSC_NULL_MAT) call MatDestroy(mat11, ierr) > > > > > > etc. > > > > > > Hope this helps, > > > > > > Randy > > > > > > > > > > On Feb 2, 2021, at 12:17 PM, Mark Adams mfadams at lbl.gov>> wrote: > > > > > > > > Satish, a few years ago you helped us transition the XGC Fortran > code from v3.7.7 and we seemed to have regressed. > > > > > > > > As I recall we removed the initialization of Mats (for example) in > XGC. PETSc seems to initialize them with -2 in Fortran (Albert, cc'ed, > verified this today) and I recall that from our previous conversation. As I > look at the code now Fortran MatDestroy just goes straight to C, which > would explain our crashes when we MatDestroy an uninitialized (-2) Mat. > > > > > > > > What is the correct way to delete with initializing Fortran objects? > > > > > > > > Thanks, > > > > Mark > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Wed Feb 3 10:14:19 2021 From: jed at jedbrown.org (Jed Brown) Date: Wed, 03 Feb 2021 09:14:19 -0700 Subject: [petsc-users] Discontinuous Galerkin and BDDC In-Reply-To: References: Message-ID: <87pn1huo10.fsf@jedbrown.org> Stefano Zampini writes: > Il giorno mar 2 feb 2021 alle ore 14:06 Carsten Uphoff < > uphoff at geophysik.uni-muenchen.de> ha scritto: >> Questions: >> - Is there any straightforward way to apply PCBDDC for DG which I am >> missing? >> > > I don't think so. I know Lawrence gave it some thoughts but never heard > about a final solution about how to represent subdomain DG matrices via a > MATIS. Wouldn't hybridizing work pretty naturally? The dofs are all in a broken trace space and connect through elements with shared support. From wence at gmx.li Wed Feb 3 10:24:49 2021 From: wence at gmx.li (Lawrence Mitchell) Date: Wed, 3 Feb 2021 16:24:49 +0000 Subject: [petsc-users] Discontinuous Galerkin and BDDC In-Reply-To: References: Message-ID: > On 3 Feb 2021, at 08:48, Stefano Zampini wrote: > ... > Questions: > - Is there any straightforward way to apply PCBDDC for DG which I am > missing? > > I don't think so. I know Lawrence gave it some thoughts but never heard about a final solution about how to represent subdomain DG matrices via a MATIS. I think when we discussed this (two+ years ago?), I was just worried about what I was doing. But I never got any further along. I think Jed's suggestion sounds like it would work. Lawrence From bsmith at petsc.dev Wed Feb 3 11:01:16 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 3 Feb 2021 11:01:16 -0600 Subject: [petsc-users] Fortran initialization and XXXDestroy In-Reply-To: References: <61624598-BC78-4FEF-A423-88D672CE7788@gmail.com> <3D664B94-B4B5-4824-A65A-61DA0126A570@petsc.dev> Message-ID: <5BB4D9CF-D4A6-43D3-AC5C-FBB6F67149DC@petsc.dev> > On Feb 3, 2021, at 6:07 AM, Mark Adams wrote: > > Perhaps a custom Fortran interface that checks for FORTRAN_TYPE_INITIALIZE. > That was the first thing that I looked for. This would mean tons more custom interfaces which are a pain to write and often forgotten. I think someone should just try changing to 0. Note we can still have a check if not initialized and a check if initialized for users (not that they should use them much). Barry > > On Tue, Feb 2, 2021 at 11:31 PM Satish Balay > wrote: > I think the current code is to support: > > if (var not initialized): crate(var). > > And there might be some mapping of 0 to NULL or some other state (hence the choice of -2) > > Satish > > On Tue, 2 Feb 2021, Barry Smith wrote: > > > > > I cannot remember why I selected -2 as the initial value for PETSc objects in Fortran. Probably because it would ensure a dramatic crash if you used > > an object without initializing it from Fortran. > > > > It could be changing > > > > config/BuildSystem/config/compilersFortran.py: self.addDefine('FORTRAN_TYPE_INITIALIZE', ' = -2') > > > > to 0 would mean that if you called destroy on the object and never created it everything would be fine; so you would not need to use any special code to check. > > > > Would that be a better model? > > > > Barry > > > > > > > > > On Feb 2, 2021, at 2:34 PM, Mark Adams > wrote: > > > > > > Thanks Randy, that makes sense. 
> > > Mark > > > > > > On Tue, Feb 2, 2021 at 3:27 PM Randall Mackie >> wrote: > > > Hi Mark, > > > > > > I don?t know what the XGC code is, but the way I do this in my Fortran code is that I initialize all objects I later want to destroy, for example: > > > > > > mat11=PETSC_NULL_MAT > > > vec1=PETSC_NULL_VEC > > > > > > etc > > > > > > Then I check and destroy like: > > > > > > if (mat11 /= PETSC_NULL_MAT) call MatDestroy(mat11, ierr) > > > > > > etc. > > > > > > Hope this helps, > > > > > > Randy > > > > > > > > > > On Feb 2, 2021, at 12:17 PM, Mark Adams >> wrote: > > > > > > > > Satish, a few years ago you helped us transition the XGC Fortran code from v3.7.7 and we seemed to have regressed. > > > > > > > > As I recall we removed the initialization of Mats (for example) in XGC. PETSc seems to initialize them with -2 in Fortran (Albert, cc'ed, verified this today) and I recall that from our previous conversation. As I look at the code now Fortran MatDestroy just goes straight to C, which would explain our crashes when we MatDestroy an uninitialized (-2) Mat. > > > > > > > > What is the correct way to delete with initializing Fortran objects? > > > > > > > > Thanks, > > > > Mark > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skavou1 at lsu.edu Wed Feb 3 11:07:34 2021 From: skavou1 at lsu.edu (Sepideh Kavousi) Date: Wed, 3 Feb 2021 17:07:34 +0000 Subject: [petsc-users] SNES-norm is zero all the time Message-ID: Hello, I have a very stupid problem that I am really ashamed of asking but it has been with me for days and I do not know what to do. I want to solve the Javier stokes equation with finite difference method. When I wanted to run the code, the snes norm value does not change and i get an error for timestep convergence. I thought I might do something wrong so, I tried to simplify the equation I want to solve to an easy form, given as: u_t=u_x +u_y Where _t is the time derivative and _x and _y are the derivative in x and y direction. When I want to solve this problem, it still does not do anything at all and the snes function norm is zero all the time. I know I am missing something but does anyone have any idea what should I check in my code. The answers does not change with time. Best, Sepideh Get Outlook for iOS -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Wed Feb 3 11:11:33 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Wed, 3 Feb 2021 20:11:33 +0300 Subject: [petsc-users] SNES-norm is zero all the time In-Reply-To: References: Message-ID: Cannot say anything if you don't provide a minimal code we can run to reproduce the issue Il giorno mer 3 feb 2021 alle ore 20:07 Sepideh Kavousi ha scritto: > Hello, > I have a very stupid problem that I am really ashamed of asking but it has > been with me for days and I do not know what to do. I want to solve the > Javier stokes equation with finite difference method. When I wanted to run > the code, the snes norm value does not change and i get an error for > timestep convergence. I thought I might do something wrong so, I tried > to simplify the equation I want to solve to an easy form, given as: > u_t=u_x +u_y > Where _t is the time derivative and _x and _y are the derivative in x and > y direction. When I want to solve this problem, it still does not do > anything at all and the snes function norm is zero all the time. 
I know I > am missing something but does anyone have any idea what should I check in > my code. The answers does not change with time. > Best, > Sepideh > > > Get Outlook for iOS > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrick.sanan at gmail.com Wed Feb 3 11:12:36 2021 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Wed, 3 Feb 2021 18:12:36 +0100 Subject: [petsc-users] SNES-norm is zero all the time In-Reply-To: References: Message-ID: Are you working from a particular example, or writing your own code from scratch? > Am 03.02.2021 um 18:11 schrieb Stefano Zampini : > > Cannot say anything if you don't provide a minimal code we can run to reproduce the issue > > Il giorno mer 3 feb 2021 alle ore 20:07 Sepideh Kavousi > ha scritto: > Hello, > I have a very stupid problem that I am really ashamed of asking but it has been with me for days and I do not know what to do. I want to solve the Javier stokes equation with finite difference method. When I wanted to run the code, the snes norm value does not change and i get an error for timestep convergence. I thought I might do something wrong so, I tried to simplify the equation I want to solve to an easy form, given as: > u_t=u_x +u_y > Where _t is the time derivative and _x and _y are the derivative in x and y direction. When I want to solve this problem, it still does not do anything at all and the snes function norm is zero all the time. I know I am missing something but does anyone have any idea what should I check in my code. The answers does not change with time. > Best, > Sepideh > > > Get Outlook for iOS > > -- > Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Feb 3 11:19:54 2021 From: jed at jedbrown.org (Jed Brown) Date: Wed, 03 Feb 2021 10:19:54 -0700 Subject: [petsc-users] SNES-norm is zero all the time In-Reply-To: References: Message-ID: <87h7mtukzp.fsf@jedbrown.org> Sepideh Kavousi writes: > Hello, > I have a very stupid problem that I am really ashamed of asking but it has been with me for days and I do not know what to do. I want to solve the Javier stokes equation with finite difference method. Could you run with -snes_monitor -snes_converged_reason -pc_type lu? > When I wanted to run the code, the snes norm value does not change and i get an error for timestep convergence. I thought I might do something wrong so, I tried to simplify the equation I want to solve to an easy form, given as: u_t=u_x +u_y Where _t is the time derivative and _x and _y are the derivative in x and y direction. When I want to solve this problem, it still does not do anything at all and the snes function norm is zero all the time. I know I am missing something but does anyone have any idea what should I check in my code. The answers does not change with time. Best, Sepideh > > > Get Outlook for iOS From balay at mcs.anl.gov Wed Feb 3 11:34:49 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 3 Feb 2021 11:34:49 -0600 Subject: [petsc-users] petsc-3.14.4 now available Message-ID: Dear PETSc users, The patch release petsc-3.14.4 is now available for download. 
http://www.mcs.anl.gov/petsc/download/index.html Satish From skavou1 at lsu.edu Wed Feb 3 11:39:26 2021 From: skavou1 at lsu.edu (Sepideh Kavousi) Date: Wed, 3 Feb 2021 17:39:26 +0000 Subject: [petsc-users] SNES-norm is zero all the time In-Reply-To: <87h7mtukzp.fsf@jedbrown.org> References: , <87h7mtukzp.fsf@jedbrown.org> Message-ID: I am not running an specific example. Attached is my code. and when I wun with ./step5.out -snes_monitor -snes_fd_color -ts_monitor -snes_converged_reason -pc_type lu it seems it does not solve anything because the output is like: 0 SNES Function norm 0.000000000000e+00 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 1 TS dt 0.005 time 0.005 copy! 0 SNES Function norm 0.000000000000e+00 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 2 TS dt 0.005 time 0.01 copy! 0 SNES Function norm 0.000000000000e+00 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 3 TS dt 0.005 time 0.015 copy! 0 SNES Function norm 0.000000000000e+00 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 4 TS dt 0.005 time 0.02 copy! 0 SNES Function norm 0.000000000000e+00 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 5 TS dt 0.005 time 0.025 copy! 0 SNES Function norm 0.000000000000e+00 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 6 TS dt 0.005 time 0.03 copy! 0 SNES Function norm 0.000000000000e+00 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 7 TS dt 0.005 time 0.035 copy! 0 SNES Function norm 0.000000000000e+00 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 8 TS dt 0.005 time 0.04 copy! 0 SNES Function norm 0.000000000000e+00 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 9 TS dt 0.005 time 0.045 copy! 0 SNES Function norm 0.000000000000e+00 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 10 TS dt 0.005 time 0.05 copy! 0 SNES Function norm 0.000000000000e+00 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 11 TS dt 0.005 time 0.055 copy! 0 SNES Function norm 0.000000000000e+00 Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 ....... ________________________________ From: Jed Brown Sent: Wednesday, February 3, 2021 11:19 AM To: Sepideh Kavousi ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] SNES-norm is zero all the time Sepideh Kavousi writes: > Hello, > I have a very stupid problem that I am really ashamed of asking but it has been with me for days and I do not know what to do. I want to solve the Javier stokes equation with finite difference method. Could you run with -snes_monitor -snes_converged_reason -pc_type lu? > When I wanted to run the code, the snes norm value does not change and i get an error for timestep convergence. I thought I might do something wrong so, I tried to simplify the equation I want to solve to an easy form, given as: u_t=u_x +u_y Where _t is the time derivative and _x and _y are the derivative in x and y direction. When I want to solve this problem, it still does not do anything at all and the snes function norm is zero all the time. I know I am missing something but does anyone have any idea what should I check in my code. The answers does not change with time. Best, Sepideh > > > Get Outlook for iOS -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: common-step5.c Type: text/x-csrc Size: 584 bytes Desc: common-step5.c URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: common-step5.h Type: text/x-chdr Size: 742 bytes Desc: common-step5.h URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: makefile Type: application/octet-stream Size: 564 bytes Desc: makefile URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: step5.c Type: text/x-csrc Size: 7292 bytes Desc: step5.c URL: From knepley at gmail.com Wed Feb 3 11:46:24 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 3 Feb 2021 12:46:24 -0500 Subject: [petsc-users] SNES-norm is zero all the time In-Reply-To: References: <87h7mtukzp.fsf@jedbrown.org> Message-ID: On Wed, Feb 3, 2021 at 12:39 PM Sepideh Kavousi wrote: > I am not running an specific example. Attached is my code. and when I wun > with > ./step5.out -snes_monitor -snes_fd_color -ts_monitor > -snes_converged_reason -pc_type lu > Did you mean [i][j] here? aF[j][j].vx=((vx_x+vx_y)-1*aYdot[j][i].vx*(1)); Matt > it seems it does not solve anything because the output is like: > > 0 SNES Function norm 0.000000000000e+00 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 > 1 TS dt 0.005 time 0.005 > copy! > 0 SNES Function norm 0.000000000000e+00 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 > 2 TS dt 0.005 time 0.01 > copy! > 0 SNES Function norm 0.000000000000e+00 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 > 3 TS dt 0.005 time 0.015 > copy! > 0 SNES Function norm 0.000000000000e+00 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 > 4 TS dt 0.005 time 0.02 > copy! > 0 SNES Function norm 0.000000000000e+00 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 > 5 TS dt 0.005 time 0.025 > copy! > 0 SNES Function norm 0.000000000000e+00 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 > 6 TS dt 0.005 time 0.03 > copy! > 0 SNES Function norm 0.000000000000e+00 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 > 7 TS dt 0.005 time 0.035 > copy! > 0 SNES Function norm 0.000000000000e+00 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 > 8 TS dt 0.005 time 0.04 > copy! > 0 SNES Function norm 0.000000000000e+00 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 > 9 TS dt 0.005 time 0.045 > copy! > 0 SNES Function norm 0.000000000000e+00 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 > 10 TS dt 0.005 time 0.05 > copy! > 0 SNES Function norm 0.000000000000e+00 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 > 11 TS dt 0.005 time 0.055 > copy! > 0 SNES Function norm 0.000000000000e+00 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 > ....... > ------------------------------ > *From:* Jed Brown > *Sent:* Wednesday, February 3, 2021 11:19 AM > *To:* Sepideh Kavousi ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] SNES-norm is zero all the time > > Sepideh Kavousi writes: > > > Hello, > > I have a very stupid problem that I am really ashamed of asking but it > has been with me for days and I do not know what to do. I want to solve the > Javier stokes equation with finite difference method. > > Could you run with -snes_monitor -snes_converged_reason -pc_type lu? 
> > > When I wanted to run the code, the snes norm value does not change and i > get an error for timestep convergence. I thought I might do something wrong > so, I tried to simplify the equation I want to solve to an easy form, given > as: u_t=u_x +u_y Where _t is the time derivative and _x and _y are the > derivative in x and y direction. When I want to solve this problem, it > still does not do anything at all and the snes function norm is zero all > the time. I know I am missing something but does anyone have any idea what > should I check in my code. The answers does not change with time. Best, > Sepideh > > > > > > Get Outlook for iOS< > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Faka.ms%2Fo0ukef&data=04%7C01%7Cskavou1%40lsu.edu%7C07c0cc86cc4d4100e28a08d8c8680396%7C2d4dad3f50ae47d983a09ae2b1f466f8%7C0%7C0%7C637479696353544458%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=A7xZE31UylQvnMnTFIcZj60GToFHW%2FsumLS9kIISEWo%3D&reserved=0 > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Feb 3 11:46:35 2021 From: jed at jedbrown.org (Jed Brown) Date: Wed, 03 Feb 2021 10:46:35 -0700 Subject: [petsc-users] SNES-norm is zero all the time In-Reply-To: References: <87h7mtukzp.fsf@jedbrown.org> Message-ID: <87eehxujr8.fsf@jedbrown.org> Sepideh Kavousi writes: > I am not running an specific example. Attached is my code. and when I wun with > ./step5.out -snes_monitor -snes_fd_color -ts_monitor -snes_converged_reason -pc_type lu > > it seems it does not solve anything because the output is like: > > 0 SNES Function norm 0.000000000000e+00 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 > 1 TS dt 0.005 time 0.005 Check your FormFunction for why af[][] is zero. I see aF[j][j].vx but you'll need to set at all the grid points, i.e., aF[j][i].vx and aF[j][i].vy. From skavou1 at lsu.edu Wed Feb 3 12:02:45 2021 From: skavou1 at lsu.edu (Sepideh Kavousi) Date: Wed, 3 Feb 2021 18:02:45 +0000 Subject: [petsc-users] SNES-norm is zero all the time In-Reply-To: <87eehxujr8.fsf@jedbrown.org> References: <87h7mtukzp.fsf@jedbrown.org> , <87eehxujr8.fsf@jedbrown.org> Message-ID: I only have one field "names vx" and this variable change in both x and y directions. I have also chosen dof in DMDACreate2d to "1". I am not sure why I should have aF[i][j].vx. "i" defines the grids in x direction and "j" is in y-directions. In all my previous codes I define" aF[j][i].vx" and not "aF[i][j].vx", and it was working properly. Best, Sepideh ________________________________ From: Jed Brown Sent: Wednesday, February 3, 2021 11:46 AM To: Sepideh Kavousi ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] SNES-norm is zero all the time Sepideh Kavousi writes: > I am not running an specific example. Attached is my code. and when I wun with > ./step5.out -snes_monitor -snes_fd_color -ts_monitor -snes_converged_reason -pc_type lu > > it seems it does not solve anything because the output is like: > > 0 SNES Function norm 0.000000000000e+00 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 > 1 TS dt 0.005 time 0.005 Check your FormFunction for why af[][] is zero. 
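For reference, a residual callback over a DMDA has to assign an entry at every locally owned grid point (i,j). A minimal sketch of such a loop for a single-component field with member vx, assuming a periodic DMDA on the unit square and the simplified test equation u_t = u_x + u_y (illustrative code only, not the attached step5.c), is:

#include <petscdmda.h>
#include <petscts.h>

typedef struct { PetscScalar vx; } Field;   /* one dof per node, as described in the thread */

PetscErrorCode FormIFunction(TS ts,PetscReal t,Vec U,Vec Udot,Vec F,void *ctx)
{
  PetscErrorCode ierr;
  DM             da;
  DMDALocalInfo  info;
  Vec            Uloc;
  Field          **u,**udot,**f;
  PetscReal      hx,hy;
  PetscInt       i,j;

  PetscFunctionBeginUser;
  ierr = TSGetDM(ts,&da);CHKERRQ(ierr);
  ierr = DMDAGetLocalInfo(da,&info);CHKERRQ(ierr);
  hx   = 1.0/(PetscReal)info.mx;            /* assumes DM_BOUNDARY_PERIODIC in both directions */
  hy   = 1.0/(PetscReal)info.my;
  ierr = DMGetLocalVector(da,&Uloc);CHKERRQ(ierr);
  ierr = DMGlobalToLocalBegin(da,U,INSERT_VALUES,Uloc);CHKERRQ(ierr);   /* fill ghost points for the stencil */
  ierr = DMGlobalToLocalEnd(da,U,INSERT_VALUES,Uloc);CHKERRQ(ierr);
  ierr = DMDAVecGetArrayRead(da,Uloc,&u);CHKERRQ(ierr);
  ierr = DMDAVecGetArrayRead(da,Udot,&udot);CHKERRQ(ierr);
  ierr = DMDAVecGetArray(da,F,&f);CHKERRQ(ierr);
  for (j=info.ys; j<info.ys+info.ym; j++) {
    for (i=info.xs; i<info.xs+info.xm; i++) {
      PetscScalar ux = (u[j][i+1].vx - u[j][i-1].vx)/(2.0*hx);   /* centered d/dx */
      PetscScalar uy = (u[j+1][i].vx - u[j-1][i].vx)/(2.0*hy);   /* centered d/dy */
      f[j][i].vx = udot[j][i].vx - (ux + uy);   /* note the index f[j][i]: both loop variables appear */
    }
  }
  ierr = DMDAVecRestoreArray(da,F,&f);CHKERRQ(ierr);
  ierr = DMDAVecRestoreArrayRead(da,Udot,&udot);CHKERRQ(ierr);
  ierr = DMDAVecRestoreArrayRead(da,Uloc,&u);CHKERRQ(ierr);
  ierr = DMRestoreLocalVector(da,&Uloc);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

With -snes_fd_color, as in the command line quoted above, the Jacobian is assembled by finite-difference coloring from this residual, so the residual loop is the only piece that has to be correct.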
I see aF[j][j].vx but you'll need to set at all the grid points, i.e., aF[j][i].vx and aF[j][i].vy. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 3 12:14:09 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 3 Feb 2021 13:14:09 -0500 Subject: [petsc-users] SNES-norm is zero all the time In-Reply-To: References: <87h7mtukzp.fsf@jedbrown.org> <87eehxujr8.fsf@jedbrown.org> Message-ID: On Wed, Feb 3, 2021 at 1:03 PM Sepideh Kavousi wrote: > I only have one field "names vx" and this variable change in both x and y > directions. I have also chosen dof in DMDACreate2d to "1". > > I am not sure why I should have aF[i][j].vx. "i" defines the grids in x > direction and "j" is in y-directions. In all my previous codes I define" > aF[j][i].vx" and not "aF[i][j].vx", and it was working properly. > To me, it looks like you have "jj" Matt > Best, > Sepideh > ------------------------------ > *From:* Jed Brown > *Sent:* Wednesday, February 3, 2021 11:46 AM > *To:* Sepideh Kavousi ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] SNES-norm is zero all the time > > Sepideh Kavousi writes: > > > I am not running an specific example. Attached is my code. and when I > wun with > > ./step5.out -snes_monitor -snes_fd_color -ts_monitor > -snes_converged_reason -pc_type lu > > > > it seems it does not solve anything because the output is like: > > > > 0 SNES Function norm 0.000000000000e+00 > > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 > > 1 TS dt 0.005 time 0.005 > > Check your FormFunction for why af[][] is zero. I see > > aF[j][j].vx > > but you'll need to set at all the grid points, i.e., aF[j][i].vx and > aF[j][i].vy. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From skavou1 at lsu.edu Wed Feb 3 12:28:03 2021 From: skavou1 at lsu.edu (Sepideh Kavousi) Date: Wed, 3 Feb 2021 18:28:03 +0000 Subject: [petsc-users] SNES-norm is zero all the time In-Reply-To: References: <87h7mtukzp.fsf@jedbrown.org> <87eehxujr8.fsf@jedbrown.org> , Message-ID: Oh my god, it was a bad one. Thanks for helping. Sepideh ________________________________ From: Matthew Knepley Sent: Wednesday, February 3, 2021 12:14 PM To: Sepideh Kavousi Cc: Jed Brown ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] SNES-norm is zero all the time On Wed, Feb 3, 2021 at 1:03 PM Sepideh Kavousi > wrote: I only have one field "names vx" and this variable change in both x and y directions. I have also chosen dof in DMDACreate2d to "1". I am not sure why I should have aF[i][j].vx. "i" defines the grids in x direction and "j" is in y-directions. In all my previous codes I define" aF[j][i].vx" and not "aF[i][j].vx", and it was working properly. To me, it looks like you have "jj" Matt Best, Sepideh ________________________________ From: Jed Brown > Sent: Wednesday, February 3, 2021 11:46 AM To: Sepideh Kavousi >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] SNES-norm is zero all the time Sepideh Kavousi > writes: > I am not running an specific example. Attached is my code. 
and when I wun with > ./step5.out -snes_monitor -snes_fd_color -ts_monitor -snes_converged_reason -pc_type lu > > it seems it does not solve anything because the output is like: > > 0 SNES Function norm 0.000000000000e+00 > Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0 > 1 TS dt 0.005 time 0.005 Check your FormFunction for why af[][] is zero. I see aF[j][j].vx but you'll need to set at all the grid points, i.e., aF[j][i].vx and aF[j][i].vy. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From luciano.siqueira at usp.br Wed Feb 3 13:41:10 2021 From: luciano.siqueira at usp.br (Luciano Siqueira) Date: Wed, 3 Feb 2021 16:41:10 -0300 Subject: [petsc-users] Slower performance in multi-node system Message-ID: <06d8b0d1-8879-4a7c-134e-d94dc5442ecc@usp.br> Hello, I'm evaluating the performance of an application in a distributed environment and I notice that it's much slower when running in many nodes/cores when compared to a single node with a fewer cores. When running the application in 20 nodes, the Main Stage time reported in PETSc's log is up to 10 times slower than it is when running the same application in only 1 node, even with fewer cores per node. The application I'm running is an example code provided by libmesh: http://libmesh.github.io/examples/introduction_ex4.html The application runs inside a Singularity container, with openmpi-4.0.3 and PETSc 3.14.3. The distributed processes are managed by slurm 17.02.11 and each node is equipped with two Intel CPU Xeon E5-2695v2 Ivy Bridge (12c @2,4GHz) and 128Gb of RAM, all communications going through infiniband. My questions are: Is the slowdown expected? Should the application be specially tailored to work well in distributed environments? Also, where (maybe in PETSc documentation/source-code) can I find information on how PETSc handles MPI communications? Do the KSP solvers favor one-to-one process communication over broadcast messages or vice-versa? I suspect inter-process communication must be the cause of the poor performance when using many nodes, but not as much as I'm seeing. Thank you in advance! Luciano. From knepley at gmail.com Wed Feb 3 13:43:54 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 3 Feb 2021 14:43:54 -0500 Subject: [petsc-users] Slower performance in multi-node system In-Reply-To: <06d8b0d1-8879-4a7c-134e-d94dc5442ecc@usp.br> References: <06d8b0d1-8879-4a7c-134e-d94dc5442ecc@usp.br> Message-ID: On Wed, Feb 3, 2021 at 2:42 PM Luciano Siqueira wrote: > Hello, > > I'm evaluating the performance of an application in a distributed > environment and I notice that it's much slower when running in many > nodes/cores when compared to a single node with a fewer cores. > > When running the application in 20 nodes, the Main Stage time reported > in PETSc's log is up to 10 times slower than it is when running the same > application in only 1 node, even with fewer cores per node. > > The application I'm running is an example code provided by libmesh: > > http://libmesh.github.io/examples/introduction_ex4.html > > The application runs inside a Singularity container, with openmpi-4.0.3 > and PETSc 3.14.3. 
The distributed processes are managed by slurm > 17.02.11 and each node is equipped with two Intel CPU Xeon E5-2695v2 Ivy > Bridge (12c @2,4GHz) and 128Gb of RAM, all communications going through > infiniband. > > My questions are: Is the slowdown expected? Should the application be > specially tailored to work well in distributed environments? > > Also, where (maybe in PETSc documentation/source-code) can I find > information on how PETSc handles MPI communications? Do the KSP solvers > favor one-to-one process communication over broadcast messages or > vice-versa? I suspect inter-process communication must be the cause of > the poor performance when using many nodes, but not as much as I'm seeing. > > Thank you in advance! > We can't say anything about the performance without some data. Please send us the output of -log_view for both cases. Thanks, Matt > Luciano. > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From luciano.siqueira at usp.br Wed Feb 3 14:40:04 2021 From: luciano.siqueira at usp.br (Luciano Siqueira) Date: Wed, 3 Feb 2021 17:40:04 -0300 Subject: [petsc-users] Slower performance in multi-node system In-Reply-To: References: <06d8b0d1-8879-4a7c-134e-d94dc5442ecc@usp.br> Message-ID: Here are the (attached) output of -log_view for both cases. The beginning of the files has some info from the libmesh app. Running in 1 node, 32 cores: 01_node_log_view.txt Running in 20 nodes, 32 cores each (640 cores in total): 01_node_log_view.txt Thanks! Luciano. Em 03/02/2021 16:43, Matthew Knepley escreveu: > On Wed, Feb 3, 2021 at 2:42 PM Luciano Siqueira > > wrote: > > Hello, > > I'm evaluating the performance of an application in a distributed > environment and I notice that it's much slower when running in many > nodes/cores when compared to a single node with a fewer cores. > > When running the application in 20 nodes, the Main Stage time > reported > in PETSc's log is up to 10 times slower than it is when running > the same > application in only 1 node, even with fewer cores per node. > > The application I'm running is an example code provided by libmesh: > > http://libmesh.github.io/examples/introduction_ex4.html > > The application runs inside a Singularity container, with > openmpi-4.0.3 > and PETSc 3.14.3. The distributed processes are managed by slurm > 17.02.11 and each node is equipped with two Intel CPU Xeon > E5-2695v2 Ivy > Bridge (12c @2,4GHz) and 128Gb of RAM, all communications going > through > infiniband. > > My questions are: Is the slowdown expected? Should the application be > specially tailored to work well in distributed environments? > > Also, where (maybe in PETSc documentation/source-code) can I find > information on how PETSc handles MPI communications? Do the KSP > solvers > favor one-to-one process communication over broadcast messages or > vice-versa? I suspect inter-process communication must be the > cause of > the poor performance when using many nodes, but not as much as I'm > seeing. > > Thank you in advance! > > > We can't say anything about the performance without some data. Please > send us the output > of -log_view for both cases. > > ? Thanks, > > ? ? ?Matt > > Luciano. 
> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- Running ./experiment -d 3 -n 31 -mat_type aij -ksp_type gmres -pc_type bjacobi -log_view Mesh Information: elem_dimensions()={3} spatial_dimension()=3 n_nodes()=250047 n_local_nodes()=8955 n_elem()=29791 n_local_elem()=935 n_active_elem()=29791 n_subdomains()=1 n_partitions()=32 n_processors()=32 n_threads()=1 processor_id()=0 *** Warning, This code is untested, experimental, or likely to see future API changes: ./include/libmesh/mesh_base.h, line 1667, compiled Jan 12 2021 at 12:34:39 *** EquationSystems n_systems()=1 System #0, "Poisson" Type "LinearImplicit" Variables="u" Finite Element Types="LAGRANGE" Approximation Orders="SECOND" n_dofs()=250047 n_local_dofs()=8955 n_constrained_dofs()=23066 n_local_constrained_dofs()=636 n_vectors()=1 n_matrices()=1 DofMap Sparsity Average On-Processor Bandwidth <= 56.5003 Average Off-Processor Bandwidth <= 7.21882 Maximum On-Processor Bandwidth <= 136 Maximum Off-Processor Bandwidth <= 140 DofMap Constraints Number of DoF Constraints = 23066 Number of Heterogenous Constraints= 22818 Average DoF Constraint Length= 0 Mesh Information: elem_dimensions()={3} spatial_dimension()=3 n_nodes()=250047 n_local_nodes()=8955 n_elem()=29791 n_local_elem()=935 n_active_elem()=29791 n_subdomains()=1 n_partitions()=32 n_processors()=32 n_threads()=1 processor_id()=0 ----------------------------------------------------- | Processor id: 0 | | Num Processors: 32 | | Time: Wed Feb 3 17:26:38 2021 | | OS: Linux | | HostName: sdumont6197 | | OS Release: 3.10.0-957.el7.x86_64 | | OS Version: #1 SMP Thu Oct 4 20:48:51 UTC 2018 | | Machine: x86_64 | | Username: luciano.siqueira | | Configuration: ../configure '--prefix=/usr/local' | | '--with-vtk-include=/usr/local/include/vtk-8.2' | | '--with-vtk-lib=/usr/local/lib' | | '--enable-petsc=yes' | | '--enable-petsc-required' | | '--enable-slepc' | | '--enable-slepc-required' | | 'METHODS=opt' | | 'PETSC_DIR=/opt/petsc' | | 'PETSC_ARCH=arch-linux2-c-opt' | | 'SLEPC_DIR=/opt/petsc/arch-linux2-c-opt' | ----------------------------------------------------- ------------------------------------------------------------------------------------------------------------ | Matrix Assembly Performance: Alive time=0.158664, Active time=0.068175 | ------------------------------------------------------------------------------------------------------------ | Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time | | w/o Sub w/o Sub With Sub With Sub w/o S With S | |------------------------------------------------------------------------------------------------------------| | | | Fe 935 0.0084 0.000009 0.0084 0.000009 12.35 12.35 | | Ke 935 0.0395 0.000042 0.0395 0.000042 57.88 57.88 | | elem init 935 0.0203 0.000022 0.0203 0.000022 29.76 29.76 | ------------------------------------------------------------------------------------------------------------ | Totals: 2805 0.0682 100.00 | ------------------------------------------------------------------------------------------------------------ ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./experiment on a arch-linux2-c-opt named sdumont6197 with 32 processors, by luciano.siqueira Wed Feb 3 17:26:39 2021 Using 1 OpenMP threads Using Petsc Development GIT revision: v3.14.3-435-gd1574ab4cd GIT Date: 2021-01-11 15:13:43 +0000 Max Max/Min Avg Total Time (sec): 2.792e+00 1.000 2.791e+00 Objects: 6.600e+01 1.000 6.600e+01 Flop: 5.609e+08 1.478 4.731e+08 1.514e+10 Flop/sec: 2.009e+08 1.478 1.695e+08 5.424e+09 MPI Messages: 3.178e+03 3.446 1.835e+03 5.872e+04 MPI Message Lengths: 1.138e+07 1.910 4.579e+03 2.689e+08 MPI Reductions: 4.340e+02 1.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 2.7915e+00 100.0% 1.5140e+10 100.0% 5.872e+04 100.0% 4.579e+03 100.0% 4.270e+02 98.4% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSided 7 1.0 9.5138e-02 6.8 0.00e+00 0.0 8.1e+02 5.3e+00 7.0e+00 1 0 1 0 2 1 0 1 0 2 0 BuildTwoSidedF 5 1.0 8.6365e-02162.3 0.00e+00 0.0 6.6e+02 3.6e+04 5.0e+00 1 0 1 9 1 1 0 1 9 1 0 MatMult 198 1.0 4.5126e-01 1.2 2.27e+08 1.5 5.4e+04 3.1e+03 1.0e+00 14 40 92 63 0 14 40 92 63 0 13438 MatSolve 199 1.0 3.5615e-01 1.5 2.00e+08 1.6 0.0e+00 0.0e+00 0.0e+00 11 36 0 0 0 11 36 0 0 0 15134 MatLUFactorNum 1 1.0 5.2785e-02 1.5 2.24e+07 1.5 0.0e+00 0.0e+00 0.0e+00 2 4 0 0 0 2 4 0 0 0 10833 MatILUFactorSym 1 1.0 8.4187e-03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 2 1.0 9.9590e-02 6.6 0.00e+00 0.0 2.7e+02 8.8e+04 2.0e+00 1 0 0 9 0 1 0 0 9 0 0 MatAssemblyEnd 2 1.0 2.3421e-02 1.0 3.94e+04 0.0 0.0e+00 0.0e+00 4.0e+00 1 0 0 0 1 1 0 0 0 1 21 MatGetRowIJ 1 1.0 1.8900e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.2824e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 3 1.0 3.9527e-03 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecMDot 191 1.0 1.5713e-01 6.0 5.11e+07 1.4 0.0e+00 0.0e+00 1.9e+02 4 9 0 0 44 4 9 0 0 45 9089 VecNorm 199 1.0 2.8923e-02 3.6 3.56e+06 1.4 0.0e+00 0.0e+00 2.0e+02 1 1 0 0 46 1 1 0 0 47 3441 VecScale 198 1.0 9.2115e-04 1.3 1.77e+06 1.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 53747 VecCopy 8 1.0 1.1197e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 211 1.0 2.1250e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 14 1.0 1.2501e-0312.0 2.51e+05 1.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5601 VecMAXPY 198 1.0 3.5915e-02 1.4 5.46e+07 1.4 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 42427 VecAssemblyBegin 3 1.0 6.9581e-04 1.1 0.00e+00 0.0 4.0e+02 9.8e+02 3.0e+00 0 0 1 0 1 0 0 1 0 1 0 VecAssemblyEnd 3 1.0 9.0871e-05 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 199 1.0 3.8270e-02 2.6 0.00e+00 0.0 5.5e+04 3.1e+03 2.0e+00 1 0 93 64 0 1 0 93 64 0 0 VecScatterEnd 199 1.0 1.0005e-0112.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecNormalize 198 1.0 2.8637e-02 3.4 5.32e+06 1.4 0.0e+00 0.0e+00 2.0e+02 1 1 0 0 46 1 1 0 0 46 5187 SFSetGraph 2 1.0 4.9453e-05 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 2 1.0 2.3974e-02 5.2 0.00e+00 0.0 1.1e+03 9.4e+02 2.0e+00 1 0 2 0 0 1 0 2 0 0 0 SFPack 199 1.0 7.6794e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 199 1.0 1.1253e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 2 1.0 8.8754e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 8.9594e-01 1.0 5.38e+08 1.5 5.4e+04 3.1e+03 3.9e+02 32 96 92 63 90 32 96 92 63 92 16253 KSPGMRESOrthog 191 1.0 1.8209e-01 3.0 1.02e+08 1.4 0.0e+00 0.0e+00 1.9e+02 5 19 0 0 44 5 19 0 0 45 15687 
PCSetUp 2 1.0 8.7059e-02 1.3 2.24e+07 1.5 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 6568 PCSetUpOnBlocks 1 1.0 6.1107e-02 1.5 2.24e+07 1.5 0.0e+00 0.0e+00 0.0e+00 2 4 0 0 0 2 4 0 0 0 9358 PCApply 199 1.0 3.5986e-01 1.5 2.00e+08 1.6 0.0e+00 0.0e+00 0.0e+00 11 36 0 0 0 11 36 0 0 0 14978 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Distributed Mesh 1 1 5048 0. Matrix 4 4 14023836 0. Index Set 7 7 152940 0. IS L to G Mapping 1 1 53380 0. Vector 43 43 2874792 0. Star Forest Graph 4 4 4576 0. Krylov Solver 2 2 20184 0. Preconditioner 2 2 1944 0. Discrete System 1 1 960 0. Viewer 1 0 0 0. ======================================================================================================================== Average time to get PetscTime(): 1.507e-07 Average time for MPI_Barrier(): 9.6098e-06 Average time for zero size MPI_Send(): 4.25837e-06 #PETSc Option Table entries: -d 3 -ksp_type gmres -log_view -mat_type aij -n 31 -pc_type bjacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-debugging=no --with-openmp=1 --download-superlu_dist --download-mumps --download-hypre --download-scalapack --download-spai --download-parms --download-slepc --download-openmpi=yes COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS= ----------------------------------------- Libraries compiled on 2021-01-12 11:28:56 on libmesh-cpu Machine characteristics: Linux-5.4.0-60-generic-x86_64-with-debian-10.7 Using PETSc directory: /opt/petsc Using PETSc arch: arch-linux2-c-opt ----------------------------------------- Using C compiler: /opt/petsc/arch-linux2-c-opt/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -fopenmp Using Fortran compiler: /opt/petsc/arch-linux2-c-opt/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -fopenmp ----------------------------------------- Using include paths: -I/opt/petsc/include -I/opt/petsc/arch-linux2-c-opt/include ----------------------------------------- Using C linker: /opt/petsc/arch-linux2-c-opt/bin/mpicc Using Fortran linker: /opt/petsc/arch-linux2-c-opt/bin/mpif90 Using libraries: -Wl,-rpath,/opt/petsc/arch-linux2-c-opt/lib -L/opt/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/opt/petsc/arch-linux2-c-opt/lib -L/opt/petsc/arch-linux2-c-opt/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lHYPRE -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lparms -lspai -llapack -lblas -lX11 -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl ----------------------------------------- -------------- next part -------------- Running ./experiment -d 3 -n 31 -mat_type aij -ksp_type gmres -pc_type bjacobi -log_view Mesh Information: elem_dimensions()={3} spatial_dimension()=3 n_nodes()=250047 n_local_nodes()=569 n_elem()=29791 n_local_elem()=47 n_active_elem()=29791 n_subdomains()=1 n_partitions()=640 
n_processors()=640 n_threads()=1 processor_id()=0 *** Warning, This code is untested, experimental, or likely to see future API changes: ./include/libmesh/mesh_base.h, line 1667, compiled Jan 12 2021 at 12:34:39 *** EquationSystems n_systems()=1 System #0, "Poisson" Type "LinearImplicit" Variables="u" Finite Element Types="LAGRANGE" Approximation Orders="SECOND" n_dofs()=250047 n_local_dofs()=569 n_constrained_dofs()=23066 n_local_constrained_dofs()=149 n_vectors()=1 n_matrices()=1 DofMap Sparsity Average On-Processor Bandwidth <= 44.9841 Average Off-Processor Bandwidth <= 23.7024 Maximum On-Processor Bandwidth <= 145 Maximum Off-Processor Bandwidth <= 158 DofMap Constraints Number of DoF Constraints = 23066 Number of Heterogenous Constraints= 22818 Average DoF Constraint Length= 0 Mesh Information: elem_dimensions()={3} spatial_dimension()=3 n_nodes()=250047 n_local_nodes()=569 n_elem()=29791 n_local_elem()=47 n_active_elem()=29791 n_subdomains()=1 n_partitions()=640 n_processors()=640 n_threads()=1 processor_id()=0 ----------------------------------------------------- | Processor id: 0 | | Num Processors: 640 | | Time: Mon Feb 1 18:53:04 2021 | | OS: Linux | | HostName: sdumont6170 | | OS Release: 3.10.0-957.el7.x86_64 | | OS Version: #1 SMP Thu Oct 4 20:48:51 UTC 2018 | | Machine: x86_64 | | Username: luciano.siqueira | | Configuration: ../configure '--prefix=/usr/local' | | '--with-vtk-include=/usr/local/include/vtk-8.2' | | '--with-vtk-lib=/usr/local/lib' | | '--enable-petsc=yes' | | '--enable-petsc-required' | | '--enable-slepc' | | '--enable-slepc-required' | | 'METHODS=opt' | | 'PETSC_DIR=/opt/petsc' | | 'PETSC_ARCH=arch-linux2-c-opt' | | 'SLEPC_DIR=/opt/petsc/arch-linux2-c-opt' | ----------------------------------------------------- ------------------------------------------------------------------------------------------------------------ | Matrix Assembly Performance: Alive time=0.056831, Active time=0.006895 | ------------------------------------------------------------------------------------------------------------ | Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time | | w/o Sub w/o Sub With Sub With Sub w/o S With S | |------------------------------------------------------------------------------------------------------------| | | | Fe 47 0.0004 0.000009 0.0004 0.000009 6.42 6.42 | | Ke 47 0.0020 0.000042 0.0020 0.000042 28.83 28.83 | | elem init 47 0.0045 0.000095 0.0045 0.000095 64.74 64.74 | ------------------------------------------------------------------------------------------------------------ | Totals: 141 0.0069 100.00 | ------------------------------------------------------------------------------------------------------------ ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./experiment on a arch-linux2-c-opt named sdumont6170 with 640 processors, by luciano.siqueira Mon Feb 1 18:53:07 2021 Using 1 OpenMP threads Using Petsc Development GIT revision: v3.14.3-435-gd1574ab4cd GIT Date: 2021-01-11 15:13:43 +0000 Max Max/Min Avg Total Time (sec): 1.968e+02 1.000 1.968e+02 Objects: 6.600e+01 1.000 6.600e+01 Flop: 4.131e+07 4.553 2.385e+07 1.526e+10 Flop/sec: 2.099e+05 4.553 1.212e+05 7.756e+07 MPI Messages: 8.425e+03 2.949 5.414e+03 3.465e+06 MPI Message Lengths: 5.026e+06 1.669 7.080e+02 2.453e+09 MPI Reductions: 4.890e+02 1.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 1.9678e+02 100.0% 1.5262e+10 100.0% 3.465e+06 100.0% 7.080e+02 100.0% 4.820e+02 98.6% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSided 7 1.0 2.8366e-01 5.9 0.00e+00 0.0 2.7e+04 5.2e+00 7.0e+00 0 0 1 0 1 0 0 1 0 1 0 BuildTwoSidedF 5 1.0 2.5666e-01 9.5 0.00e+00 0.0 2.0e+04 4.0e+03 5.0e+00 0 0 1 3 1 0 0 1 3 1 0 MatMult 226 1.0 6.4752e-01 3.1 1.87e+07 4.2 2.2e+06 3.9e+02 1.0e+00 0 45 63 35 0 0 45 63 35 0 10689 MatSolve 227 1.0 2.6350e-02 6.7 1.28e+07 7.0 0.0e+00 0.0e+00 0.0e+00 0 29 0 0 0 0 29 0 0 0 168471 MatLUFactorNum 1 1.0 3.0115e-0310.8 1.18e+0613.9 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 106202 MatILUFactorSym 1 1.0 1.7141e-0219.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 2 1.0 2.3641e-0120.6 0.00e+00 0.0 8.1e+03 9.7e+03 2.0e+00 0 0 0 3 0 0 0 0 3 0 0 MatAssemblyEnd 2 1.0 4.7616e-01 1.8 8.30e+03 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 1 0 0 0 0 1 4 MatGetRowIJ 1 1.0 2.4430e-06 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 4.3698e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 3 1.0 2.4557e-04 6.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecMDot 218 1.0 1.2456e+00 1.7 3.98e+06 3.2 0.0e+00 0.0e+00 2.2e+02 0 11 0 0 45 0 11 0 0 45 1320 VecNorm 227 1.0 1.3911e+00 1.6 2.75e+05 3.2 0.0e+00 0.0e+00 2.3e+02 1 1 0 0 46 1 1 0 0 47 82 VecScale 226 1.0 7.1863e-04 2.2 1.37e+05 3.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 78637 VecCopy 9 1.0 2.7855e-05 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 240 1.0 3.9133e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 16 1.0 6.4706e-03293.5 1.94e+04 3.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1237 VecMAXPY 226 1.0 3.9906e-03 2.6 4.25e+06 3.2 0.0e+00 0.0e+00 0.0e+00 0 11 0 0 0 0 11 0 0 0 439741 VecAssemblyBegin 3 1.0 2.2735e-02 1.4 0.00e+00 0.0 1.2e+04 1.2e+02 3.0e+00 0 0 0 0 1 0 0 0 0 1 0 VecAssemblyEnd 3 1.0 2.9396e-03277.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 227 1.0 6.0738e-02 1.8 0.00e+00 0.0 2.2e+06 3.9e+02 2.0e+00 0 0 64 35 0 0 0 64 35 0 0 VecScatterEnd 227 1.0 5.8930e-01 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 226 1.0 1.3851e+00 1.6 4.10e+05 3.2 0.0e+00 0.0e+00 2.3e+02 1 1 0 0 46 1 1 0 0 47 122 SFSetGraph 2 1.0 2.1940e-05 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 2 1.0 4.2313e-02 1.7 0.00e+00 0.0 3.8e+04 1.1e+02 2.0e+00 0 0 1 0 0 0 0 1 0 0 0 SFPack 227 1.0 1.7886e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 227 1.0 2.3074e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 2 1.0 6.7246e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.5118e+00 1.0 4.02e+07 4.5 2.2e+06 3.9e+02 4.5e+02 1 98 63 35 91 1 98 63 35 93 5947 KSPGMRESOrthog 218 1.0 1.2489e+00 1.7 7.96e+06 3.2 0.0e+00 0.0e+00 2.2e+02 0 22 0 0 45 0 22 0 0 45 2634 PCSetUp 2 
1.0 4.3814e-02 1.6 1.18e+0613.9 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 7300 PCSetUpOnBlocks 1 1.0 1.8862e-02 8.3 1.18e+0613.9 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 16956 PCApply 227 1.0 2.9083e-02 5.1 1.28e+07 7.0 0.0e+00 0.0e+00 0.0e+00 0 29 0 0 0 0 29 0 0 0 152639 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Distributed Mesh 1 1 5048 0. Matrix 4 4 842592 0. Index Set 7 7 18132 0. IS L to G Mapping 1 1 4972 0. Vector 43 43 257096 0. Star Forest Graph 4 4 4576 0. Krylov Solver 2 2 20184 0. Preconditioner 2 2 1944 0. Discrete System 1 1 960 0. Viewer 1 0 0 0. ======================================================================================================================== Average time to get PetscTime(): 3.933e-07 Average time for MPI_Barrier(): 0.00498015 Average time for zero size MPI_Send(): 0.000194207 #PETSc Option Table entries: -d 3 -ksp_type gmres -log_view -mat_type aij -n 31 -pc_type bjacobi #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --with-debugging=no --with-openmp=1 --download-superlu_dist --download-mumps --download-hypre --download-scalapack --download-spai --download-parms --download-slepc --download-openmpi=yes COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS= ----------------------------------------- Libraries compiled on 2021-01-12 11:28:56 on libmesh-cpu Machine characteristics: Linux-5.4.0-60-generic-x86_64-with-debian-10.7 Using PETSc directory: /opt/petsc Using PETSc arch: arch-linux2-c-opt ----------------------------------------- Using C compiler: /opt/petsc/arch-linux2-c-opt/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -fopenmp Using Fortran compiler: /opt/petsc/arch-linux2-c-opt/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -fopenmp ----------------------------------------- Using include paths: -I/opt/petsc/include -I/opt/petsc/arch-linux2-c-opt/include ----------------------------------------- Using C linker: /opt/petsc/arch-linux2-c-opt/bin/mpicc Using Fortran linker: /opt/petsc/arch-linux2-c-opt/bin/mpif90 Using libraries: -Wl,-rpath,/opt/petsc/arch-linux2-c-opt/lib -L/opt/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/opt/petsc/arch-linux2-c-opt/lib -L/opt/petsc/arch-linux2-c-opt/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lHYPRE -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lparms -lspai -llapack -lblas -lX11 -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl ----------------------------------------- From bsmith at petsc.dev Wed Feb 3 22:37:21 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 3 Feb 2021 22:37:21 -0600 Subject: [petsc-users] Slower performance in multi-node system In-Reply-To: References: <06d8b0d1-8879-4a7c-134e-d94dc5442ecc@usp.br> Message-ID: https://www.mcs.anl.gov/petsc/documentation/faq.html#computers In particular 
looking at the results of the parallel run I see Average time to get PetscTime(): 3.933e-07 Average time for MPI_Barrier(): 0.00498015 Average time for zero size MPI_Send(): 0.000194207 So the times for communication are huge. 4.9 milliseconds for a synchronization of twenty processes. A millisecond is an eternity for parallel computing. It is not clear to me that this system is appropriate for tightly couple parallel simulations. Barry > On Feb 3, 2021, at 2:40 PM, Luciano Siqueira wrote: > > Here are the (attached) output of -log_view for both cases. The beginning of the files has some info from the libmesh app. > > Running in 1 node, 32 cores: 01_node_log_view.txt > > Running in 20 nodes, 32 cores each (640 cores in total): 01_node_log_view.txt > > Thanks! > > Luciano. > > Em 03/02/2021 16:43, Matthew Knepley escreveu: >> On Wed, Feb 3, 2021 at 2:42 PM Luciano Siqueira > wrote: >> Hello, >> >> I'm evaluating the performance of an application in a distributed >> environment and I notice that it's much slower when running in many >> nodes/cores when compared to a single node with a fewer cores. >> >> When running the application in 20 nodes, the Main Stage time reported >> in PETSc's log is up to 10 times slower than it is when running the same >> application in only 1 node, even with fewer cores per node. >> >> The application I'm running is an example code provided by libmesh: >> >> http://libmesh.github.io/examples/introduction_ex4.html >> >> The application runs inside a Singularity container, with openmpi-4.0.3 >> and PETSc 3.14.3. The distributed processes are managed by slurm >> 17.02.11 and each node is equipped with two Intel CPU Xeon E5-2695v2 Ivy >> Bridge (12c @2,4GHz) and 128Gb of RAM, all communications going through >> infiniband. >> >> My questions are: Is the slowdown expected? Should the application be >> specially tailored to work well in distributed environments? >> >> Also, where (maybe in PETSc documentation/source-code) can I find >> information on how PETSc handles MPI communications? Do the KSP solvers >> favor one-to-one process communication over broadcast messages or >> vice-versa? I suspect inter-process communication must be the cause of >> the poor performance when using many nodes, but not as much as I'm seeing. >> >> Thank you in advance! >> >> We can't say anything about the performance without some data. Please send us the output >> of -log_view for both cases. >> >> Thanks, >> >> Matt >> >> Luciano. >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > <01_node_log_view.txt><20_node_log_view.txt> -------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Thu Feb 4 10:07:15 2021 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Thu, 4 Feb 2021 16:07:15 +0000 Subject: [petsc-users] Slower performance in multi-node system In-Reply-To: References: <06d8b0d1-8879-4a7c-134e-d94dc5442ecc@usp.br> Message-ID: On , 2021Feb3, at 22:37, Barry Smith > wrote: https://www.mcs.anl.gov/petsc/documentation/faq.html#computers I happened to scroll up a line, and Any useful books on numerical computing? Writing Scientific Software: A Guide to Good Style Is a dead link. Feel free to link to my 3 textbooks: https://pages.tacc.utexas.edu/~eijkhout/istc/istc.html Victor. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Thu Feb 4 11:05:15 2021 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Thu, 4 Feb 2021 17:05:15 +0000 Subject: [petsc-users] Slower performance in multi-node system In-Reply-To: References: <06d8b0d1-8879-4a7c-134e-d94dc5442ecc@usp.br> Message-ID: On , 2021Feb3, at 22:37, Barry Smith > wrote: https://www.mcs.anl.gov/petsc/documentation/faq.html#computers ./process.py createfile ; process.py That script doesn?t work for python3. Also: second time without dot-slash? Victor. -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrick.sanan at gmail.com Thu Feb 4 12:27:16 2021 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Thu, 4 Feb 2021 19:27:16 +0100 Subject: [petsc-users] Slower performance in multi-node system In-Reply-To: References: <06d8b0d1-8879-4a7c-134e-d94dc5442ecc@usp.br> Message-ID: <73C62AB6-EAA0-4F0D-91B4-1201175B6680@gmail.com> That page has been ported to Sphinx in master (thanks, Jacob!), so if adding that link, it'd be helpful to do it here (Note you can click on the "edit on Gitlab" in the tab in the bottom right and make an MR, which is handy for little changes which you expect to get right in one attempt) https://docs.petsc.org/en/master/faq/#any-useful-books-on-numerical-computing (And Sphinx has a utility to check all external links, so we should be able to clean up this and any other dead links in one pass before the next release) > Am 04.02.2021 um 17:07 schrieb Victor Eijkhout : > > > >> On , 2021Feb3, at 22:37, Barry Smith > wrote: >> >> >> https://www.mcs.anl.gov/petsc/documentation/faq.html#computers > I happened to scroll up a line, and > > Any useful books on numerical computing? <>Writing Scientific Software: A Guide to Good Style > > Is a dead link. > > Feel free to link to my 3 textbooks: > > https://pages.tacc.utexas.edu/~eijkhout/istc/istc.html > > Victor. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicolas.barral at math.u-bordeaux.fr Mon Feb 8 05:00:53 2021 From: nicolas.barral at math.u-bordeaux.fr (Nicolas Barral) Date: Mon, 8 Feb 2021 12:00:53 +0100 Subject: [petsc-users] DMPlex tetrahedra facets orientation Message-ID: Hi all, Can I make any assumption on the orientation of triangular facets in a tetrahedral plex ? I need the inward facet normals. Do I need to use DMPlexGetOrientedFace or can I rely on either the tet vertices ordering, or the faces ordering ? Could DMPlexGetRawFaces_Internal be enough ? Alternatively, is there a function that computes the normals - without bringing out the big guns ? Thanks -- Nicolas From e0425375 at gmail.com Mon Feb 8 06:03:59 2021 From: e0425375 at gmail.com (Florian Bruckner) Date: Mon, 8 Feb 2021 13:03:59 +0100 Subject: [petsc-users] using preconditioner with SLEPc Message-ID: Dear PETSc / SLEPc Users, my question is very similar to the one posted here: https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html The eigensystem I would like to solve looks like: B0 v = 1/omega A0 v B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). 
I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this: es = SLEPc.EPS().create(comm=fd.COMM_WORLD) st = es.getST() ksp = st.getKSP() ksp.setOperators(self.A0, self.P0) Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). By the way, should P0 be an approximation of A0 or does it have to include B0? Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator? thanks for any advice best wishes Florian -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 8 07:22:42 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 8 Feb 2021 08:22:42 -0500 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: References: Message-ID: On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner wrote: > Dear PETSc / SLEPc Users, > > my question is very similar to the one posted here: > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html > > The eigensystem I would like to solve looks like: > B0 v = 1/omega A0 v > B0 and A0 are both hermitian, A0 is positive definite, but only given as a > linear operator (matshell). I am looking for the largest eigenvalues > (=smallest omega). > > I also have a sparse approximation P0 of the A0 operator, which i would > like to use as precondtioner, using something like this: > > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) > st = es.getST() > ksp = st.getKSP() > ksp.setOperators(self.A0, self.P0) > > Unfortunately PETSc still complains that it cannot create a preconditioner > for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == > 'python'). > By the way, should P0 be an approximation of A0 or does it have to include > B0? > > Right now I am using the krylov-schur method. Are there any alternatives > if A0 is only given as an operator? > Jose can correct me if I say something wrong. When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and then handed that to EPS. You can see me do it here: https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way. Thanks, Matt > thanks for any advice > best wishes > Florian > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 8 08:19:06 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 8 Feb 2021 09:19:06 -0500 Subject: [petsc-users] DMPlex tetrahedra facets orientation In-Reply-To: References: Message-ID: On Mon, Feb 8, 2021 at 6:01 AM Nicolas Barral < nicolas.barral at math.u-bordeaux.fr> wrote: > Hi all, > > Can I make any assumption on the orientation of triangular facets in a > tetrahedral plex ? I need the inward facet normals. Do I need to use > DMPlexGetOrientedFace or can I rely on either the tet vertices ordering, > or the faces ordering ? Could DMPlexGetRawFaces_Internal be enough ? > You can do it by hand, but you have to account for the face orientation relative to the cell. 
That is what DMPlexGetOrientedFace() does. I think it would be easier to use the function below. > Alternatively, is there a function that computes the normals - without > bringing out the big guns ? > This will compute the normals https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexComputeCellGeometryFVM.html Should not be too heavy weight. THanks, Matt Thanks > > -- > Nicolas > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Mon Feb 8 08:37:34 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 8 Feb 2021 15:37:34 +0100 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: References: Message-ID: The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is A0^{-1}*B0*v=theta*v and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence. Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface. The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own. As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python. If you are using the git repo, I could add the relevant code. Jose > El 8 feb 2021, a las 14:22, Matthew Knepley escribi?: > > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner wrote: > Dear PETSc / SLEPc Users, > > my question is very similar to the one posted here: > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html > > The eigensystem I would like to solve looks like: > B0 v = 1/omega A0 v > B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). > > I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this: > > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) > st = es.getST() > ksp = st.getKSP() > ksp.setOperators(self.A0, self.P0) > > Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). > By the way, should P0 be an approximation of A0 or does it have to include B0? > > Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator? > > Jose can correct me if I say something wrong. 
> > When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and > then handed that to EPS. You can see me do it here: > > https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 > > I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way. > > Thanks, > > Matt > > thanks for any advice > best wishes > Florian > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Mon Feb 8 08:48:52 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 8 Feb 2021 09:48:52 -0500 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: References: Message-ID: On Mon, Feb 8, 2021 at 9:37 AM Jose E. Roman wrote: > The problem can be written as A0*v=omega*B0*v and you want the eigenvalues > omega closest to zero. If the matrices were explicitly available, you would > do shift-and-invert with target=0, that is > > (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is > > A0^{-1}*B0*v=theta*v > > and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. > > Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of > EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? > EPS_SMALLEST_REAL will give slow convergence. > Thanks Jose! I am not understanding some step. I want the smallest eigenvalues. Should I use EPS_SMALLEST_MAGNITUDE? I appear to get what I want using SMALLEST_REAL, but as you say it might be slower than it has to be. Also, sometime I would like to talk about incorporating the multilevel eigensolver. I am sure you could make lots of improvements to my initial attempt. I will send you a separate email, since I am getting serious about testing it. Thanks, Matt > Florian: I would not recommend setting the KSP matrices directly, it may > produce strange side-effects. We should have an interface function to pass > this matrix. Currently there is STPrecondSetMatForPC() but it has two > problems: (1) it is intended for STPRECOND, so cannot be used with > Krylov-Schur, and (2) it is not currently available in the python interface. > > The approach used by Matt is a workaround that does not use ST, so you can > handle linear solves with a KSP of your own. > > As an alternative, since your problem is symmetric, you could try LOBPCG, > assuming that the leftmost eigenvalues are those that you want (e.g. if all > eigenvalues are non-negative). In that case you could use > STPrecondSetMatForPC(), but the remaining issue is calling it from python. > > If you are using the git repo, I could add the relevant code. > > Jose > > > > > El 8 feb 2021, a las 14:22, Matthew Knepley > escribi?: > > > > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner > wrote: > > Dear PETSc / SLEPc Users, > > > > my question is very similar to the one posted here: > > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html > > > > The eigensystem I would like to solve looks like: > > B0 v = 1/omega A0 v > > B0 and A0 are both hermitian, A0 is positive definite, but only given as > a linear operator (matshell). I am looking for the largest eigenvalues > (=smallest omega). 
> > > > I also have a sparse approximation P0 of the A0 operator, which i would > like to use as precondtioner, using something like this: > > > > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) > > st = es.getST() > > ksp = st.getKSP() > > ksp.setOperators(self.A0, self.P0) > > > > Unfortunately PETSc still complains that it cannot create a > preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but > A0.type == 'python'). > > By the way, should P0 be an approximation of A0 or does it have to > include B0? > > > > Right now I am using the krylov-schur method. Are there any alternatives > if A0 is only given as an operator? > > > > Jose can correct me if I say something wrong. > > > > When I did this, I made a shell operator for the action of A0^{-1} B0 > which has a KSPSolve() in it, so you can use your P0 preconditioning > matrix, and > > then handed that to EPS. You can see me do it here: > > > > > https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 > > > > I had a hard time getting the embedded solver to work the way I wanted, > but maybe that is the better way. > > > > Thanks, > > > > Matt > > > > thanks for any advice > > best wishes > > Florian > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Mon Feb 8 10:40:24 2021 From: dave.mayhem23 at gmail.com (Dave May) Date: Mon, 8 Feb 2021 17:40:24 +0100 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: References: Message-ID: On Mon 8. Feb 2021 at 15:49, Matthew Knepley wrote: > On Mon, Feb 8, 2021 at 9:37 AM Jose E. Roman wrote: > >> The problem can be written as A0*v=omega*B0*v and you want the >> eigenvalues omega closest to zero. If the matrices were explicitly >> available, you would do shift-and-invert with target=0, that is >> >> (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is >> >> A0^{-1}*B0*v=theta*v >> >> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. >> >> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of >> EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? >> EPS_SMALLEST_REAL will give slow convergence. >> > > Thanks Jose! I am not understanding some step. I want the smallest > eigenvalues. Should I use EPS_SMALLEST_MAGNITUDE? I appear to get what I > want > using SMALLEST_REAL, but as you say it might be slower than it has to be. > With shift-and-invert you want to use EPS_LARGEST_MAGNITUDE as Jose says. The largest magnitude v eigenvalues you obtain (see Jose equation above) from the transformed system correspond to the smallest magnitude omega eigenvalues of the original problem. Cheers Dave > Also, sometime I would like to talk about incorporating the multilevel > eigensolver. I am sure you could make lots of improvements to my initial > attempt. I will send > you a separate email, since I am getting serious about testing it. > > Thanks, > > Matt > > >> Florian: I would not recommend setting the KSP matrices directly, it may >> produce strange side-effects. 
We should have an interface function to pass >> this matrix. Currently there is STPrecondSetMatForPC() but it has two >> problems: (1) it is intended for STPRECOND, so cannot be used with >> Krylov-Schur, and (2) it is not currently available in the python interface. >> >> The approach used by Matt is a workaround that does not use ST, so you >> can handle linear solves with a KSP of your own. >> >> As an alternative, since your problem is symmetric, you could try LOBPCG, >> assuming that the leftmost eigenvalues are those that you want (e.g. if all >> eigenvalues are non-negative). In that case you could use >> STPrecondSetMatForPC(), but the remaining issue is calling it from python. >> >> If you are using the git repo, I could add the relevant code. >> >> Jose >> >> >> >> > El 8 feb 2021, a las 14:22, Matthew Knepley >> escribi?: >> > >> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner >> wrote: >> > Dear PETSc / SLEPc Users, >> > >> > my question is very similar to the one posted here: >> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html >> > >> > The eigensystem I would like to solve looks like: >> > B0 v = 1/omega A0 v >> > B0 and A0 are both hermitian, A0 is positive definite, but only given >> as a linear operator (matshell). I am looking for the largest eigenvalues >> (=smallest omega). >> > >> > I also have a sparse approximation P0 of the A0 operator, which i would >> like to use as precondtioner, using something like this: >> > >> > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) >> > st = es.getST() >> > ksp = st.getKSP() >> > ksp.setOperators(self.A0, self.P0) >> > >> > Unfortunately PETSc still complains that it cannot create a >> preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but >> A0.type == 'python'). >> > By the way, should P0 be an approximation of A0 or does it have to >> include B0? >> > >> > Right now I am using the krylov-schur method. Are there any >> alternatives if A0 is only given as an operator? >> > >> > Jose can correct me if I say something wrong. >> > >> > When I did this, I made a shell operator for the action of A0^{-1} B0 >> which has a KSPSolve() in it, so you can use your P0 preconditioning >> matrix, and >> > then handed that to EPS. You can see me do it here: >> > >> > >> https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 >> > >> > I had a hard time getting the embedded solver to work the way I wanted, >> but maybe that is the better way. >> > >> > Thanks, >> > >> > Matt >> > >> > thanks for any advice >> > best wishes >> > Florian >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > -- Norbert Wiener >> > >> > https://www.cse.buffalo.edu/~knepley/ >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Mon Feb 8 10:41:42 2021 From: dave.mayhem23 at gmail.com (Dave May) Date: Mon, 8 Feb 2021 17:41:42 +0100 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: References: Message-ID: On Mon 8. Feb 2021 at 17:40, Dave May wrote: > > > On Mon 8. Feb 2021 at 15:49, Matthew Knepley wrote: > >> On Mon, Feb 8, 2021 at 9:37 AM Jose E. 
Roman wrote: >> >>> The problem can be written as A0*v=omega*B0*v and you want the >>> eigenvalues omega closest to zero. If the matrices were explicitly >>> available, you would do shift-and-invert with target=0, that is >>> >>> (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is >>> >>> A0^{-1}*B0*v=theta*v >>> >>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. >>> >>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of >>> EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? >>> EPS_SMALLEST_REAL will give slow convergence. >>> >> >> Thanks Jose! I am not understanding some step. I want the smallest >> eigenvalues. Should I use EPS_SMALLEST_MAGNITUDE? I appear to get what I >> want >> using SMALLEST_REAL, but as you say it might be slower than it has to be. >> > > > With shift-and-invert you want to use EPS_LARGEST_MAGNITUDE as Jose says. > The largest magnitude v > Sorry ?v? should be ?theta?! eigenvalues you obtain (see Jose equation above) from the transformed > system correspond to the smallest magnitude omega eigenvalues of the > original problem. > > Cheers > Dave > > >> Also, sometime I would like to talk about incorporating the multilevel >> eigensolver. I am sure you could make lots of improvements to my initial >> attempt. I will send >> you a separate email, since I am getting serious about testing it. >> >> Thanks, >> >> Matt >> >> >>> Florian: I would not recommend setting the KSP matrices directly, it may >>> produce strange side-effects. We should have an interface function to pass >>> this matrix. Currently there is STPrecondSetMatForPC() but it has two >>> problems: (1) it is intended for STPRECOND, so cannot be used with >>> Krylov-Schur, and (2) it is not currently available in the python interface. >>> >>> The approach used by Matt is a workaround that does not use ST, so you >>> can handle linear solves with a KSP of your own. >>> >>> As an alternative, since your problem is symmetric, you could try >>> LOBPCG, assuming that the leftmost eigenvalues are those that you want >>> (e.g. if all eigenvalues are non-negative). In that case you could use >>> STPrecondSetMatForPC(), but the remaining issue is calling it from python. >>> >>> If you are using the git repo, I could add the relevant code. >>> >>> Jose >>> >>> >>> >>> > El 8 feb 2021, a las 14:22, Matthew Knepley >>> escribi?: >>> > >>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner >>> wrote: >>> > Dear PETSc / SLEPc Users, >>> > >>> > my question is very similar to the one posted here: >>> > >>> https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html >>> > >>> > The eigensystem I would like to solve looks like: >>> > B0 v = 1/omega A0 v >>> > B0 and A0 are both hermitian, A0 is positive definite, but only given >>> as a linear operator (matshell). I am looking for the largest eigenvalues >>> (=smallest omega). >>> > >>> > I also have a sparse approximation P0 of the A0 operator, which i >>> would like to use as precondtioner, using something like this: >>> > >>> > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) >>> > st = es.getST() >>> > ksp = st.getKSP() >>> > ksp.setOperators(self.A0, self.P0) >>> > >>> > Unfortunately PETSc still complains that it cannot create a >>> preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but >>> A0.type == 'python'). >>> > By the way, should P0 be an approximation of A0 or does it have to >>> include B0? >>> > >>> > Right now I am using the krylov-schur method. 
Are there any >>> alternatives if A0 is only given as an operator? >>> > >>> > Jose can correct me if I say something wrong. >>> > >>> > When I did this, I made a shell operator for the action of A0^{-1} B0 >>> which has a KSPSolve() in it, so you can use your P0 preconditioning >>> matrix, and >>> > then handed that to EPS. You can see me do it here: >>> > >>> > >>> https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 >>> > >>> > I had a hard time getting the embedded solver to work the way I >>> wanted, but maybe that is the better way. >>> > >>> > Thanks, >>> > >>> > Matt >>> > >>> > thanks for any advice >>> > best wishes >>> > Florian >>> > >>> > >>> > -- >>> > What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> > -- Norbert Wiener >>> > >>> > https://www.cse.buffalo.edu/~knepley/ >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 8 11:44:59 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 8 Feb 2021 12:44:59 -0500 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: References: Message-ID: On Mon, Feb 8, 2021 at 11:40 AM Dave May wrote: > On Mon 8. Feb 2021 at 15:49, Matthew Knepley wrote: > >> On Mon, Feb 8, 2021 at 9:37 AM Jose E. Roman wrote: >> >>> The problem can be written as A0*v=omega*B0*v and you want the >>> eigenvalues omega closest to zero. If the matrices were explicitly >>> available, you would do shift-and-invert with target=0, that is >>> >>> (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is >>> >>> A0^{-1}*B0*v=theta*v >>> >>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. >>> >>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of >>> EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? >>> EPS_SMALLEST_REAL will give slow convergence. >>> >> >> Thanks Jose! I am not understanding some step. I want the smallest >> eigenvalues. Should I use EPS_SMALLEST_MAGNITUDE? I appear to get what I >> want >> using SMALLEST_REAL, but as you say it might be slower than it has to be. >> > > > With shift-and-invert you want to use EPS_LARGEST_MAGNITUDE as Jose says. > The largest magnitude v eigenvalues you obtain (see Jose equation above) > from the transformed system correspond to the smallest magnitude omega > eigenvalues of the original problem. > Okay. In my system for BAMG, however, I do not have 1/\omega, but just \lambda, so I think it should be EPS_SMALLEST_MAGNITUDE now. I can check that. Thanks, Matt > Cheers > Dave > > >> Also, sometime I would like to talk about incorporating the multilevel >> eigensolver. I am sure you could make lots of improvements to my initial >> attempt. I will send >> you a separate email, since I am getting serious about testing it. >> >> Thanks, >> >> Matt >> >> >>> Florian: I would not recommend setting the KSP matrices directly, it may >>> produce strange side-effects. We should have an interface function to pass >>> this matrix. 
Currently there is STPrecondSetMatForPC() but it has two >>> problems: (1) it is intended for STPRECOND, so cannot be used with >>> Krylov-Schur, and (2) it is not currently available in the python interface. >>> >>> The approach used by Matt is a workaround that does not use ST, so you >>> can handle linear solves with a KSP of your own. >>> >>> As an alternative, since your problem is symmetric, you could try >>> LOBPCG, assuming that the leftmost eigenvalues are those that you want >>> (e.g. if all eigenvalues are non-negative). In that case you could use >>> STPrecondSetMatForPC(), but the remaining issue is calling it from python. >>> >>> If you are using the git repo, I could add the relevant code. >>> >>> Jose >>> >>> >>> >>> > El 8 feb 2021, a las 14:22, Matthew Knepley >>> escribi?: >>> > >>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner >>> wrote: >>> > Dear PETSc / SLEPc Users, >>> > >>> > my question is very similar to the one posted here: >>> > >>> https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html >>> > >>> > The eigensystem I would like to solve looks like: >>> > B0 v = 1/omega A0 v >>> > B0 and A0 are both hermitian, A0 is positive definite, but only given >>> as a linear operator (matshell). I am looking for the largest eigenvalues >>> (=smallest omega). >>> > >>> > I also have a sparse approximation P0 of the A0 operator, which i >>> would like to use as precondtioner, using something like this: >>> > >>> > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) >>> > st = es.getST() >>> > ksp = st.getKSP() >>> > ksp.setOperators(self.A0, self.P0) >>> > >>> > Unfortunately PETSc still complains that it cannot create a >>> preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but >>> A0.type == 'python'). >>> > By the way, should P0 be an approximation of A0 or does it have to >>> include B0? >>> > >>> > Right now I am using the krylov-schur method. Are there any >>> alternatives if A0 is only given as an operator? >>> > >>> > Jose can correct me if I say something wrong. >>> > >>> > When I did this, I made a shell operator for the action of A0^{-1} B0 >>> which has a KSPSolve() in it, so you can use your P0 preconditioning >>> matrix, and >>> > then handed that to EPS. You can see me do it here: >>> > >>> > >>> https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 >>> > >>> > I had a hard time getting the embedded solver to work the way I >>> wanted, but maybe that is the better way. >>> > >>> > Thanks, >>> > >>> > Matt >>> > >>> > thanks for any advice >>> > best wishes >>> > Florian >>> > >>> > >>> > -- >>> > What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> > -- Norbert Wiener >>> > >>> > https://www.cse.buffalo.edu/~knepley/ >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From y.juntao at hotmail.com Mon Feb 8 22:50:02 2021 From: y.juntao at hotmail.com (Karl Yang) Date: Tue, 9 Feb 2021 12:50:02 +0800 Subject: [petsc-users] Help needed on DMDA dm_boundary_none and matsetvaluesstencil Message-ID: Hi, all I've encountered some issues with DM_BOUDNARY_NONE and MatSetValuesStencil. I had a code with DM_BOUNDARY_PERIODIC which was working fine. And I simply change the boundary, and find it not working any more. I wonder is there any difference in terms of indexing and stencil for DM_BOUNDARY_NONE. The following is a simplified code to demonstrate what I was doing. It is basically assembling for finite elements. But the MatSetValuesStencil seems not adding values into the matrix as I expected and some entries disappeared after MatSetValuesStencil for the second time. I've attached the output at the two different matview location. Some entries in the matrix disappeared after the second values add. And the order of matrix is wired to me, the matrix output is not in a ascending order. Appreciate if anyone would help. /////////////////////////demo code /////////////////////////////////// DM dm; Mat A; MatStencil s_u[4]; DMDACreate2d(PETSC_COMM_SELF, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DMDA_STENCIL_BOX, 5, 5, PETSC_DECIDE, PETSC_DECIDE, 3, 1, NULL, NULL, &dm); DMSetMatType(dm, MATAIJ); DMSetFromOptions(dm); DMSetUp(dm); DMDASetUniformCoordinates(dm, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0); DMSetMatrixPreallocateOnly(dm, PETSC_TRUE); DMCreateMatrix(dm, &A); s_u[0].i = 0; s_u[0].j = 0; s_u[0].c = 1; s_u[1].i = 0; s_u[1].j = 0+1; s_u[1].c = 1; s_u[2].i = 0+1; s_u[2].j = 0+1; s_u[2].c = 1; s_u[3].i = 0+1; s_u[3].j = 0; s_u[3].c = 1; double Ke[16]; for (int n=0;n<16;++n){Ke[n]=1;}; MatSetValuesStencil(A,4,s_u,4,s_u,Ke,ADD_VALUES); // MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); // MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); // MatView(A, PETSC_VIEWER_STDOUT_WORLD); //first matview s_u[0].i = 1; s_u[0].j = 0; s_u[0].c = 1; s_u[1].i = 1; s_u[1].j = 0+1; s_u[1].c = 1; s_u[2].i = 1+1; s_u[2].j = 0+1; s_u[2].c = 1; s_u[3].i = 1+1; s_u[3].j = 0; s_u[3].c = 1; MatSetValuesStencil(A,4,s_u,4,s_u,Ke,ADD_VALUES); MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); MatView(A, PETSC_VIEWER_STDOUT_WORLD); //second matview ////////////////first matview //////////////////////// row 0: row 1: (1, 1.) (16, 1.) (19, 1.) (4, 1.) row 2: row 3: row 4: (1, 1.) (16, 1.) (19, 1.) (4, 1.) row 5: row 6: row 7: row 8: row 9: row 10: row 11: row 12: row 13: row 14: row 15: row 16: (1, 1.) (16, 1.) (19, 1.) (4, 1.) row 17: row 18: row 19: (1, 1.) (16, 1.) (19, 1.) (4, 1.) row 20: row 21: row 22: row 23: row 24: row 25: row 26: row 27: row 28: row 29: row 30: row 31: row 32: row 33: row 34: row 35: row 36: row 37: row 38: row 39: row 40: row 41: row 42: row 43: row 44: row 45: row 46: row 47: row 48: row 49: row 50: row 51: row 52: row 53: row 54: row 55: row 56: row 57: row 58: row 59: row 60: row 61: row 62: row 63: row 64: row 65: row 66: row 67: row 68: row 69: row 70: row 71: row 72: row 73: row 74: ///////////////second matview///////////////////// row 0: row 1: (1, 1.) (16, 1.) (19, 1.) (4, 1.) row 2: row 3: row 4: (4, 1.) (19, 1.) (22, 1.) (7, 1.) row 5: row 6: row 7: (4, 1.) (19, 1.) (22, 1.) (7, 1.) row 8: row 9: row 10: row 11: row 12: row 13: row 14: row 15: row 16: (1, 1.) (16, 1.) (19, 1.) (4, 1.) row 17: row 18: row 19: (4, 1.) (19, 1.) (22, 1.) (7, 1.) row 20: row 21: row 22: (4, 1.) (19, 1.) (22, 1.) (7, 1.) 
row 23: row 24: row 25: row 26: row 27: row 28: row 29: row 30: row 31: row 32: row 33: row 34: row 35: row 36: row 37: row 38: row 39: row 40: row 41: row 42: row 43: row 44: row 45: row 46: row 47: row 48: row 49: row 50: row 51: row 52: row 53: row 54: row 55: row 56: row 57: row 58: row 59: row 60: row 61: row 62: row 63: row 64: row 65: row 66: row 67: row 68: row 69: row 70: row 71: row 72: row 73: row 74: -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 9 07:41:13 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 9 Feb 2021 08:41:13 -0500 Subject: [petsc-users] Help needed on DMDA dm_boundary_none and matsetvaluesstencil In-Reply-To: References: Message-ID: On Mon, Feb 8, 2021 at 11:50 PM Karl Yang wrote: > Hi, all > > I've encountered some issues with DM_BOUDNARY_NONE and > MatSetValuesStencil. I had a code with DM_BOUNDARY_PERIODIC which was > working fine. And I simply change the boundary, and find it not working any > more. I wonder is there any difference in terms of indexing and stencil for > DM_BOUNDARY_NONE. > DM_BOUNDARY_PERIODIC puts another layer of ghost points around the local boundary. Values in these are then transferred to the correct global location when DMLocalToGlobal() is run. Also, DMGlobalToLocal() inserts values form the correct global locations into the local vector. DM_BOUNDARY_NONE does not do any of that. Thanks, Matt > The following is a simplified code to demonstrate what I was doing. It is > basically assembling for finite elements. > But the MatSetValuesStencil seems not adding values into the matrix as I > expected and some entries disappeared after MatSetValuesStencil for the > second time. I've attached the output at the two different matview > location. Some entries in the matrix disappeared after the second values > add. And the order of matrix is wired to me, the matrix output is not in a > ascending order. Appreciate if anyone would help. > > /////////////////////////demo code /////////////////////////////////// > > DM dm; > Mat A; > MatStencil s_u[4]; > > DMDACreate2d(PETSC_COMM_SELF, DM_BOUNDARY_NONE, > DM_BOUNDARY_NONE, DMDA_STENCIL_BOX, 5, 5, PETSC_DECIDE, PETSC_DECIDE, 3, 1, > NULL, NULL, &dm); > DMSetMatType(dm, MATAIJ); > DMSetFromOptions(dm); > DMSetUp(dm); > DMDASetUniformCoordinates(dm, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0); > > DMSetMatrixPreallocateOnly(dm, PETSC_TRUE); > DMCreateMatrix(dm, &A); > > s_u[0].i = 0; s_u[0].j = 0; s_u[0].c = 1; > s_u[1].i = 0; s_u[1].j = 0+1; s_u[1].c = 1; > s_u[2].i = 0+1; s_u[2].j = 0+1; s_u[2].c = 1; > s_u[3].i = 0+1; s_u[3].j = 0; s_u[3].c = 1; > > double Ke[16]; > for (int n=0;n<16;++n){Ke[n]=1;}; > MatSetValuesStencil(A,4,s_u,4,s_u,Ke,ADD_VALUES); > > // MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > // MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > // MatView(A, PETSC_VIEWER_STDOUT_WORLD); //first matview > > s_u[0].i = 1; s_u[0].j = 0; s_u[0].c = 1; > s_u[1].i = 1; s_u[1].j = 0+1; s_u[1].c = 1; > s_u[2].i = 1+1; s_u[2].j = 0+1; s_u[2].c = 1; > s_u[3].i = 1+1; s_u[3].j = 0; s_u[3].c = 1; > > MatSetValuesStencil(A,4,s_u,4,s_u,Ke,ADD_VALUES); > > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > MatView(A, PETSC_VIEWER_STDOUT_WORLD); //second matview > > > ////////////////first matview //////////////////////// > row 0: > row 1: (1, 1.) (16, 1.) (19, 1.) (4, 1.) > row 2: > row 3: > row 4: *(1, 1.) ** (16, 1.) * (19, 1.) (4, 1.) 
> row 5: > row 6: > row 7: > row 8: > row 9: > row 10: > row 11: > row 12: > row 13: > row 14: > row 15: > row 16: (1, 1.) (16, 1.) (19, 1.) (4, 1.) > row 17: > row 18: > row 19: *(1, 1.) (16, 1.)* (19, 1.) (4, 1.) > row 20: > row 21: > row 22: > row 23: > row 24: > row 25: > row 26: > row 27: > row 28: > row 29: > row 30: > row 31: > row 32: > row 33: > row 34: > row 35: > row 36: > row 37: > row 38: > row 39: > row 40: > row 41: > row 42: > row 43: > row 44: > row 45: > row 46: > row 47: > row 48: > row 49: > row 50: > row 51: > row 52: > row 53: > row 54: > row 55: > row 56: > row 57: > row 58: > row 59: > row 60: > row 61: > row 62: > row 63: > row 64: > row 65: > row 66: > row 67: > row 68: > row 69: > row 70: > row 71: > row 72: > row 73: > row 74: > > ///////////////second matview///////////////////// > row 0: > row 1: (1, 1.) (16, 1.) (19, 1.) (4, 1.) > row 2: > row 3: > row 4: (4, 1.) (19, 1.) (22, 1.) (7, 1.) > row 5: > row 6: > row 7: (4, 1.) (19, 1.) (22, 1.) (7, 1.) > row 8: > row 9: > row 10: > row 11: > row 12: > row 13: > row 14: > row 15: > row 16: (1, 1.) (16, 1.) (19, 1.) (4, 1.) > row 17: > row 18: > row 19: (4, 1.) (19, 1.) (22, 1.) (7, 1.) > row 20: > row 21: > row 22: (4, 1.) (19, 1.) (22, 1.) (7, 1.) > row 23: > row 24: > row 25: > row 26: > row 27: > row 28: > row 29: > row 30: > row 31: > row 32: > row 33: > row 34: > row 35: > row 36: > row 37: > row 38: > row 39: > row 40: > row 41: > row 42: > row 43: > row 44: > row 45: > row 46: > row 47: > row 48: > row 49: > row 50: > row 51: > row 52: > row 53: > row 54: > row 55: > row 56: > row 57: > row 58: > row 59: > row 60: > row 61: > row 62: > row 63: > row 64: > row 65: > row 66: > row 67: > row 68: > row 69: > row 70: > row 71: > row 72: > row 73: > row 74: > > [image: Sent from Mailspring] -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From y.juntao at hotmail.com Tue Feb 9 08:03:08 2021 From: y.juntao at hotmail.com (Karl Yang) Date: Tue, 9 Feb 2021 22:03:08 +0800 Subject: [petsc-users] Help needed on DMDA dm_boundary_none and matsetvaluesstencil In-Reply-To: References: Message-ID: Hi, Matt Thanks for your reply. I went around by direct creating matrix with local to global mapping. But now I think DM_BOUNDARY_GHOSTED should be just fine given the information. Regards Juntao On Feb 9 2021, at 9:41 pm, Matthew Knepley wrote: > On Mon, Feb 8, 2021 at 11:50 PM Karl Yang wrote: > > > Hi, all > > > > I've encountered some issues with DM_BOUDNARY_NONE and MatSetValuesStencil. I had a code with DM_BOUNDARY_PERIODIC which was working fine. And I simply change the boundary, and find it not working any more. I wonder is there any difference in terms of indexing and stencil for DM_BOUNDARY_NONE. > > DM_BOUNDARY_PERIODIC puts another layer of ghost points around the local boundary. Values in these are then transferred to the correct global > location when DMLocalToGlobal() is run. Also, DMGlobalToLocal() inserts values form the correct global locations into the local vector. DM_BOUNDARY_NONE > does not do any of that. > > Thanks, > > Matt > > > The following is a simplified code to demonstrate what I was doing. It is basically assembling for finite elements. 
> > But the MatSetValuesStencil seems not adding values into the matrix as I expected and some entries disappeared after MatSetValuesStencil for the second time. I've attached the output at the two different matview location. Some entries in the matrix disappeared after the second values add. And the order of matrix is wired to me, the matrix output is not in a ascending order. Appreciate if anyone would help. > > > > /////////////////////////demo code /////////////////////////////////// > > DM dm; > > Mat A; > > MatStencil s_u[4]; > > > > DMDACreate2d(PETSC_COMM_SELF, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DMDA_STENCIL_BOX, 5, 5, PETSC_DECIDE, PETSC_DECIDE, 3, 1, NULL, NULL, &dm); > > DMSetMatType(dm, MATAIJ); > > DMSetFromOptions(dm); > > DMSetUp(dm); > > DMDASetUniformCoordinates(dm, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0); > > > > DMSetMatrixPreallocateOnly(dm, PETSC_TRUE); > > DMCreateMatrix(dm, &A); > > > > s_u[0].i = 0; s_u[0].j = 0; s_u[0].c = 1; > > s_u[1].i = 0; s_u[1].j = 0+1; s_u[1].c = 1; > > s_u[2].i = 0+1; s_u[2].j = 0+1; s_u[2].c = 1; > > s_u[3].i = 0+1; s_u[3].j = 0; s_u[3].c = 1; > > > > double Ke[16]; > > for (int n=0;n<16;++n){Ke[n]=1;}; > > MatSetValuesStencil(A,4,s_u,4,s_u,Ke,ADD_VALUES); > > > > // MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > > // MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > > // MatView(A, PETSC_VIEWER_STDOUT_WORLD); //first matview > > > > s_u[0].i = 1; s_u[0].j = 0; s_u[0].c = 1; > > s_u[1].i = 1; s_u[1].j = 0+1; s_u[1].c = 1; > > s_u[2].i = 1+1; s_u[2].j = 0+1; s_u[2].c = 1; > > s_u[3].i = 1+1; s_u[3].j = 0; s_u[3].c = 1; > > > > MatSetValuesStencil(A,4,s_u,4,s_u,Ke,ADD_VALUES); > > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > > MatView(A, PETSC_VIEWER_STDOUT_WORLD); //second matview > > > > > > ////////////////first matview //////////////////////// > > row 0: > > row 1: (1, 1.) (16, 1.) (19, 1.) (4, 1.) > > row 2: > > row 3: > > row 4: (1, 1.) (16, 1.) (19, 1.) (4, 1.) > > row 5: > > row 6: > > row 7: > > row 8: > > row 9: > > row 10: > > row 11: > > row 12: > > row 13: > > row 14: > > row 15: > > row 16: (1, 1.) (16, 1.) (19, 1.) (4, 1.) > > row 17: > > row 18: > > row 19: (1, 1.) (16, 1.) (19, 1.) (4, 1.) > > row 20: > > row 21: > > row 22: > > row 23: > > row 24: > > row 25: > > row 26: > > row 27: > > row 28: > > row 29: > > row 30: > > row 31: > > row 32: > > row 33: > > row 34: > > row 35: > > row 36: > > row 37: > > row 38: > > row 39: > > row 40: > > row 41: > > row 42: > > row 43: > > row 44: > > row 45: > > row 46: > > row 47: > > row 48: > > row 49: > > row 50: > > row 51: > > row 52: > > row 53: > > row 54: > > row 55: > > row 56: > > row 57: > > row 58: > > row 59: > > row 60: > > row 61: > > row 62: > > row 63: > > row 64: > > row 65: > > row 66: > > row 67: > > row 68: > > row 69: > > row 70: > > row 71: > > row 72: > > row 73: > > row 74: > > > > ///////////////second matview///////////////////// > > row 0: > > row 1: (1, 1.) (16, 1.) (19, 1.) (4, 1.) > > row 2: > > row 3: > > row 4: (4, 1.) (19, 1.) (22, 1.) (7, 1.) > > row 5: > > row 6: > > row 7: (4, 1.) (19, 1.) (22, 1.) (7, 1.) > > row 8: > > row 9: > > row 10: > > row 11: > > row 12: > > row 13: > > row 14: > > row 15: > > row 16: (1, 1.) (16, 1.) (19, 1.) (4, 1.) > > row 17: > > row 18: > > row 19: (4, 1.) (19, 1.) (22, 1.) (7, 1.) > > row 20: > > row 21: > > row 22: (4, 1.) (19, 1.) (22, 1.) (7, 1.) 
> > row 23: > > row 24: > > row 25: > > row 26: > > row 27: > > row 28: > > row 29: > > row 30: > > row 31: > > row 32: > > row 33: > > row 34: > > row 35: > > row 36: > > row 37: > > row 38: > > row 39: > > row 40: > > row 41: > > row 42: > > row 43: > > row 44: > > row 45: > > row 46: > > row 47: > > row 48: > > row 49: > > row 50: > > row 51: > > row 52: > > row 53: > > row 54: > > row 55: > > row 56: > > row 57: > > row 58: > > row 59: > > row 60: > > row 61: > > row 62: > > row 63: > > row 64: > > row 65: > > row 66: > > row 67: > > row 68: > > row 69: > > row 70: > > row 71: > > row 72: > > row 73: > > row 74: > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > > https://www.cse.buffalo.edu/~knepley/ (https://link.getmailspring.com/link/E8FF16A6-B353-40C3-9F92-7DD517718AE1 at getmailspring.com/1?redirect=http%3A%2F%2Fwww.cse.buffalo.edu%2F~knepley%2F&recipient=cGV0c2MtdXNlcnNAbWNzLmFubC5nb3Y%3D) -------------- next part -------------- An HTML attachment was scrubbed... URL: From edoardo.alinovi at gmail.com Wed Feb 10 03:07:03 2021 From: edoardo.alinovi at gmail.com (Edoardo alinovi) Date: Wed, 10 Feb 2021 10:07:03 +0100 Subject: [petsc-users] Using parmetis from petsc Message-ID: Hello PETSc friends, I am working on a code to partition a mesh in parallel and I am looking at parmetis. As far as I know petsc is interfaced with metis and parmetis and I have seen people using it within dmplex. Now, I am not using dmplex, but I have petsc compiled along with my code for the linear system part. I am wondering if there is a way to load up a mesh file in the parmetis format and use petsc to get the elements partitioning only in output. Is that possible? Thank you for the help, Edoardo -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 10 06:12:18 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 10 Feb 2021 07:12:18 -0500 Subject: [petsc-users] Using parmetis from petsc In-Reply-To: References: Message-ID: On Wed, Feb 10, 2021 at 4:07 AM Edoardo alinovi wrote: > Hello PETSc friends, > > I am working on a code to partition a mesh in parallel and I am looking > at parmetis. As far as I know petsc is interfaced with metis and parmetis > and I have seen people using it within dmplex. Now, I am not using dmplex, > but I have petsc compiled along with my code for the linear system part. > I am wondering if there is a way to load up a mesh file in the parmetis > format and use petsc to get the elements partitioning only in output. Is > that possible? > ParMetis does not really have a mesh format. It partitions distributed graphs. Most people want to partition cells in their mesh, and then ParMetis would want the graph for cell connectivity. This is not usually what people store, so typically there is a conversion process here. If you want to use PETSc for partitioning, and not use DMPlex, the easiest way to do it is to put your cell connectivity in a Mat, storing the adjacency graph for cells. Then use MatPartitioning. Thanks, Matt > Thank you for the help, > > Edoardo > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From edoardo.alinovi at gmail.com Wed Feb 10 09:00:49 2021 From: edoardo.alinovi at gmail.com (Edoardo alinovi) Date: Wed, 10 Feb 2021 16:00:49 +0100 Subject: [petsc-users] Using parmetis from petsc In-Reply-To: References: Message-ID: Thanks Matthew, Probably in my case is a good idea to use parmetis directly as i just need the cell distribution in pre-procrssing. Thanks for the clarification. On an affine side, it's a while i am interrogating my self about a thing. Let's say i have my cell distribution accross processors from parmetis. Since petsc needs to have the unknows from 1 to N0 hosted by rank 0, N0+1:N2 hosted by rank 2 and so on, what i am doing is relabelling local cells in order to meet this requirement . Is that right? I am wondering if such a way of things is leading to a suboptimal matrix bandwith. What do you think about this? Many thanks, Edoardo On Wed, 10 Feb 2021, 13:12 Matthew Knepley, wrote: > On Wed, Feb 10, 2021 at 4:07 AM Edoardo alinovi > wrote: > >> Hello PETSc friends, >> >> I am working on a code to partition a mesh in parallel and I am looking >> at parmetis. As far as I know petsc is interfaced with metis and parmetis >> and I have seen people using it within dmplex. Now, I am not using dmplex, >> but I have petsc compiled along with my code for the linear system part. >> I am wondering if there is a way to load up a mesh file in the parmetis >> format and use petsc to get the elements partitioning only in output. Is >> that possible? >> > > ParMetis does not really have a mesh format. It partitions distributed > graphs. Most people want to partition cells in their mesh, and > then ParMetis would want the graph for cell connectivity. This is not > usually what people store, so typically there is a conversion process > here. If you want to use PETSc for partitioning, and not use DMPlex, the > easiest way to do it is to put your cell connectivity in a Mat, > storing the adjacency graph for cells. Then use MatPartitioning. > > Thanks, > > Matt > > >> Thank you for the help, >> >> Edoardo >> > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 10 09:06:02 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 10 Feb 2021 10:06:02 -0500 Subject: [petsc-users] Using parmetis from petsc In-Reply-To: References: Message-ID: On Wed, Feb 10, 2021 at 10:00 AM Edoardo alinovi wrote: > Thanks Matthew, > > Probably in my case is a good idea to use parmetis directly as i just need > the cell distribution in pre-procrssing. Thanks for the clarification. > > On an affine side, it's a while i am interrogating my self about a thing. > Let's say i have my cell distribution accross processors from parmetis. > Since petsc needs to have the unknows from 1 to N0 hosted by rank 0, > N0+1:N2 hosted by rank 2 and so on, what i am doing is relabelling local > cells in order to meet this requirement . Is that right? I am wondering if > such a way of things is leading to a suboptimal matrix bandwith. What do > you think about this? > I do not understand. If parmetis gives you a partition, you would have to move the data to match the new partition. 
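For reference, a minimal C sketch of the route Matt suggests above: build the cell-to-cell adjacency as a MATMPIADJ matrix, hand it to MatPartitioning, and then derive the new contiguous cell numbering per rank. The function name, the CSR arrays ia/ja, and picking ParMETIS through the options database are illustrative assumptions, not code from this thread:

    #include <petscmat.h>

    /* Sketch: partition cells from a CSR cell-adjacency graph.
       ia/ja are assumed to be already built with PetscMalloc1();
       MATMPIADJ takes ownership and frees them at MatDestroy(). */
    PetscErrorCode PartitionCells(MPI_Comm comm, PetscInt nLocalCells, PetscInt nGlobalCells,
                                  PetscInt *ia, PetscInt *ja, IS *newGlobalNumbering)
    {
      Mat             adj;
      MatPartitioning part;
      IS              isOwner;   /* new owner rank of each (old) local cell */
      PetscErrorCode  ierr;

      PetscFunctionBeginUser;
      ierr = MatCreateMPIAdj(comm, nLocalCells, nGlobalCells, ia, ja, NULL, &adj);CHKERRQ(ierr);

      ierr = MatPartitioningCreate(comm, &part);CHKERRQ(ierr);
      ierr = MatPartitioningSetAdjacency(part, adj);CHKERRQ(ierr);
      ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); /* e.g. -mat_partitioning_type parmetis */
      ierr = MatPartitioningApply(part, &isOwner);CHKERRQ(ierr);

      /* Turn "owner rank per old cell" into the new contiguous global numbering
         (rank 0 owns 0..n0-1, rank 1 owns n0..n0+n1-1, ...), i.e. the relabelling. */
      ierr = ISPartitioningToNumbering(isOwner, newGlobalNumbering);CHKERRQ(ierr);

      ierr = ISDestroy(&isOwner);CHKERRQ(ierr);
      ierr = MatPartitioningDestroy(&part);CHKERRQ(ierr);
      ierr = MatDestroy(&adj);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

ISPartitioningToNumbering() produces exactly the per-rank contiguous relabelling PETSc expects for Vec/Mat row ownership. Whether the resulting matrix bandwidth is good then depends on the ordering of cells within each subdomain; if that matters, a local reordering of the diagonal block (e.g. RCM via MatGetOrdering()) is a common follow-up.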
Thanks, Matt > Many thanks, > > Edoardo > > On Wed, 10 Feb 2021, 13:12 Matthew Knepley, wrote: > >> On Wed, Feb 10, 2021 at 4:07 AM Edoardo alinovi < >> edoardo.alinovi at gmail.com> wrote: >> >>> Hello PETSc friends, >>> >>> I am working on a code to partition a mesh in parallel and I am looking >>> at parmetis. As far as I know petsc is interfaced with metis and parmetis >>> and I have seen people using it within dmplex. Now, I am not using dmplex, >>> but I have petsc compiled along with my code for the linear system part. >>> I am wondering if there is a way to load up a mesh file in the parmetis >>> format and use petsc to get the elements partitioning only in output. Is >>> that possible? >>> >> >> ParMetis does not really have a mesh format. It partitions distributed >> graphs. Most people want to partition cells in their mesh, and >> then ParMetis would want the graph for cell connectivity. This is not >> usually what people store, so typically there is a conversion process >> here. If you want to use PETSc for partitioning, and not use DMPlex, the >> easiest way to do it is to put your cell connectivity in a Mat, >> storing the adjacency graph for cells. Then use MatPartitioning. >> >> Thanks, >> >> Matt >> >> >>> Thank you for the help, >>> >>> Edoardo >>> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo.semplice at uninsubria.it Wed Feb 10 10:46:57 2021 From: matteo.semplice at uninsubria.it (Matteo Semplice) Date: Wed, 10 Feb 2021 17:46:57 +0100 Subject: [petsc-users] shell preconditioner for Schur complement Message-ID: Dear PETSc users, ??? we are trying to program a preconditioner for the Schur complement of a Stokes system, but it seems that the r.h.s. for the Schur complement system differs from what we expect by a scale factor, which we don't understand. Our setup has a system matrix A divided in 2x2 blocks for velocity and pressure variables. We have programmed our preconditioner in a routine PrecondSchur and in the main program we do PC pc; KSPGetPC(kspA,&pc); PCSetFromOptions(pc); KSPSetOperators(kspA, A, A); KSPSetInitialGuessNonzero(kspA,PETSC_FALSE); KSPSetFromOptions(kspA); KSP *subksp; PetscInt nfield; PCSetUp(pc); PCFieldSplitGetSubKSP(pc, &nfield, &subksp); PC pcSchur; KSPGetPC(subksp[1],&pcSchur); PCSetType(pcSchur,PCSHELL); PCShellSetApply(pcSchur,PrecondSchur); KSPSetFromOptions(subksp[1]); and eventually KSPSolve(A,b,solution); We run the code with options ?-ksp_type fgmres \ ?-pc_type fieldsplit -pc_fieldsplit_type schur \ ?-pc_fieldsplit_schur_fact_type full \ and, from reading section 2.3.5 of the PETSc manual, we'd expect that the first r.h.s. passed to PrecondSchur be exactly ??? b_1-A_10*inv(A_00)*b_0 Instead (from a monitor function attached to the subksp[1] solver), the first r.h.s. appears to be scalar multiple of the above vector; we are guessing that we should take into account this multiplicative factor in our preconditioner routine, but we cannot understand where it comes from and how its value is determined. 
Could you explain us what is going on in the PC_SCHUR exactly, or point us to some working code example? Thanks in advance! ??? Matteo From knepley at gmail.com Wed Feb 10 11:05:28 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 10 Feb 2021 12:05:28 -0500 Subject: [petsc-users] shell preconditioner for Schur complement In-Reply-To: References: Message-ID: On Wed, Feb 10, 2021 at 11:51 AM Matteo Semplice < matteo.semplice at uninsubria.it> wrote: > Dear PETSc users, > we are trying to program a preconditioner for the Schur complement > of a Stokes system, but it seems that the r.h.s. for the Schur > complement system differs from what we expect by a scale factor, which > we don't understand. > > Our setup has a system matrix A divided in 2x2 blocks for velocity and > pressure variables. We have programmed our preconditioner in a routine > PrecondSchur and in the main program we do > > PC pc; > KSPGetPC(kspA,&pc); > PCSetFromOptions(pc); > KSPSetOperators(kspA, A, A); > KSPSetInitialGuessNonzero(kspA,PETSC_FALSE); > KSPSetFromOptions(kspA); > KSP *subksp; > PetscInt nfield; > PCSetUp(pc); > PCFieldSplitGetSubKSP(pc, &nfield, &subksp); > PC pcSchur; > KSPGetPC(subksp[1],&pcSchur); > PCSetType(pcSchur,PCSHELL); > PCShellSetApply(pcSchur,PrecondSchur); > KSPSetFromOptions(subksp[1]); > > and eventually > > KSPSolve(A,b,solution); > > We run the code with options > > -ksp_type fgmres \ > -pc_type fieldsplit -pc_fieldsplit_type schur \ > -pc_fieldsplit_schur_fact_type full \ > > and, from reading section 2.3.5 of the PETSc manual, we'd expect that > the first r.h.s. passed to PrecondSchur be exactly > b_1-A_10*inv(A_00)*b_0 > > Instead (from a monitor function attached to the subksp[1] solver), the > first r.h.s. appears to be scalar multiple of the above vector; we are > guessing that we should take into account this multiplicative factor in > our preconditioner routine, but we cannot understand where it comes from > and how its value is determined. > > Could you explain us what is going on in the PC_SCHUR exactly, or point > us to some working code example? > 1) It is hard to understand solver questions without the output of -ksp_view 2) The RHS will depend on the kind of factorization you are using for the system https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetSchurFactType.html#PCFieldSplitSetSchurFactType I can see which one in the view output Thanks, Matt > Thanks in advance! > > Matteo > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From elena.travaglia at edu.unito.it Wed Feb 10 15:05:23 2021 From: elena.travaglia at edu.unito.it (Elena Travaglia) Date: Wed, 10 Feb 2021 22:05:23 +0100 Subject: [petsc-users] shell preconditioner for Schur complement In-Reply-To: References: Message-ID: Thanks for the link. We have set a Schur factorization of type FULL, and we passed it when we run the code with -pc_fieldsplit_schur_fact_type full Here there is the output of -ksp_view KSP Object: 1 MPI processes type: fgmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-08, absolute=1e-50, divergence=10000. 
right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: fieldsplit FieldSplit with Schur preconditioner, factorization FULL Preconditioner for the Schur complement formed from A11 Split info: Split number 0 Defined by IS Split number 1 Defined by IS KSP solver for A00 block KSP Object: (fieldsplit_0_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI processes type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=44, cols=44 package used to perform factorization: petsc total: nonzeros=482, allocated nonzeros=482 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 13 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: (fieldsplit_0_) 1 MPI processes type: seqaij rows=44, cols=44 total: nonzeros=482, allocated nonzeros=482 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 13 nodes, limit used is 5 KSP solver for S = A11 - A10 inv(A00) A01 KSP Object: (fieldsplit_1_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=1, initial guess is zero tolerances: relative=1e-09, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_1_) 1 MPI processes type: shell no name linear system matrix followed by preconditioner matrix: Mat Object: (fieldsplit_1_) 1 MPI processes type: schurcomplement rows=20, cols=20 Schur complement A11 - A10 inv(A00) A01 A11 Mat Object: (fieldsplit_1_) 1 MPI processes type: seqaij rows=20, cols=20 total: nonzeros=112, allocated nonzeros=112 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 10 nodes, limit used is 5 A10 Mat Object: 1 MPI processes type: seqaij rows=20, cols=44 total: nonzeros=160, allocated nonzeros=160 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 10 nodes, limit used is 5 KSP of A00 KSP Object: (fieldsplit_0_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI processes type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1., needed 1. 
Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=44, cols=44 package used to perform factorization: petsc total: nonzeros=482, allocated nonzeros=482 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 13 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: (fieldsplit_0_) 1 MPI processes type: seqaij rows=44, cols=44 total: nonzeros=482, allocated nonzeros=482 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 13 nodes, limit used is 5 A01 Mat Object: 1 MPI processes type: seqaij rows=44, cols=20 total: nonzeros=156, allocated nonzeros=156 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 12 nodes, limit used is 5 Mat Object: (fieldsplit_1_) 1 MPI processes type: seqaij rows=20, cols=20 total: nonzeros=112, allocated nonzeros=112 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 10 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=64, cols=64 total: nonzeros=910, allocated nonzeros=2432 total number of mallocs used during MatSetValues calls=128 using I-node routines: found 23 nodes, limit used is 5 We would like to understand why the first r.h.s, passed to our function for the Schur preconditioner, is not b_1-A_10*inv(A_00)*b_0, even if we used the full factorization ( without dropping any terms ). Thank you, Elena Il giorno mer 10 feb 2021 alle ore 18:05 Matthew Knepley ha scritto: > On Wed, Feb 10, 2021 at 11:51 AM Matteo Semplice < > matteo.semplice at uninsubria.it> wrote: > >> Dear PETSc users, >> we are trying to program a preconditioner for the Schur complement >> of a Stokes system, but it seems that the r.h.s. for the Schur >> complement system differs from what we expect by a scale factor, which >> we don't understand. >> >> Our setup has a system matrix A divided in 2x2 blocks for velocity and >> pressure variables. We have programmed our preconditioner in a routine >> PrecondSchur and in the main program we do >> >> PC pc; >> KSPGetPC(kspA,&pc); >> PCSetFromOptions(pc); >> KSPSetOperators(kspA, A, A); >> KSPSetInitialGuessNonzero(kspA,PETSC_FALSE); >> KSPSetFromOptions(kspA); >> KSP *subksp; >> PetscInt nfield; >> PCSetUp(pc); >> PCFieldSplitGetSubKSP(pc, &nfield, &subksp); >> PC pcSchur; >> KSPGetPC(subksp[1],&pcSchur); >> PCSetType(pcSchur,PCSHELL); >> PCShellSetApply(pcSchur,PrecondSchur); >> KSPSetFromOptions(subksp[1]); >> >> and eventually >> >> KSPSolve(A,b,solution); >> >> We run the code with options >> >> -ksp_type fgmres \ >> -pc_type fieldsplit -pc_fieldsplit_type schur \ >> -pc_fieldsplit_schur_fact_type full \ >> >> and, from reading section 2.3.5 of the PETSc manual, we'd expect that >> the first r.h.s. passed to PrecondSchur be exactly >> b_1-A_10*inv(A_00)*b_0 >> >> Instead (from a monitor function attached to the subksp[1] solver), the >> first r.h.s. appears to be scalar multiple of the above vector; we are >> guessing that we should take into account this multiplicative factor in >> our preconditioner routine, but we cannot understand where it comes from >> and how its value is determined. >> >> Could you explain us what is going on in the PC_SCHUR exactly, or point >> us to some working code example? 
>> > > 1) It is hard to understand solver questions without the output of > -ksp_view > > 2) The RHS will depend on the kind of factorization you are using for the > system > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetSchurFactType.html#PCFieldSplitSetSchurFactType > > I can see which one in the view output > > Thanks, > > Matt > > >> Thanks in advance! >> >> Matteo >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- ------------------------ Indirizzo istituzionale di posta elettronica degli studenti e dei laureati dell'Universit? degli Studi di TorinoOfficial? University of Turin?email address?for students and graduates? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 10 15:23:05 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 10 Feb 2021 16:23:05 -0500 Subject: [petsc-users] shell preconditioner for Schur complement In-Reply-To: References: Message-ID: On Wed, Feb 10, 2021 at 4:05 PM Elena Travaglia < elena.travaglia at edu.unito.it> wrote: > Thanks for the link. > > We have set a Schur factorization of type FULL, and we passed it when we > run the code with > -pc_fieldsplit_schur_fact_type full > > Here there is the output of -ksp_view > > KSP Object: 1 MPI processes > type: fgmres > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=1, initial guess is zero > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > right preconditioning > using UNPRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: fieldsplit > FieldSplit with Schur preconditioner, factorization FULL > Preconditioner for the Schur complement formed from A11 > Split info: > Split number 0 Defined by IS > Split number 1 Defined by IS > KSP solver for A00 block > KSP Object: (fieldsplit_0_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. 
> Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=44, cols=44 > package used to perform factorization: petsc > total: nonzeros=482, allocated nonzeros=482 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 13 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_) 1 MPI processes > type: seqaij > rows=44, cols=44 > total: nonzeros=482, allocated nonzeros=482 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 13 nodes, limit used is 5 > KSP solver for S = A11 - A10 inv(A00) A01 > KSP Object: (fieldsplit_1_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=1, initial guess is zero > tolerances: relative=1e-09, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_1_) 1 MPI processes > type: shell > no name > linear system matrix followed by preconditioner matrix: > Mat Object: (fieldsplit_1_) 1 MPI processes > type: schurcomplement > rows=20, cols=20 > Schur complement A11 - A10 inv(A00) A01 > A11 > Mat Object: (fieldsplit_1_) 1 MPI processes > type: seqaij > rows=20, cols=20 > total: nonzeros=112, allocated nonzeros=112 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 10 nodes, limit used is 5 > A10 > Mat Object: 1 MPI processes > type: seqaij > rows=20, cols=44 > total: nonzeros=160, allocated nonzeros=160 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 10 nodes, limit used is 5 > KSP of A00 > KSP Object: (fieldsplit_0_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI processes > type: ilu > out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. 
> Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=44, cols=44 > package used to perform factorization: petsc > total: nonzeros=482, allocated nonzeros=482 > total number of mallocs used during MatSetValues > calls=0 > using I-node routines: found 13 nodes, limit > used is 5 > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_) 1 MPI processes > type: seqaij > rows=44, cols=44 > total: nonzeros=482, allocated nonzeros=482 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 13 nodes, limit used is 5 > A01 > Mat Object: 1 MPI processes > type: seqaij > rows=44, cols=20 > total: nonzeros=156, allocated nonzeros=156 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 12 nodes, limit used is 5 > Mat Object: (fieldsplit_1_) 1 MPI processes > type: seqaij > rows=20, cols=20 > total: nonzeros=112, allocated nonzeros=112 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 10 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=64, cols=64 > total: nonzeros=910, allocated nonzeros=2432 > total number of mallocs used during MatSetValues calls=128 > using I-node routines: found 23 nodes, limit used is 5 > > > We would like to understand why the first r.h.s, passed to our function > for the Schur preconditioner, is not > b_1-A_10*inv(A_00)*b_0, > even if we used the full factorization ( without dropping any terms ). > Here is the code: https://gitlab.com/petsc/petsc/-/blob/master/src/ksp/pc/impls/fieldsplit/fieldsplit.c#L1182 I think you are saying that ilinkD->x is not what you expect on line 1196. It should be easy to print out the value at any of the intermediate stages. Thanks, Matt > Thank you, > Elena > > > > > Il giorno mer 10 feb 2021 alle ore 18:05 Matthew Knepley < > knepley at gmail.com> ha scritto: > >> On Wed, Feb 10, 2021 at 11:51 AM Matteo Semplice < >> matteo.semplice at uninsubria.it> wrote: >> >>> Dear PETSc users, >>> we are trying to program a preconditioner for the Schur complement >>> of a Stokes system, but it seems that the r.h.s. for the Schur >>> complement system differs from what we expect by a scale factor, which >>> we don't understand. >>> >>> Our setup has a system matrix A divided in 2x2 blocks for velocity and >>> pressure variables. We have programmed our preconditioner in a routine >>> PrecondSchur and in the main program we do >>> >>> PC pc; >>> KSPGetPC(kspA,&pc); >>> PCSetFromOptions(pc); >>> KSPSetOperators(kspA, A, A); >>> KSPSetInitialGuessNonzero(kspA,PETSC_FALSE); >>> KSPSetFromOptions(kspA); >>> KSP *subksp; >>> PetscInt nfield; >>> PCSetUp(pc); >>> PCFieldSplitGetSubKSP(pc, &nfield, &subksp); >>> PC pcSchur; >>> KSPGetPC(subksp[1],&pcSchur); >>> PCSetType(pcSchur,PCSHELL); >>> PCShellSetApply(pcSchur,PrecondSchur); >>> KSPSetFromOptions(subksp[1]); >>> >>> and eventually >>> >>> KSPSolve(A,b,solution); >>> >>> We run the code with options >>> >>> -ksp_type fgmres \ >>> -pc_type fieldsplit -pc_fieldsplit_type schur \ >>> -pc_fieldsplit_schur_fact_type full \ >>> >>> and, from reading section 2.3.5 of the PETSc manual, we'd expect that >>> the first r.h.s. passed to PrecondSchur be exactly >>> b_1-A_10*inv(A_00)*b_0 >>> >>> Instead (from a monitor function attached to the subksp[1] solver), the >>> first r.h.s. 
appears to be scalar multiple of the above vector; we are >>> guessing that we should take into account this multiplicative factor in >>> our preconditioner routine, but we cannot understand where it comes from >>> and how its value is determined. >>> >>> Could you explain us what is going on in the PC_SCHUR exactly, or point >>> us to some working code example? >>> >> >> 1) It is hard to understand solver questions without the output of >> -ksp_view >> >> 2) The RHS will depend on the kind of factorization you are using for the >> system >> >> >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetSchurFactType.html#PCFieldSplitSetSchurFactType >> >> I can see which one in the view output >> >> Thanks, >> >> Matt >> >> >>> Thanks in advance! >>> >>> Matteo >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > > ------------------------ > > Indirizzo istituzionale di posta elettronica degli studenti e dei laureati > dell'Universit? degli Studi di Torino > Official University of Turin email address for students and graduates > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Feb 10 23:41:41 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 10 Feb 2021 23:41:41 -0600 Subject: [petsc-users] shell preconditioner for Schur complement In-Reply-To: References: Message-ID: <227EB1C9-E4FC-49D6-8376-52B6CC850F2B@petsc.dev> Best to just look at the code to see exactly what it is doing: src/ksp/pc/impls/fieldsplit/fieldsplit.c function PCApply_FieldSplit_Schur() There is no particular "scaling" applied to the vectors. It might be easiest to track through the computational process with the debugger (you can call VecView(v,0) in the debugger anytime to see the current vector) to see why the "scaling" seems to change. Barry > On Feb 10, 2021, at 10:46 AM, Matteo Semplice wrote: > > Dear PETSc users, > we are trying to program a preconditioner for the Schur complement of a Stokes system, but it seems that the r.h.s. for the Schur complement system differs from what we expect by a scale factor, which we don't understand. > > Our setup has a system matrix A divided in 2x2 blocks for velocity and pressure variables. We have programmed our preconditioner in a routine PrecondSchur and in the main program we do > > PC pc; > KSPGetPC(kspA,&pc); > PCSetFromOptions(pc); > KSPSetOperators(kspA, A, A); > KSPSetInitialGuessNonzero(kspA,PETSC_FALSE); > KSPSetFromOptions(kspA); > KSP *subksp; > PetscInt nfield; > PCSetUp(pc); > PCFieldSplitGetSubKSP(pc, &nfield, &subksp); > PC pcSchur; > KSPGetPC(subksp[1],&pcSchur); > PCSetType(pcSchur,PCSHELL); > PCShellSetApply(pcSchur,PrecondSchur); > KSPSetFromOptions(subksp[1]); > > and eventually > > KSPSolve(A,b,solution); > > We run the code with options > > -ksp_type fgmres \ > -pc_type fieldsplit -pc_fieldsplit_type schur \ > -pc_fieldsplit_schur_fact_type full \ > > and, from reading section 2.3.5 of the PETSc manual, we'd expect that the first r.h.s. 
passed to PrecondSchur be exactly > b_1-A_10*inv(A_00)*b_0 > > Instead (from a monitor function attached to the subksp[1] solver), the first r.h.s. appears to be scalar multiple of the above vector; we are guessing that we should take into account this multiplicative factor in our preconditioner routine, but we cannot understand where it comes from and how its value is determined. > > Could you explain us what is going on in the PC_SCHUR exactly, or point us to some working code example? > > Thanks in advance! > > Matteo > From matteo.semplice at uninsubria.it Thu Feb 11 09:52:31 2021 From: matteo.semplice at uninsubria.it (Matteo Semplice) Date: Thu, 11 Feb 2021 16:52:31 +0100 Subject: [petsc-users] shell preconditioner for Schur complement In-Reply-To: <227EB1C9-E4FC-49D6-8376-52B6CC850F2B@petsc.dev> References: <227EB1C9-E4FC-49D6-8376-52B6CC850F2B@petsc.dev> Message-ID: <4af4cca9-e3f3-d0be-6375-2f2f0a3aef4b@uninsubria.it> Il 11/02/21 06:41, Barry Smith ha scritto: > Best to just look at the code to see exactly what it is doing: src/ksp/pc/impls/fieldsplit/fieldsplit.c function PCApply_FieldSplit_Schur() > > There is no particular "scaling" applied to the vectors. It might be easiest to track through the computational process with the debugger (you can call VecView(v,0) in the debugger anytime to see the current vector) to see why the "scaling" seems to change. > > Barry Found! It came from the initial rescaling b->b/norm(b) in the fgmres for the entire matrix. We are now set and we can concentrate on our routine. Thanks a lot Matthew and Barry! Matteo & Elena -- Prof. Matteo Semplice Universit? degli Studi dell?Insubria Dipartimento di Scienza e Alta Tecnologia ? DiSAT Professore Associato Via Valleggio, 11 ? 22100 Como (CO) ? Italia tel.: +39 031 2386316 From knepley at gmail.com Thu Feb 11 15:12:22 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 11 Feb 2021 16:12:22 -0500 Subject: [petsc-users] shell preconditioner for Schur complement In-Reply-To: <4af4cca9-e3f3-d0be-6375-2f2f0a3aef4b@uninsubria.it> References: <227EB1C9-E4FC-49D6-8376-52B6CC850F2B@petsc.dev> <4af4cca9-e3f3-d0be-6375-2f2f0a3aef4b@uninsubria.it> Message-ID: On Thu, Feb 11, 2021 at 10:53 AM Matteo Semplice < matteo.semplice at uninsubria.it> wrote: > > Il 11/02/21 06:41, Barry Smith ha scritto: > > Best to just look at the code to see exactly what it is doing: > src/ksp/pc/impls/fieldsplit/fieldsplit.c function > PCApply_FieldSplit_Schur() > > > > There is no particular "scaling" applied to the vectors. It might be > easiest to track through the computational process with the debugger (you > can call VecView(v,0) in the debugger anytime to see the current vector) to > see why the "scaling" seems to change. > > > > Barry > > Found! > > It came from the initial rescaling b->b/norm(b) in the fgmres for the > entire matrix. > I see. I did not document it because it is internal to the solver, but the subsolvers can see it :) Glad you got it worked out. Thanks, Matt > We are now set and we can concentrate on our routine. > > Thanks a lot Matthew and Barry! > > Matteo & Elena > > -- > Prof. Matteo Semplice > Universit? degli Studi dell?Insubria > Dipartimento di Scienza e Alta Tecnologia ? DiSAT > Professore Associato > Via Valleggio, 11 ? 22100 Como (CO) ? Italia > tel.: +39 031 2386316 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From e0425375 at gmail.com Fri Feb 12 02:32:26 2021 From: e0425375 at gmail.com (Florian Bruckner) Date: Fri, 12 Feb 2021 09:32:26 +0100 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: References: Message-ID: Dear Jose, Dear Matt, I needed some time to think about your answers. If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner. What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator). Unfortunately this interface is not available, right? Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly? I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell? Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback? thanks again, and best wishes Florian On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman wrote: > The problem can be written as A0*v=omega*B0*v and you want the eigenvalues > omega closest to zero. If the matrices were explicitly available, you would > do shift-and-invert with target=0, that is > > (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is > > A0^{-1}*B0*v=theta*v > > and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. > > Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of > EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? > EPS_SMALLEST_REAL will give slow convergence. > > Florian: I would not recommend setting the KSP matrices directly, it may > produce strange side-effects. We should have an interface function to pass > this matrix. Currently there is STPrecondSetMatForPC() but it has two > problems: (1) it is intended for STPRECOND, so cannot be used with > Krylov-Schur, and (2) it is not currently available in the python interface. > > The approach used by Matt is a workaround that does not use ST, so you can > handle linear solves with a KSP of your own. > > As an alternative, since your problem is symmetric, you could try LOBPCG, > assuming that the leftmost eigenvalues are those that you want (e.g. if all > eigenvalues are non-negative). In that case you could use > STPrecondSetMatForPC(), but the remaining issue is calling it from python. 
> > If you are using the git repo, I could add the relevant code. > > Jose > > > > > El 8 feb 2021, a las 14:22, Matthew Knepley > escribi?: > > > > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner > wrote: > > Dear PETSc / SLEPc Users, > > > > my question is very similar to the one posted here: > > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html > > > > The eigensystem I would like to solve looks like: > > B0 v = 1/omega A0 v > > B0 and A0 are both hermitian, A0 is positive definite, but only given as > a linear operator (matshell). I am looking for the largest eigenvalues > (=smallest omega). > > > > I also have a sparse approximation P0 of the A0 operator, which i would > like to use as precondtioner, using something like this: > > > > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) > > st = es.getST() > > ksp = st.getKSP() > > ksp.setOperators(self.A0, self.P0) > > > > Unfortunately PETSc still complains that it cannot create a > preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but > A0.type == 'python'). > > By the way, should P0 be an approximation of A0 or does it have to > include B0? > > > > Right now I am using the krylov-schur method. Are there any alternatives > if A0 is only given as an operator? > > > > Jose can correct me if I say something wrong. > > > > When I did this, I made a shell operator for the action of A0^{-1} B0 > which has a KSPSolve() in it, so you can use your P0 preconditioning > matrix, and > > then handed that to EPS. You can see me do it here: > > > > > https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 > > > > I had a hard time getting the embedded solver to work the way I wanted, > but maybe that is the better way. > > > > Thanks, > > > > Matt > > > > thanks for any advice > > best wishes > > Florian > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Fri Feb 12 03:12:18 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 12 Feb 2021 10:12:18 +0100 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: References: Message-ID: <848D4FF0-7ABD-4D0A-93FF-71318A3C966D@dsic.upv.es> > El 12 feb 2021, a las 9:32, Florian Bruckner escribi?: > > Dear Jose, Dear Matt, > > I needed some time to think about your answers. > If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner. > What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator). > Unfortunately this interface is not available, right? Yes, when using shift-and-invert with target=0 the solver internally uses A0^{-1}*B0. It also uses the B0-inner product to preserve symmetry. > > Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly? Yes, the difference here is that you have to solve the problem as non-symmetric. This is not going to be slower, maybe just a bit less accurate. 
And the computed eigenvectors will not be B0-orthogonal. > > I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. > > Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. > If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell? > Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback? LOBPCG can be used only in Hermitian problems, see Table 2.4 in the users manual. The problem A0*v=omega*(i*B0)*v can be solved as A0*v=(i*omega)*B0*v, no problem with that. Just solve it as non-Hermitian. As I said above, the computational cost should not be too different. But solving as non-Hermitian restricts the available solvers. Jose > > thanks again, > and best wishes > Florian > > On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman wrote: > The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is > > (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is > > A0^{-1}*B0*v=theta*v > > and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. > > Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence. > > Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface. > > The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own. > > As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python. > > If you are using the git repo, I could add the relevant code. > > Jose > > > > > El 8 feb 2021, a las 14:22, Matthew Knepley escribi?: > > > > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner wrote: > > Dear PETSc / SLEPc Users, > > > > my question is very similar to the one posted here: > > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html > > > > The eigensystem I would like to solve looks like: > > B0 v = 1/omega A0 v > > B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). 
> > > > I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this: > > > > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) > > st = es.getST() > > ksp = st.getKSP() > > ksp.setOperators(self.A0, self.P0) > > > > Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). > > By the way, should P0 be an approximation of A0 or does it have to include B0? > > > > Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator? > > > > Jose can correct me if I say something wrong. > > > > When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and > > then handed that to EPS. You can see me do it here: > > > > https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 > > > > I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way. > > > > Thanks, > > > > Matt > > > > thanks for any advice > > best wishes > > Florian > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > From bsmith at petsc.dev Fri Feb 12 21:19:50 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 12 Feb 2021 21:19:50 -0600 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: References: Message-ID: <7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev> > On Feb 12, 2021, at 2:32 AM, Florian Bruckner wrote: > > Dear Jose, Dear Matt, > > I needed some time to think about your answers. > If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner. > What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator). > Unfortunately this interface is not available, right? If SLEPc does not provide this directly it is still intended to be trivial to provide the "preconditioner matrix" (that is matrix from which the preconditioner is built). Just get the KSP from the ST object and use KSPSetOperators() to provide the "preconditioner matrix" . Barry > > Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly? > > I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. > > Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. 
> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell? > Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback? > > thanks again, > and best wishes > Florian > > On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman > wrote: > The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is > > (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is > > A0^{-1}*B0*v=theta*v > > and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. > > Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence. > > Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface. > > The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own. > > As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python. > > If you are using the git repo, I could add the relevant code. > > Jose > > > > > El 8 feb 2021, a las 14:22, Matthew Knepley > escribi?: > > > > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner > wrote: > > Dear PETSc / SLEPc Users, > > > > my question is very similar to the one posted here: > > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html > > > > The eigensystem I would like to solve looks like: > > B0 v = 1/omega A0 v > > B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). > > > > I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this: > > > > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) > > st = es.getST() > > ksp = st.getKSP() > > ksp.setOperators(self.A0, self.P0) > > > > Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). > > By the way, should P0 be an approximation of A0 or does it have to include B0? > > > > Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator? > > > > Jose can correct me if I say something wrong. > > > > When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and > > then handed that to EPS. You can see me do it here: > > > > https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 > > > > I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way. 
> > > > Thanks, > > > > Matt > > > > thanks for any advice > > best wishes > > Florian > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From e0425375 at gmail.com Sat Feb 13 00:25:58 2021 From: e0425375 at gmail.com (Florian Bruckner) Date: Sat, 13 Feb 2021 07:25:58 +0100 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: <7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev> References: <7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev> Message-ID: Dear Jose, Dear Barry, thanks again for your reply. One final question about the B0 orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but they are i*B0 orthogonal? or is there an issue with Matt's approach? For my problem I can show that eigenvalues fulfill an orthogonality relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This should be independent of the solving method, right? Regarding Barry's advice this is what I first tried: es = SLEPc.EPS().create(comm=fd.COMM_WORLD) st = es.getST() ksp = st.getKSP() ksp.setOperators(self.A0, self.P0) But it seems that the provided P0 is not used. Furthermore the interface is maybe a bit confusing if ST performs some transformation. In this case P0 needs to approximate A0^{-1}*B0 and not A0, right? Nevertheless I think it would be the best solution if one could provide P0 (approx A0) and SLEPc derives the preconditioner from this. Would this be hard to implement? best wishes Florian On Sat, Feb 13, 2021 at 4:19 AM Barry Smith wrote: > > > On Feb 12, 2021, at 2:32 AM, Florian Bruckner wrote: > > Dear Jose, Dear Matt, > > I needed some time to think about your answers. > If I understand correctly, the eigenmode solver internally uses > A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP > solver and a corresponding preconditioner. > What I would need is an interface to provide not only the system Matrix A0 > (which is an operator), but also a preconditioning matrix (sparse > approximation of the operator). > Unfortunately this interface is not available, right? > > > If SLEPc does not provide this directly it is still intended to be > trivial to provide the "preconditioner matrix" (that is matrix from which > the preconditioner is built). Just get the KSP from the ST object and use > KSPSetOperators() to provide the "preconditioner matrix" . > > Barry > > > Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses > a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and > solve a standard eigenvalue problem with this modified operator. Did I > understand this correctly? > > I have two further points, which I did not mention yet: the matrix B0 is > Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using > Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). > Then I convert them into ScipyLinearOperators and use > scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. > Minv=A0^-1 is also solving within scipy using a preconditioned gmres. > Advantage of this setup is that the imaginary B0 can be handled efficiently > and also the post-processing of the eigenvectors (which requires complex > arithmetics) is simplified. 
> > Nevertheless I think that the mixing of PETSc and Scipy looks too > complicated and is not very flexible. > If I would use Matt's approach, could I then simply switch between > multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to > the use of matshell? > Is there a solution for the imaginary B0, or do I have to use the > non-hermitian methods? Is this a large performance drawback? > > thanks again, > and best wishes > Florian > > On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman wrote: > >> The problem can be written as A0*v=omega*B0*v and you want the >> eigenvalues omega closest to zero. If the matrices were explicitly >> available, you would do shift-and-invert with target=0, that is >> >> (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is >> >> A0^{-1}*B0*v=theta*v >> >> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. >> >> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of >> EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? >> EPS_SMALLEST_REAL will give slow convergence. >> >> Florian: I would not recommend setting the KSP matrices directly, it may >> produce strange side-effects. We should have an interface function to pass >> this matrix. Currently there is STPrecondSetMatForPC() but it has two >> problems: (1) it is intended for STPRECOND, so cannot be used with >> Krylov-Schur, and (2) it is not currently available in the python interface. >> >> The approach used by Matt is a workaround that does not use ST, so you >> can handle linear solves with a KSP of your own. >> >> As an alternative, since your problem is symmetric, you could try LOBPCG, >> assuming that the leftmost eigenvalues are those that you want (e.g. if all >> eigenvalues are non-negative). In that case you could use >> STPrecondSetMatForPC(), but the remaining issue is calling it from python. >> >> If you are using the git repo, I could add the relevant code. >> >> Jose >> >> >> >> > El 8 feb 2021, a las 14:22, Matthew Knepley >> escribi?: >> > >> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner >> wrote: >> > Dear PETSc / SLEPc Users, >> > >> > my question is very similar to the one posted here: >> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html >> > >> > The eigensystem I would like to solve looks like: >> > B0 v = 1/omega A0 v >> > B0 and A0 are both hermitian, A0 is positive definite, but only given >> as a linear operator (matshell). I am looking for the largest eigenvalues >> (=smallest omega). >> > >> > I also have a sparse approximation P0 of the A0 operator, which i would >> like to use as precondtioner, using something like this: >> > >> > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) >> > st = es.getST() >> > ksp = st.getKSP() >> > ksp.setOperators(self.A0, self.P0) >> > >> > Unfortunately PETSc still complains that it cannot create a >> preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but >> A0.type == 'python'). >> > By the way, should P0 be an approximation of A0 or does it have to >> include B0? >> > >> > Right now I am using the krylov-schur method. Are there any >> alternatives if A0 is only given as an operator? >> > >> > Jose can correct me if I say something wrong. >> > >> > When I did this, I made a shell operator for the action of A0^{-1} B0 >> which has a KSPSolve() in it, so you can use your P0 preconditioning >> matrix, and >> > then handed that to EPS. 
You can see me do it here: >> > >> > >> https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 >> > >> > I had a hard time getting the embedded solver to work the way I wanted, >> but maybe that is the better way. >> > >> > Thanks, >> > >> > Matt >> > >> > thanks for any advice >> > best wishes >> > Florian >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > -- Norbert Wiener >> > >> > https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Sat Feb 13 02:47:40 2021 From: pierre at joliv.et (Pierre Jolivet) Date: Sat, 13 Feb 2021 09:47:40 +0100 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: References: <7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev> Message-ID: > On 13 Feb 2021, at 7:25 AM, Florian Bruckner wrote: > > Dear Jose, Dear Barry, > thanks again for your reply. One final question about the B0 orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but they are i*B0 orthogonal? or is there an issue with Matt's approach? > For my problem I can show that eigenvalues fulfill an orthogonality relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This should be independent of the solving method, right? > > Regarding Barry's advice this is what I first tried: > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) > st = es.getST() > ksp = st.getKSP() > ksp.setOperators(self.A0, self.P0) > > But it seems that the provided P0 is not used. Furthermore the interface is maybe a bit confusing if ST performs some transformation. In this case P0 needs to approximate A0^{-1}*B0 and not A0, right? No, you need to approximate (A0-sigma B0)^-1. If you have a null shift, which looks like it is the case, you end up with A0^-1. > Nevertheless I think it would be the best solution if one could provide P0 (approx A0) and SLEPc derives the preconditioner from this. Would this be hard to implement? This is what Barry?s suggestion is implementing. Don?t know why it doesn?t work with your Python operator though. Thanks, Pierre > best wishes > Florian > > > On Sat, Feb 13, 2021 at 4:19 AM Barry Smith > wrote: > > >> On Feb 12, 2021, at 2:32 AM, Florian Bruckner > wrote: >> >> Dear Jose, Dear Matt, >> >> I needed some time to think about your answers. >> If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner. >> What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator). >> Unfortunately this interface is not available, right? > > If SLEPc does not provide this directly it is still intended to be trivial to provide the "preconditioner matrix" (that is matrix from which the preconditioner is built). Just get the KSP from the ST object and use KSPSetOperators() to provide the "preconditioner matrix" . > > Barry > >> >> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly? 
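For concreteness, a minimal petsc4py sketch of a shell operator of this kind (an illustration only, not code from the thread; it assumes A0, B0 and the sparse approximation P0 already exist as PETSc Mats, and the KSP/PC choices are placeholders):

from petsc4py import PETSc

class A0invB0(object):
    # shell context applying y = A0^{-1} B0 x with an inner KSP
    def __init__(self, A0, B0, P0):
        self.B0 = B0
        self.tmp = B0.createVecLeft()
        self.ksp = PETSc.KSP().create(comm=A0.getComm())
        self.ksp.setOperators(A0, P0)   # the PC is built from the sparse P0
        self.ksp.setType('cg')          # A0 is reported to be positive definite
        self.ksp.setFromOptions()       # -pc_type etc. chosen to suit P0

    def mult(self, mat, x, y):
        self.B0.mult(x, self.tmp)       # tmp = B0 x
        self.ksp.solve(self.tmp, y)     # y = A0^{-1} tmp

S = PETSc.Mat().createPython(A0.getSizes(), A0invB0(A0, B0, P0), comm=A0.getComm())
S.setUp()
# S is then handed to SLEPc as a standard (not generalized) problem, e.g. es.setOperators(S)

With such an S the question of the B0 inner product still has to be handled separately, which is what the later part of the thread turns to.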
>> >> I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. >> >> Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. >> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell? >> Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback? >> >> thanks again, >> and best wishes >> Florian >> >> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman > wrote: >> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is >> >> (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is >> >> A0^{-1}*B0*v=theta*v >> >> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. >> >> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence. >> >> Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface. >> >> The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own. >> >> As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python. >> >> If you are using the git repo, I could add the relevant code. >> >> Jose >> >> >> >> > El 8 feb 2021, a las 14:22, Matthew Knepley > escribi?: >> > >> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner > wrote: >> > Dear PETSc / SLEPc Users, >> > >> > my question is very similar to the one posted here: >> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html >> > >> > The eigensystem I would like to solve looks like: >> > B0 v = 1/omega A0 v >> > B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). >> > >> > I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this: >> > >> > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) >> > st = es.getST() >> > ksp = st.getKSP() >> > ksp.setOperators(self.A0, self.P0) >> > >> > Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). 
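One possible way to narrow down the 'python' matrix error quoted above (purely a diagnostic guess, not a verified fix): after EPS setup, inspect which matrices the ST's KSP actually ended up with, since operators set before setup may have been replaced.

es.setFromOptions()
es.setUp()                             # let the ST configure its own KSP first
ksp = es.getST().getKSP()
Amat, Pmat = ksp.getOperators()
print(Amat.getType(), Pmat.getType())  # if Pmat is 'python' here, P0 was not kept
# re-installing the sparse matrix afterwards is one thing worth trying:
ksp.setOperators(Amat, self.P0)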
>> > By the way, should P0 be an approximation of A0 or does it have to include B0? >> > >> > Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator? >> > >> > Jose can correct me if I say something wrong. >> > >> > When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and >> > then handed that to EPS. You can see me do it here: >> > >> > https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 >> > >> > I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way. >> > >> > Thanks, >> > >> > Matt >> > >> > thanks for any advice >> > best wishes >> > Florian >> > >> > >> > -- >> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> > -- Norbert Wiener >> > >> > https://www.cse.buffalo.edu/~knepley/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Feb 13 12:15:14 2021 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 13 Feb 2021 12:15:14 -0600 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: References: <7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev> Message-ID: <119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev> > On Feb 13, 2021, at 2:47 AM, Pierre Jolivet wrote: > > > >> On 13 Feb 2021, at 7:25 AM, Florian Bruckner > wrote: >> >> Dear Jose, Dear Barry, >> thanks again for your reply. One final question about the B0 orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but they are i*B0 orthogonal? or is there an issue with Matt's approach? >> For my problem I can show that eigenvalues fulfill an orthogonality relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This should be independent of the solving method, right? >> >> Regarding Barry's advice this is what I first tried: >> es = SLEPc.EPS().create(comm=fd.COMM_WORLD) >> st = es.getST() >> ksp = st.getKSP() >> ksp.setOperators(self.A0, self.P0) >> >> But it seems that the provided P0 is not used. Furthermore the interface is maybe a bit confusing if ST performs some transformation. In this case P0 needs to approximate A0^{-1}*B0 and not A0, right? > > No, you need to approximate (A0-sigma B0)^-1. If you have a null shift, which looks like it is the case, you end up with A0^-1. Just trying to provide more clarity with the terms. If ST transforms the operator in the KSP to (A0-sigma B0) and you are providing the "sparse matrix from which the preconditioner is to be built" then you need to provide something that approximates (A0-sigma B0). Since the PC will use your matrix to construct a preconditioner that approximates the inverse of (A0-sigma B0), you don't need to directly provide something that approximates (A0-sigma B0)^-1 Yes, I would think SLEPc could provide an interface where it manages "the matrix from which to construct the preconditioner" and transforms that matrix just like the true matrix. To do it by hand you simply need to know what A0 and B0 are and which sigma ST has selected and then you can construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and modB0 are your "sparser approximations". Barry > >> Nevertheless I think it would be the best solution if one could provide P0 (approx A0) and SLEPc derives the preconditioner from this. Would this be hard to implement? 
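To make the "by hand" recipe described above concrete, a slepc4py sketch (the names P_A0 and P_B0 for the sparse approximations are illustrative, the shift is queried via st.getShift(), and whether the ST later replaces these operators again is exactly the open question in this thread):

st = es.getST()
es.setUp()                            # let ST choose its transformation and shift

sigma = st.getShift()                 # the sigma the ST is using (0 for target=0)
Pmat = P_A0.duplicate(copy=True)      # modA0
if sigma != 0.0:
    Pmat.axpy(-sigma, P_B0)           # modA0 - sigma*modB0

ksp = st.getKSP()
Aop, _ = ksp.getOperators()           # keep the operator matrix ST installed
ksp.setOperators(Aop, Pmat)           # but build the PC from the sparse approximation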
> > This is what Barry?s suggestion is implementing. Don?t know why it doesn?t work with your Python operator though. > > Thanks, > Pierre > >> best wishes >> Florian >> >> >> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith > wrote: >> >> >>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner > wrote: >>> >>> Dear Jose, Dear Matt, >>> >>> I needed some time to think about your answers. >>> If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner. >>> What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator). >>> Unfortunately this interface is not available, right? >> >> If SLEPc does not provide this directly it is still intended to be trivial to provide the "preconditioner matrix" (that is matrix from which the preconditioner is built). Just get the KSP from the ST object and use KSPSetOperators() to provide the "preconditioner matrix" . >> >> Barry >> >>> >>> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly? >>> >>> I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. >>> >>> Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. >>> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell? >>> Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback? >>> >>> thanks again, >>> and best wishes >>> Florian >>> >>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman > wrote: >>> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is >>> >>> (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is >>> >>> A0^{-1}*B0*v=theta*v >>> >>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. >>> >>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence. >>> >>> Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface. >>> >>> The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own. 
>>> >>> As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python. >>> >>> If you are using the git repo, I could add the relevant code. >>> >>> Jose >>> >>> >>> >>> > El 8 feb 2021, a las 14:22, Matthew Knepley > escribi?: >>> > >>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner > wrote: >>> > Dear PETSc / SLEPc Users, >>> > >>> > my question is very similar to the one posted here: >>> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html >>> > >>> > The eigensystem I would like to solve looks like: >>> > B0 v = 1/omega A0 v >>> > B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). >>> > >>> > I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this: >>> > >>> > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) >>> > st = es.getST() >>> > ksp = st.getKSP() >>> > ksp.setOperators(self.A0, self.P0) >>> > >>> > Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). >>> > By the way, should P0 be an approximation of A0 or does it have to include B0? >>> > >>> > Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator? >>> > >>> > Jose can correct me if I say something wrong. >>> > >>> > When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and >>> > then handed that to EPS. You can see me do it here: >>> > >>> > https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 >>> > >>> > I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way. >>> > >>> > Thanks, >>> > >>> > Matt >>> > >>> > thanks for any advice >>> > best wishes >>> > Florian >>> > >>> > >>> > -- >>> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> > -- Norbert Wiener >>> > >>> > https://www.cse.buffalo.edu/~knepley/ >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From e0425375 at gmail.com Sat Feb 13 18:43:47 2021 From: e0425375 at gmail.com (Florian Bruckner) Date: Sun, 14 Feb 2021 01:43:47 +0100 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: <119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev> References: <7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev> <119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev> Message-ID: Dear Barry, thank you for your clarification. What I wanted to say is that even if I could reset the KSP operators directly I would require to know which transformation ST applies in order to provide the preconditioning matrix for the correct operator. The more general solution would be that SLEPc provides the interface to pass the preconditioning matrix for A0 and ST applies the same transformations as for the operator. If you write "SLEPc could provide an interface", do you mean someone should implement it, or should it already be possible and I am not using it correctly? 
I wrote a small standalone example based on ex9.py from slepc4py, where i tried to use an operator. best wishes Florian On Sat, Feb 13, 2021 at 7:15 PM Barry Smith wrote: > > > On Feb 13, 2021, at 2:47 AM, Pierre Jolivet wrote: > > > > On 13 Feb 2021, at 7:25 AM, Florian Bruckner wrote: > > Dear Jose, Dear Barry, > thanks again for your reply. One final question about the B0 > orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but > they are i*B0 orthogonal? or is there an issue with Matt's approach? > For my problem I can show that eigenvalues fulfill an orthogonality > relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This > should be independent of the solving method, right? > > Regarding Barry's advice this is what I first tried: > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) > st = es.getST() > ksp = st.getKSP() > ksp.setOperators(self.A0, self.P0) > > But it seems that the provided P0 is not used. Furthermore the interface > is maybe a bit confusing if ST performs some transformation. In this case > P0 needs to approximate A0^{-1}*B0 and not A0, right? > > > No, you need to approximate (A0-sigma B0)^-1. If you have a null shift, > which looks like it is the case, you end up with A0^-1. > > > Just trying to provide more clarity with the terms. > > If ST transforms the operator in the KSP to (A0-sigma B0) and you are > providing the "sparse matrix from which the preconditioner is to be built" > then you need to provide something that approximates (A0-sigma B0). Since > the PC will use your matrix to construct a preconditioner that approximates > the inverse of (A0-sigma B0), you don't need to directly provide something > that approximates (A0-sigma B0)^-1 > > Yes, I would think SLEPc could provide an interface where it manages "the > matrix from which to construct the preconditioner" and transforms that > matrix just like the true matrix. To do it by hand you simply need to know > what A0 and B0 are and which sigma ST has selected and then you can > construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and > modB0 are your "sparser approximations". > > Barry > > > > Nevertheless I think it would be the best solution if one could provide P0 > (approx A0) and SLEPc derives the preconditioner from this. Would this be > hard to implement? > > > This is what Barry?s suggestion is implementing. Don?t know why it doesn?t > work with your Python operator though. > > Thanks, > Pierre > > best wishes > Florian > > > On Sat, Feb 13, 2021 at 4:19 AM Barry Smith wrote: > >> >> >> On Feb 12, 2021, at 2:32 AM, Florian Bruckner wrote: >> >> Dear Jose, Dear Matt, >> >> I needed some time to think about your answers. >> If I understand correctly, the eigenmode solver internally uses >> A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP >> solver and a corresponding preconditioner. >> What I would need is an interface to provide not only the system Matrix >> A0 (which is an operator), but also a preconditioning matrix (sparse >> approximation of the operator). >> Unfortunately this interface is not available, right? >> >> >> If SLEPc does not provide this directly it is still intended to be >> trivial to provide the "preconditioner matrix" (that is matrix from which >> the preconditioner is built). Just get the KSP from the ST object and use >> KSPSetOperators() to provide the "preconditioner matrix" . >> >> Barry >> >> >> Matt directly creates A0^{-1}*B0 as a matshell operator. 
The operator >> uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 >> and solve a standard eigenvalue problem with this modified operator. Did I >> understand this correctly? >> >> I have two further points, which I did not mention yet: the matrix B0 is >> Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using >> Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). >> Then I convert them into ScipyLinearOperators and use >> scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. >> Minv=A0^-1 is also solving within scipy using a preconditioned gmres. >> Advantage of this setup is that the imaginary B0 can be handled efficiently >> and also the post-processing of the eigenvectors (which requires complex >> arithmetics) is simplified. >> >> Nevertheless I think that the mixing of PETSc and Scipy looks too >> complicated and is not very flexible. >> If I would use Matt's approach, could I then simply switch between >> multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to >> the use of matshell? >> Is there a solution for the imaginary B0, or do I have to use the >> non-hermitian methods? Is this a large performance drawback? >> >> thanks again, >> and best wishes >> Florian >> >> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman wrote: >> >>> The problem can be written as A0*v=omega*B0*v and you want the >>> eigenvalues omega closest to zero. If the matrices were explicitly >>> available, you would do shift-and-invert with target=0, that is >>> >>> (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is >>> >>> A0^{-1}*B0*v=theta*v >>> >>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. >>> >>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of >>> EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? >>> EPS_SMALLEST_REAL will give slow convergence. >>> >>> Florian: I would not recommend setting the KSP matrices directly, it may >>> produce strange side-effects. We should have an interface function to pass >>> this matrix. Currently there is STPrecondSetMatForPC() but it has two >>> problems: (1) it is intended for STPRECOND, so cannot be used with >>> Krylov-Schur, and (2) it is not currently available in the python interface. >>> >>> The approach used by Matt is a workaround that does not use ST, so you >>> can handle linear solves with a KSP of your own. >>> >>> As an alternative, since your problem is symmetric, you could try >>> LOBPCG, assuming that the leftmost eigenvalues are those that you want >>> (e.g. if all eigenvalues are non-negative). In that case you could use >>> STPrecondSetMatForPC(), but the remaining issue is calling it from python. >>> >>> If you are using the git repo, I could add the relevant code. >>> >>> Jose >>> >>> >>> >>> > El 8 feb 2021, a las 14:22, Matthew Knepley >>> escribi?: >>> > >>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner >>> wrote: >>> > Dear PETSc / SLEPc Users, >>> > >>> > my question is very similar to the one posted here: >>> > >>> https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html >>> > >>> > The eigensystem I would like to solve looks like: >>> > B0 v = 1/omega A0 v >>> > B0 and A0 are both hermitian, A0 is positive definite, but only given >>> as a linear operator (matshell). I am looking for the largest eigenvalues >>> (=smallest omega). 
>>> > >>> > I also have a sparse approximation P0 of the A0 operator, which i >>> would like to use as precondtioner, using something like this: >>> > >>> > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) >>> > st = es.getST() >>> > ksp = st.getKSP() >>> > ksp.setOperators(self.A0, self.P0) >>> > >>> > Unfortunately PETSc still complains that it cannot create a >>> preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but >>> A0.type == 'python'). >>> > By the way, should P0 be an approximation of A0 or does it have to >>> include B0? >>> > >>> > Right now I am using the krylov-schur method. Are there any >>> alternatives if A0 is only given as an operator? >>> > >>> > Jose can correct me if I say something wrong. >>> > >>> > When I did this, I made a shell operator for the action of A0^{-1} B0 >>> which has a KSPSolve() in it, so you can use your P0 preconditioning >>> matrix, and >>> > then handed that to EPS. You can see me do it here: >>> > >>> > >>> https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 >>> > >>> > I had a hard time getting the embedded solver to work the way I >>> wanted, but maybe that is the better way. >>> > >>> > Thanks, >>> > >>> > Matt >>> > >>> > thanks for any advice >>> > best wishes >>> > Florian >>> > >>> > >>> > -- >>> > What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> > -- Norbert Wiener >>> > >>> > https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.py Type: text/x-python Size: 2509 bytes Desc: not available URL: From bsmith at petsc.dev Sun Feb 14 14:41:39 2021 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 14 Feb 2021 14:41:39 -0600 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: References: <7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev> <119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev> Message-ID: <6EF7889D-DC17-46FC-82A5-9409C41E231D@petsc.dev> Florian, I'm sorry I don't know the answers; I can only speculate. There is a STGetShift(). All I was saying is theoretically there could/should be such support in SLEPc. Barry > On Feb 13, 2021, at 6:43 PM, Florian Bruckner wrote: > > Dear Barry, > thank you for your clarification. What I wanted to say is that even if I could reset the KSP operators directly I would require to know which transformation ST applies in order to provide the preconditioning matrix for the correct operator. > The more general solution would be that SLEPc provides the interface to pass the preconditioning matrix for A0 and ST applies the same transformations as for the operator. > > If you write "SLEPc could provide an interface", do you mean someone should implement it, or should it already be possible and I am not using it correctly? > I wrote a small standalone example based on ex9.py from slepc4py, where i tried to use an operator. > > best wishes > Florian > > On Sat, Feb 13, 2021 at 7:15 PM Barry Smith > wrote: > > >> On Feb 13, 2021, at 2:47 AM, Pierre Jolivet > wrote: >> >> >> >>> On 13 Feb 2021, at 7:25 AM, Florian Bruckner > wrote: >>> >>> Dear Jose, Dear Barry, >>> thanks again for your reply. One final question about the B0 orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but they are i*B0 orthogonal? or is there an issue with Matt's approach? 
>>> For my problem I can show that eigenvalues fulfill an orthogonality relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This should be independent of the solving method, right? >>> >>> Regarding Barry's advice this is what I first tried: >>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD) >>> st = es.getST() >>> ksp = st.getKSP() >>> ksp.setOperators(self.A0, self.P0) >>> >>> But it seems that the provided P0 is not used. Furthermore the interface is maybe a bit confusing if ST performs some transformation. In this case P0 needs to approximate A0^{-1}*B0 and not A0, right? >> >> No, you need to approximate (A0-sigma B0)^-1. If you have a null shift, which looks like it is the case, you end up with A0^-1. > > Just trying to provide more clarity with the terms. > > If ST transforms the operator in the KSP to (A0-sigma B0) and you are providing the "sparse matrix from which the preconditioner is to be built" then you need to provide something that approximates (A0-sigma B0). Since the PC will use your matrix to construct a preconditioner that approximates the inverse of (A0-sigma B0), you don't need to directly provide something that approximates (A0-sigma B0)^-1 > > Yes, I would think SLEPc could provide an interface where it manages "the matrix from which to construct the preconditioner" and transforms that matrix just like the true matrix. To do it by hand you simply need to know what A0 and B0 are and which sigma ST has selected and then you can construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and modB0 are your "sparser approximations". > > Barry > > >> >>> Nevertheless I think it would be the best solution if one could provide P0 (approx A0) and SLEPc derives the preconditioner from this. Would this be hard to implement? >> >> This is what Barry?s suggestion is implementing. Don?t know why it doesn?t work with your Python operator though. >> >> Thanks, >> Pierre >> >>> best wishes >>> Florian >>> >>> >>> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith > wrote: >>> >>> >>>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner > wrote: >>>> >>>> Dear Jose, Dear Matt, >>>> >>>> I needed some time to think about your answers. >>>> If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner. >>>> What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator). >>>> Unfortunately this interface is not available, right? >>> >>> If SLEPc does not provide this directly it is still intended to be trivial to provide the "preconditioner matrix" (that is matrix from which the preconditioner is built). Just get the KSP from the ST object and use KSPSetOperators() to provide the "preconditioner matrix" . >>> >>> Barry >>> >>>> >>>> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly? >>>> >>>> I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. 
Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. >>>> >>>> Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. >>>> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell? >>>> Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback? >>>> >>>> thanks again, >>>> and best wishes >>>> Florian >>>> >>>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman > wrote: >>>> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is >>>> >>>> (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is >>>> >>>> A0^{-1}*B0*v=theta*v >>>> >>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. >>>> >>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence. >>>> >>>> Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface. >>>> >>>> The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own. >>>> >>>> As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python. >>>> >>>> If you are using the git repo, I could add the relevant code. >>>> >>>> Jose >>>> >>>> >>>> >>>> > El 8 feb 2021, a las 14:22, Matthew Knepley > escribi?: >>>> > >>>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner > wrote: >>>> > Dear PETSc / SLEPc Users, >>>> > >>>> > my question is very similar to the one posted here: >>>> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html >>>> > >>>> > The eigensystem I would like to solve looks like: >>>> > B0 v = 1/omega A0 v >>>> > B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). >>>> > >>>> > I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this: >>>> > >>>> > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) >>>> > st = es.getST() >>>> > ksp = st.getKSP() >>>> > ksp.setOperators(self.A0, self.P0) >>>> > >>>> > Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). >>>> > By the way, should P0 be an approximation of A0 or does it have to include B0? >>>> > >>>> > Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator? >>>> > >>>> > Jose can correct me if I say something wrong. 
>>>> > >>>> > When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and >>>> > then handed that to EPS. You can see me do it here: >>>> > >>>> > https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 >>>> > >>>> > I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way. >>>> > >>>> > Thanks, >>>> > >>>> > Matt >>>> > >>>> > thanks for any advice >>>> > best wishes >>>> > Florian >>>> > >>>> > >>>> > -- >>>> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> > -- Norbert Wiener >>>> > >>>> > https://www.cse.buffalo.edu/~knepley/ >>>> >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Mon Feb 15 06:27:00 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 15 Feb 2021 13:27:00 +0100 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: <6EF7889D-DC17-46FC-82A5-9409C41E231D@petsc.dev> References: <7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev> <119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev> <6EF7889D-DC17-46FC-82A5-9409C41E231D@petsc.dev> Message-ID: <46C744D7-4376-46B3-B5C4-211A4C8C2291@dsic.upv.es> I will think about the viability of adding an interface function to pass the preconditioner matrix. Regarding the question about the B-orthogonality of computed vectors, in the symmetric solver the B-orthogonality is enforced during the computation, so you have guarantee that the computed vectors satisfy it. But if solved as non-symetric, the computed vectors may depart from B-orthogonality, unless the tolerance is very small. Jose > El 14 feb 2021, a las 21:41, Barry Smith escribi?: > > > Florian, > > I'm sorry I don't know the answers; I can only speculate. There is a STGetShift(). > > All I was saying is theoretically there could/should be such support in SLEPc. > > Barry > > >> On Feb 13, 2021, at 6:43 PM, Florian Bruckner wrote: >> >> Dear Barry, >> thank you for your clarification. What I wanted to say is that even if I could reset the KSP operators directly I would require to know which transformation ST applies in order to provide the preconditioning matrix for the correct operator. >> The more general solution would be that SLEPc provides the interface to pass the preconditioning matrix for A0 and ST applies the same transformations as for the operator. >> >> If you write "SLEPc could provide an interface", do you mean someone should implement it, or should it already be possible and I am not using it correctly? >> I wrote a small standalone example based on ex9.py from slepc4py, where i tried to use an operator. >> >> best wishes >> Florian >> >> On Sat, Feb 13, 2021 at 7:15 PM Barry Smith wrote: >> >> >>> On Feb 13, 2021, at 2:47 AM, Pierre Jolivet wrote: >>> >>> >>> >>>> On 13 Feb 2021, at 7:25 AM, Florian Bruckner wrote: >>>> >>>> Dear Jose, Dear Barry, >>>> thanks again for your reply. One final question about the B0 orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but they are i*B0 orthogonal? or is there an issue with Matt's approach? >>>> For my problem I can show that eigenvalues fulfill an orthogonality relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This should be independent of the solving method, right? 
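A short sketch of the standard argument behind the quoted orthogonality relation (assuming, as stated in the thread, that A0 is Hermitian positive definite, B0 is Hermitian, and all omega_j are nonzero): for such a pencil the eigenvalues omega_j are real and the eigenvectors can be chosen A0-orthonormal,

(phi_i, A0 phi_j) = delta_ij,

and inserting A0 phi_j = omega_j B0 phi_j then gives

(phi_i, B0 phi_j) = delta_ij / omega_j,

so the eigenvectors of the pencil are B0-orthogonal as well. This is a property of the problem itself; the separate question is whether a particular numerical method keeps it, to the requested tolerance, in the computed vectors.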
>>>> >>>> Regarding Barry's advice this is what I first tried: >>>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD) >>>> st = es.getST() >>>> ksp = st.getKSP() >>>> ksp.setOperators(self.A0, self.P0) >>>> >>>> But it seems that the provided P0 is not used. Furthermore the interface is maybe a bit confusing if ST performs some transformation. In this case P0 needs to approximate A0^{-1}*B0 and not A0, right? >>> >>> No, you need to approximate (A0-sigma B0)^-1. If you have a null shift, which looks like it is the case, you end up with A0^-1. >> >> Just trying to provide more clarity with the terms. >> >> If ST transforms the operator in the KSP to (A0-sigma B0) and you are providing the "sparse matrix from which the preconditioner is to be built" then you need to provide something that approximates (A0-sigma B0). Since the PC will use your matrix to construct a preconditioner that approximates the inverse of (A0-sigma B0), you don't need to directly provide something that approximates (A0-sigma B0)^-1 >> >> Yes, I would think SLEPc could provide an interface where it manages "the matrix from which to construct the preconditioner" and transforms that matrix just like the true matrix. To do it by hand you simply need to know what A0 and B0 are and which sigma ST has selected and then you can construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and modB0 are your "sparser approximations". >> >> Barry >> >> >>> >>>> Nevertheless I think it would be the best solution if one could provide P0 (approx A0) and SLEPc derives the preconditioner from this. Would this be hard to implement? >>> >>> This is what Barry?s suggestion is implementing. Don?t know why it doesn?t work with your Python operator though. >>> >>> Thanks, >>> Pierre >>> >>>> best wishes >>>> Florian >>>> >>>> >>>> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith wrote: >>>> >>>> >>>>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner wrote: >>>>> >>>>> Dear Jose, Dear Matt, >>>>> >>>>> I needed some time to think about your answers. >>>>> If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner. >>>>> What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator). >>>>> Unfortunately this interface is not available, right? >>>> >>>> If SLEPc does not provide this directly it is still intended to be trivial to provide the "preconditioner matrix" (that is matrix from which the preconditioner is built). Just get the KSP from the ST object and use KSPSetOperators() to provide the "preconditioner matrix" . >>>> >>>> Barry >>>> >>>>> >>>>> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly? >>>>> >>>>> I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. 
Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. >>>>> >>>>> Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. >>>>> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell? >>>>> Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback? >>>>> >>>>> thanks again, >>>>> and best wishes >>>>> Florian >>>>> >>>>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman wrote: >>>>> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is >>>>> >>>>> (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is >>>>> >>>>> A0^{-1}*B0*v=theta*v >>>>> >>>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. >>>>> >>>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence. >>>>> >>>>> Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface. >>>>> >>>>> The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own. >>>>> >>>>> As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python. >>>>> >>>>> If you are using the git repo, I could add the relevant code. >>>>> >>>>> Jose >>>>> >>>>> >>>>> >>>>> > El 8 feb 2021, a las 14:22, Matthew Knepley escribi?: >>>>> > >>>>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner wrote: >>>>> > Dear PETSc / SLEPc Users, >>>>> > >>>>> > my question is very similar to the one posted here: >>>>> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html >>>>> > >>>>> > The eigensystem I would like to solve looks like: >>>>> > B0 v = 1/omega A0 v >>>>> > B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). >>>>> > >>>>> > I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this: >>>>> > >>>>> > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) >>>>> > st = es.getST() >>>>> > ksp = st.getKSP() >>>>> > ksp.setOperators(self.A0, self.P0) >>>>> > >>>>> > Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). >>>>> > By the way, should P0 be an approximation of A0 or does it have to include B0? >>>>> > >>>>> > Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator? >>>>> > >>>>> > Jose can correct me if I say something wrong. 
>>>>> > >>>>> > When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and >>>>> > then handed that to EPS. You can see me do it here: >>>>> > >>>>> > https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 >>>>> > >>>>> > I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way. >>>>> > >>>>> > Thanks, >>>>> > >>>>> > Matt >>>>> > >>>>> > thanks for any advice >>>>> > best wishes >>>>> > Florian >>>>> > >>>>> > >>>>> > -- >>>>> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> > -- Norbert Wiener >>>>> > >>>>> > https://www.cse.buffalo.edu/~knepley/ >>>>> >>>> >>> >> >> > From knepley at gmail.com Mon Feb 15 07:53:49 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 15 Feb 2021 08:53:49 -0500 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: <46C744D7-4376-46B3-B5C4-211A4C8C2291@dsic.upv.es> References: <7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev> <119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev> <6EF7889D-DC17-46FC-82A5-9409C41E231D@petsc.dev> <46C744D7-4376-46B3-B5C4-211A4C8C2291@dsic.upv.es> Message-ID: On Mon, Feb 15, 2021 at 7:27 AM Jose E. Roman wrote: > I will think about the viability of adding an interface function to pass > the preconditioner matrix. > > Regarding the question about the B-orthogonality of computed vectors, in > the symmetric solver the B-orthogonality is enforced during the > computation, so you have guarantee that the computed vectors satisfy it. > But if solved as non-symetric, the computed vectors may depart from > B-orthogonality, unless the tolerance is very small. > Yes, the vectors I generate are not B-orthogonal. Jose, do you think there is a way to reformulate what I am doing to use the symmetric solver, even if we only have the action of B? Thanks, Matt > Jose > > > > El 14 feb 2021, a las 21:41, Barry Smith escribi?: > > > > > > Florian, > > > > I'm sorry I don't know the answers; I can only speculate. There is a > STGetShift(). > > > > All I was saying is theoretically there could/should be such support > in SLEPc. > > > > Barry > > > > > >> On Feb 13, 2021, at 6:43 PM, Florian Bruckner > wrote: > >> > >> Dear Barry, > >> thank you for your clarification. What I wanted to say is that even if > I could reset the KSP operators directly I would require to know which > transformation ST applies in order to provide the preconditioning matrix > for the correct operator. > >> The more general solution would be that SLEPc provides the interface to > pass the preconditioning matrix for A0 and ST applies the same > transformations as for the operator. > >> > >> If you write "SLEPc could provide an interface", do you mean someone > should implement it, or should it already be possible and I am not using it > correctly? > >> I wrote a small standalone example based on ex9.py from slepc4py, where > i tried to use an operator. > >> > >> best wishes > >> Florian > >> > >> On Sat, Feb 13, 2021 at 7:15 PM Barry Smith wrote: > >> > >> > >>> On Feb 13, 2021, at 2:47 AM, Pierre Jolivet wrote: > >>> > >>> > >>> > >>>> On 13 Feb 2021, at 7:25 AM, Florian Bruckner > wrote: > >>>> > >>>> Dear Jose, Dear Barry, > >>>> thanks again for your reply. One final question about the B0 > orthogonality. 
Do you mean that eigenvectors are not B0 orthogonal, but > they are i*B0 orthogonal? or is there an issue with Matt's approach? > >>>> For my problem I can show that eigenvalues fulfill an orthogonality > relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This > should be independent of the solving method, right? > >>>> > >>>> Regarding Barry's advice this is what I first tried: > >>>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD) > >>>> st = es.getST() > >>>> ksp = st.getKSP() > >>>> ksp.setOperators(self.A0, self.P0) > >>>> > >>>> But it seems that the provided P0 is not used. Furthermore the > interface is maybe a bit confusing if ST performs some transformation. In > this case P0 needs to approximate A0^{-1}*B0 and not A0, right? > >>> > >>> No, you need to approximate (A0-sigma B0)^-1. If you have a null > shift, which looks like it is the case, you end up with A0^-1. > >> > >> Just trying to provide more clarity with the terms. > >> > >> If ST transforms the operator in the KSP to (A0-sigma B0) and you are > providing the "sparse matrix from which the preconditioner is to be built" > then you need to provide something that approximates (A0-sigma B0). Since > the PC will use your matrix to construct a preconditioner that approximates > the inverse of (A0-sigma B0), you don't need to directly provide something > that approximates (A0-sigma B0)^-1 > >> > >> Yes, I would think SLEPc could provide an interface where it manages > "the matrix from which to construct the preconditioner" and transforms that > matrix just like the true matrix. To do it by hand you simply need to know > what A0 and B0 are and which sigma ST has selected and then you can > construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and > modB0 are your "sparser approximations". > >> > >> Barry > >> > >> > >>> > >>>> Nevertheless I think it would be the best solution if one could > provide P0 (approx A0) and SLEPc derives the preconditioner from this. > Would this be hard to implement? > >>> > >>> This is what Barry?s suggestion is implementing. Don?t know why it > doesn?t work with your Python operator though. > >>> > >>> Thanks, > >>> Pierre > >>> > >>>> best wishes > >>>> Florian > >>>> > >>>> > >>>> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith wrote: > >>>> > >>>> > >>>>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner > wrote: > >>>>> > >>>>> Dear Jose, Dear Matt, > >>>>> > >>>>> I needed some time to think about your answers. > >>>>> If I understand correctly, the eigenmode solver internally uses > A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP > solver and a corresponding preconditioner. > >>>>> What I would need is an interface to provide not only the system > Matrix A0 (which is an operator), but also a preconditioning matrix (sparse > approximation of the operator). > >>>>> Unfortunately this interface is not available, right? > >>>> > >>>> If SLEPc does not provide this directly it is still intended to be > trivial to provide the "preconditioner matrix" (that is matrix from which > the preconditioner is built). Just get the KSP from the ST object and use > KSPSetOperators() to provide the "preconditioner matrix" . > >>>> > >>>> Barry > >>>> > >>>>> > >>>>> Matt directly creates A0^{-1}*B0 as a matshell operator. The > operator uses a KSP with a proper PC internally. SLEPc would directly get > A0^{-1}*B0 and solve a standard eigenvalue problem with this modified > operator. Did I understand this correctly? 
> >>>>> > >>>>> I have two further points, which I did not mention yet: the matrix > B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am > using Firedrake to set up the PETSc system matrices A0, i*B0 (which is > real). Then I convert them into ScipyLinearOperators and use > scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. > Minv=A0^-1 is also solving within scipy using a preconditioned gmres. > Advantage of this setup is that the imaginary B0 can be handled efficiently > and also the post-processing of the eigenvectors (which requires complex > arithmetics) is simplified. > >>>>> > >>>>> Nevertheless I think that the mixing of PETSc and Scipy looks too > complicated and is not very flexible. > >>>>> If I would use Matt's approach, could I then simply switch between > multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to > the use of matshell? > >>>>> Is there a solution for the imaginary B0, or do I have to use the > non-hermitian methods? Is this a large performance drawback? > >>>>> > >>>>> thanks again, > >>>>> and best wishes > >>>>> Florian > >>>>> > >>>>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman > wrote: > >>>>> The problem can be written as A0*v=omega*B0*v and you want the > eigenvalues omega closest to zero. If the matrices were explicitly > available, you would do shift-and-invert with target=0, that is > >>>>> > >>>>> (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is > >>>>> > >>>>> A0^{-1}*B0*v=theta*v > >>>>> > >>>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. > >>>>> > >>>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of > EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? > EPS_SMALLEST_REAL will give slow convergence. > >>>>> > >>>>> Florian: I would not recommend setting the KSP matrices directly, it > may produce strange side-effects. We should have an interface function to > pass this matrix. Currently there is STPrecondSetMatForPC() but it has two > problems: (1) it is intended for STPRECOND, so cannot be used with > Krylov-Schur, and (2) it is not currently available in the python interface. > >>>>> > >>>>> The approach used by Matt is a workaround that does not use ST, so > you can handle linear solves with a KSP of your own. > >>>>> > >>>>> As an alternative, since your problem is symmetric, you could try > LOBPCG, assuming that the leftmost eigenvalues are those that you want > (e.g. if all eigenvalues are non-negative). In that case you could use > STPrecondSetMatForPC(), but the remaining issue is calling it from python. > >>>>> > >>>>> If you are using the git repo, I could add the relevant code. > >>>>> > >>>>> Jose > >>>>> > >>>>> > >>>>> > >>>>> > El 8 feb 2021, a las 14:22, Matthew Knepley > escribi?: > >>>>> > > >>>>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner < > e0425375 at gmail.com> wrote: > >>>>> > Dear PETSc / SLEPc Users, > >>>>> > > >>>>> > my question is very similar to the one posted here: > >>>>> > > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html > >>>>> > > >>>>> > The eigensystem I would like to solve looks like: > >>>>> > B0 v = 1/omega A0 v > >>>>> > B0 and A0 are both hermitian, A0 is positive definite, but only > given as a linear operator (matshell). I am looking for the largest > eigenvalues (=smallest omega). 
> >>>>> > > >>>>> > I also have a sparse approximation P0 of the A0 operator, which i > would like to use as precondtioner, using something like this: > >>>>> > > >>>>> > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) > >>>>> > st = es.getST() > >>>>> > ksp = st.getKSP() > >>>>> > ksp.setOperators(self.A0, self.P0) > >>>>> > > >>>>> > Unfortunately PETSc still complains that it cannot create a > preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but > A0.type == 'python'). > >>>>> > By the way, should P0 be an approximation of A0 or does it have to > include B0? > >>>>> > > >>>>> > Right now I am using the krylov-schur method. Are there any > alternatives if A0 is only given as an operator? > >>>>> > > >>>>> > Jose can correct me if I say something wrong. > >>>>> > > >>>>> > When I did this, I made a shell operator for the action of A0^{-1} > B0 which has a KSPSolve() in it, so you can use your P0 preconditioning > matrix, and > >>>>> > then handed that to EPS. You can see me do it here: > >>>>> > > >>>>> > > https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 > >>>>> > > >>>>> > I had a hard time getting the embedded solver to work the way I > wanted, but maybe that is the better way. > >>>>> > > >>>>> > Thanks, > >>>>> > > >>>>> > Matt > >>>>> > > >>>>> > thanks for any advice > >>>>> > best wishes > >>>>> > Florian > >>>>> > > >>>>> > > >>>>> > -- > >>>>> > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > >>>>> > -- Norbert Wiener > >>>>> > > >>>>> > https://www.cse.buffalo.edu/~knepley/ > >>>>> > >>>> > >>> > >> > >> > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Mon Feb 15 11:44:15 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 15 Feb 2021 18:44:15 +0100 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: References: <7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev> <119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev> <6EF7889D-DC17-46FC-82A5-9409C41E231D@petsc.dev> <46C744D7-4376-46B3-B5C4-211A4C8C2291@dsic.upv.es> Message-ID: <80BCEEDC-4C1E-4512-AAF5-7B6E718C7D1D@dsic.upv.es> > El 15 feb 2021, a las 14:53, Matthew Knepley escribi?: > > On Mon, Feb 15, 2021 at 7:27 AM Jose E. Roman wrote: > I will think about the viability of adding an interface function to pass the preconditioner matrix. > > Regarding the question about the B-orthogonality of computed vectors, in the symmetric solver the B-orthogonality is enforced during the computation, so you have guarantee that the computed vectors satisfy it. But if solved as non-symetric, the computed vectors may depart from B-orthogonality, unless the tolerance is very small. > > Yes, the vectors I generate are not B-orthogonal. > > Jose, do you think there is a way to reformulate what I am doing to use the symmetric solver, even if we only have the action of B? 
Yes, you can do the following: ierr = EPSSetOperators(eps,S,NULL);CHKERRQ(ierr); // S is your shell matrix A^{-1}*B ierr = EPSSetProblemType(eps,EPS_HEP);CHKERRQ(ierr); // symmetric problem though S is not symmetric ierr = EPSSetFromOptions(eps);CHKERRQ(ierr); ierr = EPSSetUp(eps);CHKERRQ(ierr); // note explicitly calling setup here ierr = EPSGetBV(eps,&bv);CHKERRQ(ierr); ierr = BVSetMatrix(bv,B,PETSC_FALSE);CHKERRQ(ierr); // replace solver's inner product ierr = EPSSolve(eps);CHKERRQ(ierr); I have tried this with test1.c and it works. The computed eigenvectors should be B-orthogonal in this case. Jose > > Thanks, > > Matt > > Jose > > > > El 14 feb 2021, a las 21:41, Barry Smith escribi?: > > > > > > Florian, > > > > I'm sorry I don't know the answers; I can only speculate. There is a STGetShift(). > > > > All I was saying is theoretically there could/should be such support in SLEPc. > > > > Barry > > > > > >> On Feb 13, 2021, at 6:43 PM, Florian Bruckner wrote: > >> > >> Dear Barry, > >> thank you for your clarification. What I wanted to say is that even if I could reset the KSP operators directly I would require to know which transformation ST applies in order to provide the preconditioning matrix for the correct operator. > >> The more general solution would be that SLEPc provides the interface to pass the preconditioning matrix for A0 and ST applies the same transformations as for the operator. > >> > >> If you write "SLEPc could provide an interface", do you mean someone should implement it, or should it already be possible and I am not using it correctly? > >> I wrote a small standalone example based on ex9.py from slepc4py, where i tried to use an operator. > >> > >> best wishes > >> Florian > >> > >> On Sat, Feb 13, 2021 at 7:15 PM Barry Smith wrote: > >> > >> > >>> On Feb 13, 2021, at 2:47 AM, Pierre Jolivet wrote: > >>> > >>> > >>> > >>>> On 13 Feb 2021, at 7:25 AM, Florian Bruckner wrote: > >>>> > >>>> Dear Jose, Dear Barry, > >>>> thanks again for your reply. One final question about the B0 orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but they are i*B0 orthogonal? or is there an issue with Matt's approach? > >>>> For my problem I can show that eigenvalues fulfill an orthogonality relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This should be independent of the solving method, right? > >>>> > >>>> Regarding Barry's advice this is what I first tried: > >>>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD) > >>>> st = es.getST() > >>>> ksp = st.getKSP() > >>>> ksp.setOperators(self.A0, self.P0) > >>>> > >>>> But it seems that the provided P0 is not used. Furthermore the interface is maybe a bit confusing if ST performs some transformation. In this case P0 needs to approximate A0^{-1}*B0 and not A0, right? > >>> > >>> No, you need to approximate (A0-sigma B0)^-1. If you have a null shift, which looks like it is the case, you end up with A0^-1. > >> > >> Just trying to provide more clarity with the terms. > >> > >> If ST transforms the operator in the KSP to (A0-sigma B0) and you are providing the "sparse matrix from which the preconditioner is to be built" then you need to provide something that approximates (A0-sigma B0). 
Since the PC will use your matrix to construct a preconditioner that approximates the inverse of (A0-sigma B0), you don't need to directly provide something that approximates (A0-sigma B0)^-1 > >> > >> Yes, I would think SLEPc could provide an interface where it manages "the matrix from which to construct the preconditioner" and transforms that matrix just like the true matrix. To do it by hand you simply need to know what A0 and B0 are and which sigma ST has selected and then you can construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and modB0 are your "sparser approximations". > >> > >> Barry > >> > >> > >>> > >>>> Nevertheless I think it would be the best solution if one could provide P0 (approx A0) and SLEPc derives the preconditioner from this. Would this be hard to implement? > >>> > >>> This is what Barry?s suggestion is implementing. Don?t know why it doesn?t work with your Python operator though. > >>> > >>> Thanks, > >>> Pierre > >>> > >>>> best wishes > >>>> Florian > >>>> > >>>> > >>>> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith wrote: > >>>> > >>>> > >>>>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner wrote: > >>>>> > >>>>> Dear Jose, Dear Matt, > >>>>> > >>>>> I needed some time to think about your answers. > >>>>> If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner. > >>>>> What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator). > >>>>> Unfortunately this interface is not available, right? > >>>> > >>>> If SLEPc does not provide this directly it is still intended to be trivial to provide the "preconditioner matrix" (that is matrix from which the preconditioner is built). Just get the KSP from the ST object and use KSPSetOperators() to provide the "preconditioner matrix" . > >>>> > >>>> Barry > >>>> > >>>>> > >>>>> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly? > >>>>> > >>>>> I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. > >>>>> > >>>>> Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. > >>>>> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell? > >>>>> Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback? > >>>>> > >>>>> thanks again, > >>>>> and best wishes > >>>>> Florian > >>>>> > >>>>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman wrote: > >>>>> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. 
If the matrices were explicitly available, you would do shift-and-invert with target=0, that is > >>>>> > >>>>> (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is > >>>>> > >>>>> A0^{-1}*B0*v=theta*v > >>>>> > >>>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. > >>>>> > >>>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence. > >>>>> > >>>>> Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface. > >>>>> > >>>>> The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own. > >>>>> > >>>>> As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python. > >>>>> > >>>>> If you are using the git repo, I could add the relevant code. > >>>>> > >>>>> Jose > >>>>> > >>>>> > >>>>> > >>>>> > El 8 feb 2021, a las 14:22, Matthew Knepley escribi?: > >>>>> > > >>>>> > On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner wrote: > >>>>> > Dear PETSc / SLEPc Users, > >>>>> > > >>>>> > my question is very similar to the one posted here: > >>>>> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html > >>>>> > > >>>>> > The eigensystem I would like to solve looks like: > >>>>> > B0 v = 1/omega A0 v > >>>>> > B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). > >>>>> > > >>>>> > I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this: > >>>>> > > >>>>> > es = SLEPc.EPS().create(comm=fd.COMM_WORLD) > >>>>> > st = es.getST() > >>>>> > ksp = st.getKSP() > >>>>> > ksp.setOperators(self.A0, self.P0) > >>>>> > > >>>>> > Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). > >>>>> > By the way, should P0 be an approximation of A0 or does it have to include B0? > >>>>> > > >>>>> > Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator? > >>>>> > > >>>>> > Jose can correct me if I say something wrong. > >>>>> > > >>>>> > When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and > >>>>> > then handed that to EPS. You can see me do it here: > >>>>> > > >>>>> > https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 > >>>>> > > >>>>> > I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way. 
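[As a companion to Matt's description just above and to Jose's snippet at the top of this message, here is a minimal C sketch of what such a shell operator S = A0^{-1}*B0 could look like. This is not the code from Matt's bamg repository; the context struct, the function names, and the sizes nlocal/N are illustrative assumptions, and the KSP inside the shell is where the sparse preconditioning matrix P0 enters.]

typedef struct {
  Mat B0;   /* Hermitian operator applied first                */
  KSP ksp;  /* linear solver for A0, preconditioned with P0    */
  Vec work; /* scratch vector holding B0*x                     */
} ShellCtx;

static PetscErrorCode MatMult_S(Mat S, Vec x, Vec y)
{
  ShellCtx       *ctx;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatShellGetContext(S, &ctx);CHKERRQ(ierr);
  ierr = MatMult(ctx->B0, x, ctx->work);CHKERRQ(ierr);   /* work = B0*x      */
  ierr = KSPSolve(ctx->ksp, ctx->work, y);CHKERRQ(ierr); /* y = A0^{-1}*B0*x */
  PetscFunctionReturn(0);
}

/* ... after A0 (matshell operator), P0 (aij approximation of A0) and B0 exist ... */
ShellCtx       ctx;
Mat            S;
PetscErrorCode ierr;

ierr = KSPCreate(PETSC_COMM_WORLD, &ctx.ksp);CHKERRQ(ierr);
ierr = KSPSetOperators(ctx.ksp, A0, P0);CHKERRQ(ierr);   /* P0 is the matrix the PC is built from */
ierr = KSPSetFromOptions(ctx.ksp);CHKERRQ(ierr);
ctx.B0 = B0;
ierr = MatCreateVecs(B0, NULL, &ctx.work);CHKERRQ(ierr);
ierr = MatCreateShell(PETSC_COMM_WORLD, nlocal, nlocal, N, N, &ctx, &S);CHKERRQ(ierr);
ierr = MatShellSetOperation(S, MATOP_MULT, (void (*)(void))MatMult_S);CHKERRQ(ierr);
/* S can now be handed to EPSSetOperators(eps, S, NULL), and B0 to BVSetMatrix(),
   as in Jose's snippet above; ctx must stay alive for as long as S is used. */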
> >>>>> > > >>>>> > Thanks, > >>>>> > > >>>>> > Matt > >>>>> > > >>>>> > thanks for any advice > >>>>> > best wishes > >>>>> > Florian > >>>>> > > >>>>> > > >>>>> > -- > >>>>> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > >>>>> > -- Norbert Wiener > >>>>> > > >>>>> > https://www.cse.buffalo.edu/~knepley/ > >>>>> > >>>> > >>> > >> > >> > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From swarnava89 at gmail.com Mon Feb 15 19:47:29 2021 From: swarnava89 at gmail.com (Swarnava Ghosh) Date: Mon, 15 Feb 2021 20:47:29 -0500 Subject: [petsc-users] makefile for building application with petsc Message-ID: Dear Petsc developers and users, I am having some issue with building my code with the following makefile. I was earlier able to build this with the same makefile on a different machine. Would you please help me out on this issue? Contents of makefile: ============================================== all:sparc CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc ./src/energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/forces.cc ./src/relaxatoms.cc ./src/multipole.cc ./src/electrostatics.cc ./src/tools.cc SOURCEH = ./inc/sddft.h ./inc/isddft.h OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o LIBBASE = ./lib/sparc CLEANFILES = ./lib/sparc include ${PETSC_DIR}/lib/petsc/conf/variables include ${PETSC_DIR}/lib/petsc/conf/rules sparc: ${OBJSC} chkopts ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB} ${RM} $(SOURCECPP:%.cc=%.o) =========================================== Error: /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g -I/home/swarnava/petsc/include -I/home/swarnava/petsc/linux-gnu-intel/include `pwd`/src/main.cc /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic error: cannot open source file "sddft.h" #include "sddft.h" ^ ==================================================== It's not able to see the header file though I have -I ./inc in CPPFLAGS. The directory containing makefile has the directory "inc" with the headers and "src" with the .cc files. Thank you, Swarnava -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacob.fai at gmail.com Mon Feb 15 19:57:31 2021 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Mon, 15 Feb 2021 20:57:31 -0500 Subject: [petsc-users] makefile for building application with petsc In-Reply-To: References: Message-ID: Hello, If possible, can you include a copy of your configure.log which you used to configure petsc? It will give useful information about your machine and compilers. What system did this work with successfully? Also please attach the makefile directly rather than including its contents as text, it is much easier to read. 
Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) Cell: (312) 694-3391 > On Feb 15, 2021, at 20:47, Swarnava Ghosh wrote: > > Dear Petsc developers and users, > > I am having some issue with building my code with the following makefile. I was earlier able to build this with the same makefile on a different machine. Would you please help me out on this issue? > > Contents of makefile: > ============================================== > all:sparc > > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread > > SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc ./src/energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/forces.cc ./src/relaxatoms.cc ./src/multipole.cc ./src/electrostatics.cc ./src/tools.cc > > SOURCEH = ./inc/sddft.h ./inc/isddft.h > > OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o > > LIBBASE = ./lib/sparc > > CLEANFILES = ./lib/sparc > > include ${PETSC_DIR}/lib/petsc/conf/variables > include ${PETSC_DIR}/lib/petsc/conf/rules > > sparc: ${OBJSC} chkopts > ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB} > ${RM} $(SOURCECPP:%.cc=%.o) > > =========================================== > Error: > /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g -I/home/swarnava/petsc/include -I/home/swarnava/petsc/linux-gnu-intel/include `pwd`/src/main.cc > /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic error: cannot open source file "sddft.h" > #include "sddft.h" > ^ > ==================================================== > > It's not able to see the header file though I have -I ./inc in CPPFLAGS. The directory containing makefile has the directory "inc" with the headers and "src" with the .cc files. > > Thank you, > Swarnava > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swarnava89 at gmail.com Mon Feb 15 20:13:21 2021 From: swarnava89 at gmail.com (Swarnava Ghosh) Date: Mon, 15 Feb 2021 21:13:21 -0500 Subject: [petsc-users] makefile for building application with petsc In-Reply-To: References: Message-ID: Hi Jacob, Attached is the configure.log and the makefile. It worked on a computing cluster earlier. The petsc and other necessary modules were built by system administrator on the cluster. I am trying to build the same code on a workstation. Sincerely, Swarnava On Mon, Feb 15, 2021 at 8:57 PM Jacob Faibussowitsch wrote: > Hello, > > If possible, can you include a copy of your configure.log which you used > to configure petsc? It will give useful information about your machine and > compilers. What system did this work with successfully? Also please attach > the makefile directly rather than including its contents as text, it is > much easier to read. > > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > Cell: (312) 694-3391 > > On Feb 15, 2021, at 20:47, Swarnava Ghosh wrote: > > Dear Petsc developers and users, > > I am having some issue with building my code with the following makefile. > I was earlier able to build this with the same makefile on a different > machine. Would you please help me out on this issue? 
> > Contents of makefile: > ============================================== > all:sparc > > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ > -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread > > SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc ./src/ > energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/ > poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/forces.cc > ./src/relaxatoms.cc ./src/multipole.cc ./src/electrostatics.cc ./src/ > tools.cc > > SOURCEH = ./inc/sddft.h ./inc/isddft.h > > OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o > ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o > ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o > ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o > > LIBBASE = ./lib/sparc > > CLEANFILES = ./lib/sparc > > include ${PETSC_DIR}/lib/petsc/conf/variables > include ${PETSC_DIR}/lib/petsc/conf/rules > > sparc: ${OBJSC} chkopts > ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB} > ${RM} $(SOURCECPP:%.cc=%.o) > > =========================================== > Error: > /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g > -I/home/swarnava/petsc/include > -I/home/swarnava/petsc/linux-gnu-intel/include `pwd`/src/main.cc > /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic error: > cannot open source file "sddft.h" > #include "sddft.h" > ^ > ==================================================== > > It's not able to see the header file though I have -I ./inc in CPPFLAGS. > The directory containing makefile has the directory "inc" with the headers > and "src" with the .cc files. > > Thank you, > Swarnava > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: makefile Type: application/octet-stream Size: 986 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 3693065 bytes Desc: not available URL: From knepley at gmail.com Mon Feb 15 21:13:43 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 15 Feb 2021 22:13:43 -0500 Subject: [petsc-users] makefile for building application with petsc In-Reply-To: References: Message-ID: On Mon, Feb 15, 2021 at 8:47 PM Swarnava Ghosh wrote: > Dear Petsc developers and users, > > I am having some issue with building my code with the following makefile. > I was earlier able to build this with the same makefile on a different > machine. Would you please help me out on this issue? 
> > Contents of makefile: > ============================================== > all:sparc > > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ > -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread > > SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc > ./src/energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc > ./src/poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc > ./src/forces.cc ./src/relaxatoms.cc ./src/multipole.cc > ./src/electrostatics.cc ./src/tools.cc > > SOURCEH = ./inc/sddft.h ./inc/isddft.h > > OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o > ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o > ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o > ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o > > LIBBASE = ./lib/sparc > > CLEANFILES = ./lib/sparc > > include ${PETSC_DIR}/lib/petsc/conf/variables > include ${PETSC_DIR}/lib/petsc/conf/rules > > sparc: ${OBJSC} chkopts > ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB} > ${RM} $(SOURCECPP:%.cc=%.o) > > =========================================== > Error: > /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g > -I/home/swarnava/petsc/include > -I/home/swarnava/petsc/linux-gnu-intel/include `pwd`/src/main.cc > /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic error: > cannot open source file "sddft.h" > #include "sddft.h" > ^ > ==================================================== > > It's not able to see the header file though I have -I ./inc in CPPFLAGS. > The directory containing makefile has the directory "inc" with the headers > and "src" with the .cc files. > Some things have been reorganized with make. Can you try using CCPPFLAGS instead? Thanks, Matt > Thank you, > Swarnava > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From swarnava89 at gmail.com Mon Feb 15 21:39:27 2021 From: swarnava89 at gmail.com (Swarnava Ghosh) Date: Mon, 15 Feb 2021 22:39:27 -0500 Subject: [petsc-users] makefile for building application with petsc In-Reply-To: References: Message-ID: Hi Matthew, Tried CCPPFLAGS. It did not work. Sincerely, Swarnava On Mon, Feb 15, 2021 at 10:13 PM Matthew Knepley wrote: > On Mon, Feb 15, 2021 at 8:47 PM Swarnava Ghosh > wrote: > >> Dear Petsc developers and users, >> >> I am having some issue with building my code with the following makefile. >> I was earlier able to build this with the same makefile on a different >> machine. Would you please help me out on this issue? 
>> >> Contents of makefile: >> ============================================== >> all:sparc >> >> CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ >> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread >> >> SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc >> ./src/energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc >> ./src/poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc >> ./src/forces.cc ./src/relaxatoms.cc ./src/multipole.cc >> ./src/electrostatics.cc ./src/tools.cc >> >> SOURCEH = ./inc/sddft.h ./inc/isddft.h >> >> OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o >> ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o >> ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o >> ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o >> >> LIBBASE = ./lib/sparc >> >> CLEANFILES = ./lib/sparc >> >> include ${PETSC_DIR}/lib/petsc/conf/variables >> include ${PETSC_DIR}/lib/petsc/conf/rules >> >> sparc: ${OBJSC} chkopts >> ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB} >> ${RM} $(SOURCECPP:%.cc=%.o) >> >> =========================================== >> Error: >> /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g >> -I/home/swarnava/petsc/include >> -I/home/swarnava/petsc/linux-gnu-intel/include `pwd`/src/main.cc >> /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic error: >> cannot open source file "sddft.h" >> #include "sddft.h" >> ^ >> ==================================================== >> >> It's not able to see the header file though I have -I ./inc in CPPFLAGS. >> The directory containing makefile has the directory "inc" with the headers >> and "src" with the .cc files. >> > > Some things have been reorganized with make. Can you try using CCPPFLAGS > instead? > > Thanks, > > Matt > > >> Thank you, >> Swarnava >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Feb 15 21:50:33 2021 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 15 Feb 2021 21:50:33 -0600 Subject: [petsc-users] makefile for building application with petsc In-Reply-To: References: Message-ID: Swarnava, sddft.h is not a PETSc include file, nor is it used by PETSc so I think the issue is not directly to PETSc it is related to where sddft is on the machine and how it is found by your makefile. Barry > On Feb 15, 2021, at 7:47 PM, Swarnava Ghosh wrote: > > Dear Petsc developers and users, > > I am having some issue with building my code with the following makefile. I was earlier able to build this with the same makefile on a different machine. Would you please help me out on this issue? 
> > Contents of makefile: > ============================================== > all:sparc > > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread > > SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc ./src/energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/forces.cc ./src/relaxatoms.cc ./src/multipole.cc ./src/electrostatics.cc ./src/tools.cc > > SOURCEH = ./inc/sddft.h ./inc/isddft.h > > OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o > > LIBBASE = ./lib/sparc > > CLEANFILES = ./lib/sparc > > include ${PETSC_DIR}/lib/petsc/conf/variables > include ${PETSC_DIR}/lib/petsc/conf/rules > > sparc: ${OBJSC} chkopts > ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB} > ${RM} $(SOURCECPP:%.cc=%.o) > > =========================================== > Error: > /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g -I/home/swarnava/petsc/include -I/home/swarnava/petsc/linux-gnu-intel/include `pwd`/src/main.cc > /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic error: cannot open source file "sddft.h" > #include "sddft.h" > ^ > ==================================================== > > It's not able to see the header file though I have -I ./inc in CPPFLAGS. The directory containing makefile has the directory "inc" with the headers and "src" with the .cc files. > > Thank you, > Swarnava > > From roland.richter at ntnu.no Tue Feb 16 02:43:01 2021 From: roland.richter at ntnu.no (Roland Richter) Date: Tue, 16 Feb 2021 09:43:01 +0100 Subject: [petsc-users] Using distributed dense matrix/vector operations on a GPU Message-ID: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no> Hei, after profiling my program using -log_view, I got the following output (all matrices are dense): /Using 8 OpenMP threads// //Using Petsc Development GIT revision: v3.14.3-583-g5464005aea? GIT Date: 2021-01-25 16:01:41 -0600// // //???????????????????????? Max?????? Max/Min???? Avg?????? Total// //Time (sec):?????????? 5.074e+03???? 1.000?? 5.074e+03// //Objects:????????????? 2.158e+03???? 1.000?? 2.158e+03// //Flop:???????????????? 5.236e+13???? 1.000?? 5.236e+13? 5.236e+13// //Flop/sec:???????????? 1.032e+10???? 1.000?? 1.032e+10? 1.032e+10// //MPI Messages:???????? 0.000e+00???? 0.000?? 0.000e+00? 0.000e+00// //MPI Message Lengths:? 0.000e+00???? 0.000?? 0.000e+00? 0.000e+00// //MPI Reductions:?????? 0.000e+00???? 0.000// // //Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)// //??????????????????????????? e.g., VecAXPY() for real vectors of length N --> 2N flop// //??????????????????????????? and VecAXPY() for complex vectors of length N --> 8N flop// // //Summary of Stages:?? ----- Time ------? ----- Flop ------? --- Messages ---? -- Message Lengths --? -- Reductions --// //??????????????????????? Avg???? %Total???? Avg???? %Total??? Count?? %Total???? Avg???????? %Total??? Count?? %Total// //?0:????? Main Stage: 5.0744e+03 100.0%? 5.2359e+13 100.0%? 0.000e+00?? 0.0%? 0.000e+00??????? 0.0%? 0.000e+00?? 
0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                37 1.0 1.0354e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAssemblyBegin      31 1.0 2.9080e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAssemblyEnd        31 1.0 2.3270e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatCopy            49928 1.0 3.7437e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  7  0  0  0  0   7  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatConvert          2080 1.0 5.8492e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatScale           56162 1.0 6.9348e+02 1.0 1.60e+12 1.0 0.0e+00 0.0e+00 0.0e+00 14  3  0  0  0  14  3  0  0  0  2303       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyBegin   56222 1.0 1.7370e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd     56222 1.0 8.8713e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatZeroEntries     60363 1.0 3.1011e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0   6  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatAXPY             8320 1.0 1.2254e+02 1.0 5.58e+11 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0  4557       0      0 0.00e+00    0 0.00e+00  0
MatMatMultSym       4161 1.0 7.1613e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatMatMultNum       4161 1.0 4.0706e+02 1.0 5.02e+13 1.0 0.0e+00 0.0e+00 0.0e+00  8 96  0  0  0   8 96  0  0  0 123331       0      0 0.00e+00    0 0.00e+00  0
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    37             34      1634064     0.
              Matrix  2120           2120  52734663456     0.
              Viewer     1              0            0     0.
========================================================================================================================

Apparently, MatMatMultNum and MatScale take the most time (by far) during execution. Therefore, I was wondering if it is possible to move those operations/all matrices and vectors to a GPU or another accelerator. According to https://www.mcs.anl.gov/petsc/features/gpus.html CUDA is only supported for distributed vectors, but not for dense distributed matrices. Are there any updates related to that, or other ways to speed up the involved operations?

Thanks!

Regards,

Roland
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mfadams at lbl.gov  Tue Feb 16 07:01:12 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Tue, 16 Feb 2021 08:01:12 -0500
Subject: [petsc-users] Using distributed dense matrix/vector operations on a GPU
In-Reply-To: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no>
References: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no>
Message-ID: 

You want to use -mat_type densecuda and -vec_type cuda.
You can see from the 0 in the last column that none of your work is done on the GPU.
Mark On Tue, Feb 16, 2021 at 3:43 AM Roland Richter wrote: > Hei, > > after profiling my program using -log_view, I got the following output > (all matrices are dense): > > *Using 8 OpenMP threads* > *Using Petsc Development GIT revision: v3.14.3-583-g5464005aea GIT Date: > 2021-01-25 16:01:41 -0600* > > * Max Max/Min Avg Total* > *Time (sec): 5.074e+03 1.000 5.074e+03* > *Objects: 2.158e+03 1.000 2.158e+03* > *Flop: 5.236e+13 1.000 5.236e+13 5.236e+13* > *Flop/sec: 1.032e+10 1.000 1.032e+10 1.032e+10* > *MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00* > *MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00* > *MPI Reductions: 0.000e+00 0.000* > > *Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract)* > * e.g., VecAXPY() for real vectors of length N > --> 2N flop* > * and VecAXPY() for complex vectors of length N > --> 8N flop* > > *Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages > --- -- Message Lengths -- -- Reductions --* > * Avg %Total Avg %Total Count > %Total Avg %Total Count %Total* > * 0: Main Stage: 5.0744e+03 100.0% 5.2359e+13 100.0% 0.000e+00 > 0.0% 0.000e+00 0.0% 0.000e+00 0.0%* > > > *------------------------------------------------------------------------------------------------------------------------* > *See the 'Profiling' chapter of the users' manual for details on > interpreting output.* > *Phase summary info:* > * Count: number of times phase was executed* > * Time and Flop: Max - maximum over all processors* > * Ratio - ratio of maximum to minimum over all processors* > * Mess: number of messages sent* > * AvgLen: average message length (bytes)* > * Reduct: number of global reductions* > * Global: entire computation* > * Stage: stages of a computation. Set stages with PetscLogStagePush() > and PetscLogStagePop().* > * %T - percent time in this phase %F - percent flop in this > phase* > * %M - percent messages in this phase %L - percent message > lengths in this phase* > * %R - percent reductions in this phase* > * Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time > over all processors)* > * GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU > time over all processors)* > * CpuToGpu Count: total number of CPU to GPU copies per processor* > * CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per > processor)* > * GpuToCpu Count: total number of GPU to CPU copies per processor* > * GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per > processor)* > * GPU %F: percent flops on GPU in this event* > > *------------------------------------------------------------------------------------------------------------------------* > *Event Count Time (sec) > Flop --- Global --- --- Stage ---- Total > GPU - CpuToGpu - - GpuToCpu - GPU* > * Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size > Count Size %F* > > *---------------------------------------------------------------------------------------------------------------------------------------------------------------* > > *--- Event Stage 0: Main Stage* > > *VecSet 37 1.0 1.0354e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *VecAssemblyBegin 31 1.0 2.9080e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *VecAssemblyEnd 31 1.0 2.3270e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > 
*MatCopy 49928 1.0 3.7437e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *MatConvert 2080 1.0 5.8492e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *MatScale 56162 1.0 6.9348e+02 1.0 1.60e+12 1.0 0.0e+00 0.0e+00 > 0.0e+00 14 3 0 0 0 14 3 0 0 0 2303 0 0 0.00e+00 0 > 0.00e+00 0* > *MatAssemblyBegin 56222 1.0 1.7370e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *MatAssemblyEnd 56222 1.0 8.8713e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *MatZeroEntries 60363 1.0 3.1011e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 6 0 0 0 0 6 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *MatAXPY 8320 1.0 1.2254e+02 1.0 5.58e+11 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 4557 0 0 0.00e+00 0 > 0.00e+00 0* > *MatMatMultSym 4161 1.0 7.1613e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *MatMatMultNum 4161 1.0 4.0706e+02 1.0 5.02e+13 1.0 0.0e+00 0.0e+00 > 0.0e+00 8 96 0 0 0 8 96 0 0 0 123331 0 0 0.00e+00 0 > 0.00e+00 0* > > *---------------------------------------------------------------------------------------------------------------------------------------------------------------* > > *Memory usage is given in bytes:* > > *Object Type Creations Destructions Memory Descendants' > Mem.* > *Reports information only for process 0.* > > *--- Event Stage 0: Main Stage* > > * Vector 37 34 1634064 0.* > * Matrix 2120 2120 52734663456 0.* > * Viewer 1 0 0 0.* > > *========================================================================================================================* > > Apparently, MatMatMultNum and MatScale take the most time (by far) during > execution. Therefore, I was wondering if it is possible to move those > operations/all matrices and vectors to a GPU or another accelerator. > According to https://www.mcs.anl.gov/petsc/features/gpus.html CUDA is > only supported for distributed vectors, but not for dense distributed > matrices. Are there any updates related to that, or other ways to speed up > the involved operations? > > Thanks! > > Regards, > > Roland > -------------- next part -------------- An HTML attachment was scrubbed... 
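[Picking up Mark's -mat_type densecuda / -vec_type cuda suggestion above, here is a minimal, hedged sketch of how a code that creates its dense matrices and vectors through the options database can be switched to the GPU back end at run time. The sizes m and n, the executable name, and the run line are placeholders, and a PETSc build configured with CUDA support is assumed.]

Mat            A;
Vec            x;
PetscErrorCode ierr;

ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, m, n);CHKERRQ(ierr);
ierr = MatSetType(A, MATDENSE);CHKERRQ(ierr);   /* dense on the CPU by default          */
ierr = MatSetFromOptions(A);CHKERRQ(ierr);      /* -mat_type densecuda overrides this   */
ierr = MatSetUp(A);CHKERRQ(ierr);

ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
ierr = VecSetSizes(x, PETSC_DECIDE, n);CHKERRQ(ierr);
ierr = VecSetFromOptions(x);CHKERRQ(ierr);      /* -vec_type cuda overrides the default */

/* run as, for example:
     mpirun -n 1 ./app -mat_type densecuda -vec_type cuda -log_view
   and check that the "GPU %F" column of -log_view becomes nonzero. */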
URL: From stefano.zampini at gmail.com Tue Feb 16 07:14:32 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 16 Feb 2021 16:14:32 +0300 Subject: [petsc-users] Using distributed dense matrix/vector operations on a GPU In-Reply-To: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no> References: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no> Message-ID: Il giorno mar 16 feb 2021 alle ore 11:43 Roland Richter < roland.richter at ntnu.no> ha scritto: > Hei, > > after profiling my program using -log_view, I got the following output > (all matrices are dense): > > *Using 8 OpenMP threads* > *Using Petsc Development GIT revision: v3.14.3-583-g5464005aea GIT Date: > 2021-01-25 16:01:41 -0600* > > * Max Max/Min Avg Total* > *Time (sec): 5.074e+03 1.000 5.074e+03* > *Objects: 2.158e+03 1.000 2.158e+03* > *Flop: 5.236e+13 1.000 5.236e+13 5.236e+13* > *Flop/sec: 1.032e+10 1.000 1.032e+10 1.032e+10* > *MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00* > *MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00* > *MPI Reductions: 0.000e+00 0.000* > > *Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract)* > * e.g., VecAXPY() for real vectors of length N > --> 2N flop* > * and VecAXPY() for complex vectors of length N > --> 8N flop* > > *Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages > --- -- Message Lengths -- -- Reductions --* > * Avg %Total Avg %Total Count > %Total Avg %Total Count %Total* > * 0: Main Stage: 5.0744e+03 100.0% 5.2359e+13 100.0% 0.000e+00 > 0.0% 0.000e+00 0.0% 0.000e+00 0.0%* > > > *------------------------------------------------------------------------------------------------------------------------* > *See the 'Profiling' chapter of the users' manual for details on > interpreting output.* > *Phase summary info:* > * Count: number of times phase was executed* > * Time and Flop: Max - maximum over all processors* > * Ratio - ratio of maximum to minimum over all processors* > * Mess: number of messages sent* > * AvgLen: average message length (bytes)* > * Reduct: number of global reductions* > * Global: entire computation* > * Stage: stages of a computation. 
Set stages with PetscLogStagePush() > and PetscLogStagePop().* > * %T - percent time in this phase %F - percent flop in this > phase* > * %M - percent messages in this phase %L - percent message > lengths in this phase* > * %R - percent reductions in this phase* > * Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time > over all processors)* > * GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU > time over all processors)* > * CpuToGpu Count: total number of CPU to GPU copies per processor* > * CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per > processor)* > * GpuToCpu Count: total number of GPU to CPU copies per processor* > * GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per > processor)* > * GPU %F: percent flops on GPU in this event* > > *------------------------------------------------------------------------------------------------------------------------* > *Event Count Time (sec) > Flop --- Global --- --- Stage ---- Total > GPU - CpuToGpu - - GpuToCpu - GPU* > * Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size > Count Size %F* > > *---------------------------------------------------------------------------------------------------------------------------------------------------------------* > > *--- Event Stage 0: Main Stage* > > *VecSet 37 1.0 1.0354e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *VecAssemblyBegin 31 1.0 2.9080e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *VecAssemblyEnd 31 1.0 2.3270e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *MatCopy 49928 1.0 3.7437e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *MatConvert 2080 1.0 5.8492e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *MatScale 56162 1.0 6.9348e+02 1.0 1.60e+12 1.0 0.0e+00 0.0e+00 > 0.0e+00 14 3 0 0 0 14 3 0 0 0 2303 0 0 0.00e+00 0 > 0.00e+00 0* > *MatAssemblyBegin 56222 1.0 1.7370e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *MatAssemblyEnd 56222 1.0 8.8713e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *MatZeroEntries 60363 1.0 3.1011e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 6 0 0 0 0 6 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *MatAXPY 8320 1.0 1.2254e+02 1.0 5.58e+11 1.0 0.0e+00 0.0e+00 > 0.0e+00 2 1 0 0 0 2 1 0 0 0 4557 0 0 0.00e+00 0 > 0.00e+00 0* > *MatMatMultSym 4161 1.0 7.1613e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0* > *MatMatMultNum 4161 1.0 4.0706e+02 1.0 5.02e+13 1.0 0.0e+00 0.0e+00 > 0.0e+00 8 96 0 0 0 8 96 0 0 0 123331 0 0 0.00e+00 0 > 0.00e+00 0* > > *---------------------------------------------------------------------------------------------------------------------------------------------------------------* > > *Memory usage is given in bytes:* > > *Object Type Creations Destructions Memory Descendants' > Mem.* > *Reports information only for process 0.* > > *--- Event Stage 0: Main Stage* > > * Vector 37 34 1634064 0.* > * Matrix 2120 2120 52734663456 0.* > * Viewer 1 0 0 0.* > > *========================================================================================================================* > > Apparently, MatMatMultNum 
and MatScale take the most time (by far) during > execution. Therefore, I was wondering if it is possible to move those > operations/all matrices and vectors to a GPU or another accelerator. > According to https://www.mcs.anl.gov/petsc/features/gpus.html CUDA is > only supported for distributed vectors, but not for dense distributed > matrices. Are there any updates related to that, or other ways to speed up > the involved operations? > > You should compute the timings associated with each call, and not consider the lump sum. For example, each MatScale takes 6.9348e+02/56162 = 0.012347851 seconds on average, I doubt you can get any reasonable speedup with CUDA. What are the sizes of these matrices? > Thanks! > > Regards, > > Roland > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland.richter at ntnu.no Tue Feb 16 07:16:57 2021 From: roland.richter at ntnu.no (Roland Richter) Date: Tue, 16 Feb 2021 14:16:57 +0100 Subject: [petsc-users] Using distributed dense matrix/vector operations on a GPU In-Reply-To: References: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no> Message-ID: Hei, the usual size of those matrices is (cumulative, not distributed) at least [8192x8192] x [8192x32768] complex entries as lower boundary. Does it still make sense to test CUDA for speedup? Thank you, regards, Roland Am 16.02.21 um 14:14 schrieb Stefano Zampini: > > > Il giorno mar 16 feb 2021 alle ore 11:43 Roland Richter > > ha scritto: > > Hei, > > after profiling my program using -log_view, I got the following > output (all matrices are dense): > > /Using 8 OpenMP threads// > //Using Petsc Development GIT revision: v3.14.3-583-g5464005aea? > GIT Date: 2021-01-25 16:01:41 -0600// > // > //???????????????????????? Max?????? Max/Min???? Avg?????? Total// > //Time (sec):?????????? 5.074e+03???? 1.000?? 5.074e+03// > //Objects:????????????? 2.158e+03???? 1.000?? 2.158e+03// > //Flop:???????????????? 5.236e+13???? 1.000?? 5.236e+13? 5.236e+13// > //Flop/sec:???????????? 1.032e+10???? 1.000?? 1.032e+10? 1.032e+10// > //MPI Messages:???????? 0.000e+00???? 0.000?? 0.000e+00? 0.000e+00// > //MPI Message Lengths:? 0.000e+00???? 0.000?? 0.000e+00? 0.000e+00// > //MPI Reductions:?????? 0.000e+00???? 0.000// > // > //Flop counting convention: 1 flop = 1 real number operation of > type (multiply/divide/add/subtract)// > //??????????????????????????? e.g., VecAXPY() for real vectors of > length N --> 2N flop// > //??????????????????????????? and VecAXPY() for complex vectors of > length N --> 8N flop// > // > //Summary of Stages:?? ----- Time ------? ----- Flop ------? --- > Messages ---? -- Message Lengths --? -- Reductions --// > //??????????????????????? Avg???? %Total???? Avg???? %Total??? > Count?? %Total???? Avg???????? %Total??? Count?? %Total// > //?0:????? Main Stage: 5.0744e+03 100.0%? 5.2359e+13 100.0%? > 0.000e+00?? 0.0%? 0.000e+00??????? 0.0%? 0.000e+00?? 0.0%// > // > //------------------------------------------------------------------------------------------------------------------------// > //See the 'Profiling' chapter of the users' manual for details on > interpreting output.// > //Phase summary info:// > //?? Count: number of times phase was executed// > //?? Time and Flop: Max - maximum over all processors// > //????????????????? Ratio - ratio of maximum to minimum over all > processors// > //?? Mess: number of messages sent// > //?? AvgLen: average message length (bytes)// > //?? Reduct: number of global reductions// > //?? 
Global: entire computation// > //?? Stage: stages of a computation. Set stages with > PetscLogStagePush() and PetscLogStagePop().// > //????? %T - percent time in this phase???????? %F - percent flop > in this phase// > //????? %M - percent messages in this phase???? %L - percent > message lengths in this phase// > //????? %R - percent reductions in this phase// > //?? Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max > time over all processors)// > //?? GPU Mflop/s: 10e-6 * (sum of flop on GPU over all > processors)/(max GPU time over all processors)// > //?? CpuToGpu Count: total number of CPU to GPU copies per processor// > //?? CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU > copies per processor)// > //?? GpuToCpu Count: total number of GPU to CPU copies per processor// > //?? GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU > copies per processor)// > //?? GPU %F: percent flops on GPU in this event// > //------------------------------------------------------------------------------------------------------------------------// > //Event??????????????? Count????? Time (sec)???? > Flop????????????????????????????? --- Global ---? --- Stage ----? > Total?? GPU??? - CpuToGpu -?? - GpuToCpu - GPU// > //?????????????????? Max Ratio? Max???? Ratio?? Max? Ratio? Mess?? > AvgLen? Reduct? %T %F %M %L %R? %T %F %M %L %R Mflop/s Mflop/s > Count?? Size?? Count?? Size? %F// > //---------------------------------------------------------------------------------------------------------------------------------------------------------------// > // > //--- Event Stage 0: Main Stage// > // > //VecSet??????????????? 37 1.0 1.0354e-04 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0????? > 0 0.00e+00??? 0 0.00e+00? 0// > //VecAssemblyBegin????? 31 1.0 2.9080e-06 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0????? > 0 0.00e+00??? 0 0.00e+00? 0// > //VecAssemblyEnd??????? 31 1.0 2.3270e-06 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0????? > 0 0.00e+00??? 0 0.00e+00? 0// > //MatCopy??????????? 49928 1.0 3.7437e+02 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00? 7? 0? 0? 0? 0?? 7? 0? 0? 0? 0???? 0?????? 0????? > 0 0.00e+00??? 0 0.00e+00? 0// > //MatConvert????????? 2080 1.0 5.8492e+00 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0????? > 0 0.00e+00??? 0 0.00e+00? 0// > //MatScale?????????? 56162 1.0 6.9348e+02 1.0 1.60e+12 1.0 0.0e+00 > 0.0e+00 0.0e+00 14? 3? 0? 0? 0? 14? 3? 0? 0? 0? 2303?????? 0????? > 0 0.00e+00??? 0 0.00e+00? 0// > //MatAssemblyBegin?? 56222 1.0 1.7370e-02 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0????? > 0 0.00e+00??? 0 0.00e+00? 0// > //MatAssemblyEnd???? 56222 1.0 8.8713e-03 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0????? > 0 0.00e+00??? 0 0.00e+00? 0// > //MatZeroEntries???? 60363 1.0 3.1011e+02 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00? 6? 0? 0? 0? 0?? 6? 0? 0? 0? 0???? 0?????? 0????? > 0 0.00e+00??? 0 0.00e+00? 0// > //MatAXPY???????????? 8320 1.0 1.2254e+02 1.0 5.58e+11 1.0 0.0e+00 > 0.0e+00 0.0e+00? 2? 1? 0? 0? 0?? 2? 1? 0? 0? 0? 4557?????? 0????? > 0 0.00e+00??? 0 0.00e+00? 0// > //MatMatMultSym?????? 4161 1.0 7.1613e-03 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 0?????? 0????? > 0 0.00e+00??? 0 0.00e+00? 0// > //MatMatMultNum?????? 
4161 1.0 4.0706e+02 1.0 5.02e+13 1.0 0.0e+00 > 0.0e+00 0.0e+00? 8 96? 0? 0? 0?? 8 96? 0? 0? 0 123331?????? 0????? > 0 0.00e+00??? 0 0.00e+00? 0// > //---------------------------------------------------------------------------------------------------------------------------------------------------------------// > // > //Memory usage is given in bytes:// > // > //Object Type????????? Creations?? Destructions???? Memory? > Descendants' Mem.// > //Reports information only for process 0.// > // > //--- Event Stage 0: Main Stage// > // > //????????????? Vector??? 37???????????? 34????? 1634064???? 0.// > //????????????? Matrix? 2120?????????? 2120? 52734663456???? 0.// > //????????????? Viewer???? 1????????????? 0??????????? 0???? 0.// > //========================================================================================================================/ > > Apparently, MatMatMultNum and MatScale take the most time (by far) > during execution. Therefore, I was wondering if it is possible to > move those operations/all matrices and vectors to a GPU or another > accelerator. According to > https://www.mcs.anl.gov/petsc/features/gpus.html > CUDA is only > supported for distributed vectors, but not for dense distributed > matrices. Are there any updates related to that, or other ways to > speed up the involved operations? > > > You should compute the timings associated with each call, and not > consider the lump sum. For example, each MatScale takes > 6.9348e+02/56162? = 0.012347851 seconds on average,? I doubt you can > get any reasonable speedup with CUDA. What are the sizes of these > matrices?? > ? > > Thanks! > > Regards, > > Roland > > > > -- > Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Tue Feb 16 07:25:53 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 16 Feb 2021 16:25:53 +0300 Subject: [petsc-users] Using distributed dense matrix/vector operations on a GPU In-Reply-To: References: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no> Message-ID: > > > > the usual size of those matrices is (cumulative, not distributed) at least > [8192x8192] x [8192x32768] complex entries as lower boundary. Does it still > make sense to test CUDA for speedup? > > I don't understand your notation. Are you saying your matrices are 8K x 8K? or 8K*32K? or what? 
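[As a point of reference for Stefano's per-call question, the averages follow directly from the -log_view totals quoted above, i.e. total time divided by call count, rounded:]

  MatMatMultNum:  4.0706e+02 s / 4161 calls  ~ 9.8e-02 s per call
  MatScale:       6.9348e+02 s / 56162 calls ~ 1.2e-02 s per call
  MatCopy:        3.7437e+02 s / 49928 calls ~ 7.5e-03 s per call
  MatZeroEntries: 3.1011e+02 s / 60363 calls ~ 5.1e-03 s per call

So the matrix-matrix products already run at roughly 123 GF/s in aggregate (the Total Mflop/s column of the log), which is the number to compare against what a GPU could sustain at these sizes.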
> Thank you, > > regards, > > Roland > Am 16.02.21 um 14:14 schrieb Stefano Zampini: > > > > Il giorno mar 16 feb 2021 alle ore 11:43 Roland Richter < > roland.richter at ntnu.no> ha scritto: > >> Hei, >> >> after profiling my program using -log_view, I got the following output >> (all matrices are dense): >> >> *Using 8 OpenMP threads* >> *Using Petsc Development GIT revision: v3.14.3-583-g5464005aea GIT Date: >> 2021-01-25 16:01:41 -0600* >> >> * Max Max/Min Avg Total* >> *Time (sec): 5.074e+03 1.000 5.074e+03* >> *Objects: 2.158e+03 1.000 2.158e+03* >> *Flop: 5.236e+13 1.000 5.236e+13 5.236e+13* >> *Flop/sec: 1.032e+10 1.000 1.032e+10 1.032e+10* >> *MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00* >> *MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00* >> *MPI Reductions: 0.000e+00 0.000* >> >> *Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract)* >> * e.g., VecAXPY() for real vectors of length N >> --> 2N flop* >> * and VecAXPY() for complex vectors of length >> N --> 8N flop* >> >> *Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages >> --- -- Message Lengths -- -- Reductions --* >> * Avg %Total Avg %Total Count >> %Total Avg %Total Count %Total* >> * 0: Main Stage: 5.0744e+03 100.0% 5.2359e+13 100.0% 0.000e+00 >> 0.0% 0.000e+00 0.0% 0.000e+00 0.0%* >> >> >> *------------------------------------------------------------------------------------------------------------------------* >> *See the 'Profiling' chapter of the users' manual for details on >> interpreting output.* >> *Phase summary info:* >> * Count: number of times phase was executed* >> * Time and Flop: Max - maximum over all processors* >> * Ratio - ratio of maximum to minimum over all >> processors* >> * Mess: number of messages sent* >> * AvgLen: average message length (bytes)* >> * Reduct: number of global reductions* >> * Global: entire computation* >> * Stage: stages of a computation. 
Set stages with PetscLogStagePush() >> and PetscLogStagePop().* >> * %T - percent time in this phase %F - percent flop in this >> phase* >> * %M - percent messages in this phase %L - percent message >> lengths in this phase* >> * %R - percent reductions in this phase* >> * Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time >> over all processors)* >> * GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max >> GPU time over all processors)* >> * CpuToGpu Count: total number of CPU to GPU copies per processor* >> * CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per >> processor)* >> * GpuToCpu Count: total number of GPU to CPU copies per processor* >> * GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per >> processor)* >> * GPU %F: percent flops on GPU in this event* >> >> *------------------------------------------------------------------------------------------------------------------------* >> *Event Count Time (sec) >> Flop --- Global --- --- Stage ---- Total >> GPU - CpuToGpu - - GpuToCpu - GPU* >> * Max Ratio Max Ratio Max Ratio Mess AvgLen >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size >> Count Size %F* >> >> *---------------------------------------------------------------------------------------------------------------------------------------------------------------* >> >> *--- Event Stage 0: Main Stage* >> >> *VecSet 37 1.0 1.0354e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 >> 0.00e+00 0* >> *VecAssemblyBegin 31 1.0 2.9080e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 >> 0.00e+00 0* >> *VecAssemblyEnd 31 1.0 2.3270e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 >> 0.00e+00 0* >> *MatCopy 49928 1.0 3.7437e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 0 0 0.00e+00 0 >> 0.00e+00 0* >> *MatConvert 2080 1.0 5.8492e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 >> 0.00e+00 0* >> *MatScale 56162 1.0 6.9348e+02 1.0 1.60e+12 1.0 0.0e+00 0.0e+00 >> 0.0e+00 14 3 0 0 0 14 3 0 0 0 2303 0 0 0.00e+00 0 >> 0.00e+00 0* >> *MatAssemblyBegin 56222 1.0 1.7370e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 >> 0.00e+00 0* >> *MatAssemblyEnd 56222 1.0 8.8713e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 >> 0.00e+00 0* >> *MatZeroEntries 60363 1.0 3.1011e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 6 0 0 0 0 6 0 0 0 0 0 0 0 0.00e+00 0 >> 0.00e+00 0* >> *MatAXPY 8320 1.0 1.2254e+02 1.0 5.58e+11 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 4557 0 0 0.00e+00 0 >> 0.00e+00 0* >> *MatMatMultSym 4161 1.0 7.1613e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 >> 0.00e+00 0* >> *MatMatMultNum 4161 1.0 4.0706e+02 1.0 5.02e+13 1.0 0.0e+00 0.0e+00 >> 0.0e+00 8 96 0 0 0 8 96 0 0 0 123331 0 0 0.00e+00 0 >> 0.00e+00 0* >> >> *---------------------------------------------------------------------------------------------------------------------------------------------------------------* >> >> *Memory usage is given in bytes:* >> >> *Object Type Creations Destructions Memory Descendants' >> Mem.* >> *Reports information only for process 0.* >> >> *--- Event Stage 0: Main Stage* >> >> * Vector 37 34 1634064 0.* >> * Matrix 2120 2120 52734663456 0.* >> * Viewer 1 0 0 0.* >> >> 
*========================================================================================================================* >> >> Apparently, MatMatMultNum and MatScale take the most time (by far) during >> execution. Therefore, I was wondering if it is possible to move those >> operations/all matrices and vectors to a GPU or another accelerator. >> According to https://www.mcs.anl.gov/petsc/features/gpus.html CUDA is >> only supported for distributed vectors, but not for dense distributed >> matrices. Are there any updates related to that, or other ways to speed up >> the involved operations? >> > > You should compute the timings associated with each call, and not consider > the lump sum. For example, each MatScale takes 6.9348e+02/56162 = > 0.012347851 seconds on average, I doubt you can get any reasonable speedup > with CUDA. What are the sizes of these matrices? > > >> Thanks! >> >> Regards, >> >> Roland >> > > > -- > Stefano > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland.richter at ntnu.no Tue Feb 16 07:30:06 2021 From: roland.richter at ntnu.no (Roland Richter) Date: Tue, 16 Feb 2021 14:30:06 +0100 Subject: [petsc-users] Using distributed dense matrix/vector operations on a GPU In-Reply-To: References: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no> Message-ID: <7cb65ca8-1748-2e08-0a04-d61c21c6a40a@ntnu.no> For MatMatMult the size of the involved matrices is? 8k x 8k and 8k x 32k. I am not sure where MatScale is called, I never call it explicitly. If MatDiagonalScale calls MatScale, then the involved matrices have a size of 8k x 32k. Regards, Roland Am 16.02.21 um 14:25 schrieb Stefano Zampini: > > > ? > > the usual size of those matrices is (cumulative, not distributed) > at least [8192x8192] x [8192x32768] complex entries as lower > boundary. Does it still make sense to test CUDA for speedup? > > I don't understand your notation. Are you saying your matrices are 8K > x 8K? or 8K*32K? or what? > ? > > Thank you, > > regards, > > Roland > > Am 16.02.21 um 14:14 schrieb Stefano Zampini: >> >> >> Il giorno mar 16 feb 2021 alle ore 11:43 Roland Richter >> > ha scritto: >> >> Hei, >> >> after profiling my program using -log_view, I got the >> following output (all matrices are dense): >> >> /Using 8 OpenMP threads// >> //Using Petsc Development GIT revision: >> v3.14.3-583-g5464005aea? GIT Date: 2021-01-25 16:01:41 -0600// >> // >> //???????????????????????? Max?????? Max/Min???? Avg?????? >> Total// >> //Time (sec):?????????? 5.074e+03???? 1.000?? 5.074e+03// >> //Objects:????????????? 2.158e+03???? 1.000?? 2.158e+03// >> //Flop:???????????????? 5.236e+13???? 1.000?? 5.236e+13? >> 5.236e+13// >> //Flop/sec:???????????? 1.032e+10???? 1.000?? 1.032e+10? >> 1.032e+10// >> //MPI Messages:???????? 0.000e+00???? 0.000?? 0.000e+00? >> 0.000e+00// >> //MPI Message Lengths:? 0.000e+00???? 0.000?? 0.000e+00? >> 0.000e+00// >> //MPI Reductions:?????? 0.000e+00???? 0.000// >> // >> //Flop counting convention: 1 flop = 1 real number operation >> of type (multiply/divide/add/subtract)// >> //??????????????????????????? e.g., VecAXPY() for real >> vectors of length N --> 2N flop// >> //??????????????????????????? and VecAXPY() for complex >> vectors of length N --> 8N flop// >> // >> //Summary of Stages:?? ----- Time ------? ----- Flop ------? >> --- Messages ---? -- Message Lengths --? -- Reductions --// >> //??????????????????????? Avg???? %Total???? Avg???? >> %Total??? Count?? %Total???? Avg???????? %Total??? Count?? 
>> %Total// >> //?0:????? Main Stage: 5.0744e+03 100.0%? 5.2359e+13 100.0%? >> 0.000e+00?? 0.0%? 0.000e+00??????? 0.0%? 0.000e+00?? 0.0%// >> // >> //------------------------------------------------------------------------------------------------------------------------// >> //See the 'Profiling' chapter of the users' manual for >> details on interpreting output.// >> //Phase summary info:// >> //?? Count: number of times phase was executed// >> //?? Time and Flop: Max - maximum over all processors// >> //????????????????? Ratio - ratio of maximum to minimum over >> all processors// >> //?? Mess: number of messages sent// >> //?? AvgLen: average message length (bytes)// >> //?? Reduct: number of global reductions// >> //?? Global: entire computation// >> //?? Stage: stages of a computation. Set stages with >> PetscLogStagePush() and PetscLogStagePop().// >> //????? %T - percent time in this phase???????? %F - percent >> flop in this phase// >> //????? %M - percent messages in this phase???? %L - percent >> message lengths in this phase// >> //????? %R - percent reductions in this phase// >> //?? Total Mflop/s: 10e-6 * (sum of flop over all >> processors)/(max time over all processors)// >> //?? GPU Mflop/s: 10e-6 * (sum of flop on GPU over all >> processors)/(max GPU time over all processors)// >> //?? CpuToGpu Count: total number of CPU to GPU copies per >> processor// >> //?? CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to >> GPU copies per processor)// >> //?? GpuToCpu Count: total number of GPU to CPU copies per >> processor// >> //?? GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to >> CPU copies per processor)// >> //?? GPU %F: percent flops on GPU in this event// >> //------------------------------------------------------------------------------------------------------------------------// >> //Event??????????????? Count????? Time (sec)???? >> Flop????????????????????????????? --- Global ---? --- Stage >> ----? Total?? GPU??? - CpuToGpu -?? - GpuToCpu - GPU// >> //?????????????????? Max Ratio? Max???? Ratio?? Max? Ratio? >> Mess?? AvgLen? Reduct? %T %F %M %L %R? %T %F %M %L %R Mflop/s >> Mflop/s Count?? Size?? Count?? Size? %F// >> //---------------------------------------------------------------------------------------------------------------------------------------------------------------// >> // >> //--- Event Stage 0: Main Stage// >> // >> //VecSet??????????????? 37 1.0 1.0354e-04 1.0 0.00e+00 0.0 >> 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? >> 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >> //VecAssemblyBegin????? 31 1.0 2.9080e-06 1.0 0.00e+00 0.0 >> 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? >> 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >> //VecAssemblyEnd??????? 31 1.0 2.3270e-06 1.0 0.00e+00 0.0 >> 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? >> 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >> //MatCopy??????????? 49928 1.0 3.7437e+02 1.0 0.00e+00 0.0 >> 0.0e+00 0.0e+00 0.0e+00? 7? 0? 0? 0? 0?? 7? 0? 0? 0? 0???? >> 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >> //MatConvert????????? 2080 1.0 5.8492e+00 1.0 0.00e+00 0.0 >> 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? >> 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >> //MatScale?????????? 56162 1.0 6.9348e+02 1.0 1.60e+12 1.0 >> 0.0e+00 0.0e+00 0.0e+00 14? 3? 0? 0? 0? 14? 3? 0? 0? 0? >> 2303?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >> //MatAssemblyBegin?? 56222 1.0 1.7370e-02 1.0 0.00e+00 0.0 >> 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? 
>> 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >> //MatAssemblyEnd???? 56222 1.0 8.8713e-03 1.0 0.00e+00 0.0 >> 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? >> 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >> //MatZeroEntries???? 60363 1.0 3.1011e+02 1.0 0.00e+00 0.0 >> 0.0e+00 0.0e+00 0.0e+00? 6? 0? 0? 0? 0?? 6? 0? 0? 0? 0???? >> 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >> //MatAXPY???????????? 8320 1.0 1.2254e+02 1.0 5.58e+11 1.0 >> 0.0e+00 0.0e+00 0.0e+00? 2? 1? 0? 0? 0?? 2? 1? 0? 0? 0? >> 4557?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >> //MatMatMultSym?????? 4161 1.0 7.1613e-03 1.0 0.00e+00 0.0 >> 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? 0???? >> 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >> //MatMatMultNum?????? 4161 1.0 4.0706e+02 1.0 5.02e+13 1.0 >> 0.0e+00 0.0e+00 0.0e+00? 8 96? 0? 0? 0?? 8 96? 0? 0? 0 >> 123331?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >> //---------------------------------------------------------------------------------------------------------------------------------------------------------------// >> // >> //Memory usage is given in bytes:// >> // >> //Object Type????????? Creations?? Destructions???? Memory? >> Descendants' Mem.// >> //Reports information only for process 0.// >> // >> //--- Event Stage 0: Main Stage// >> // >> //????????????? Vector??? 37???????????? 34????? 1634064???? 0.// >> //????????????? Matrix? 2120?????????? 2120? 52734663456???? 0.// >> //????????????? Viewer???? 1????????????? 0??????????? 0???? 0.// >> //========================================================================================================================/ >> >> Apparently, MatMatMultNum and MatScale take the most time (by >> far) during execution. Therefore, I was wondering if it is >> possible to move those operations/all matrices and vectors to >> a GPU or another accelerator. According to >> https://www.mcs.anl.gov/petsc/features/gpus.html >> CUDA is >> only supported for distributed vectors, but not for dense >> distributed matrices. Are there any updates related to that, >> or other ways to speed up the involved operations? >> >> >> You should compute the timings associated with each call, and not >> consider the lump sum. For example, each MatScale takes >> 6.9348e+02/56162? = 0.012347851 seconds on average,? I doubt you >> can get any reasonable speedup with CUDA. What are the sizes of >> these matrices?? >> ? >> >> Thanks! >> >> Regards, >> >> Roland >> >> >> >> -- >> Stefano > > > > -- > Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 16 07:42:32 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 16 Feb 2021 08:42:32 -0500 Subject: [petsc-users] makefile for building application with petsc In-Reply-To: References: Message-ID: On Mon, Feb 15, 2021 at 10:50 PM Barry Smith wrote: > Swarnava, > > sddft.h is not a PETSc include file, nor is it used by PETSc so I think > the issue is not directly to PETSc it is related to where sddft is on the > machine and how it is found by your makefile. > Barry, His problem is that he is trying to put extra include flags on the compile line, but it is not working. I am wondering if his make is malfunctioning. Thanks, Matt > Barry > > > > > On Feb 15, 2021, at 7:47 PM, Swarnava Ghosh > wrote: > > > > Dear Petsc developers and users, > > > > I am having some issue with building my code with the following > makefile. I was earlier able to build this with the same makefile on a > different machine. 
Would you please help me out on this issue? > > > > Contents of makefile: > > ============================================== > > all:sparc > > > > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ > -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread > > > > SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc > ./src/energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc > ./src/poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc > ./src/forces.cc ./src/relaxatoms.cc ./src/multipole.cc > ./src/electrostatics.cc ./src/tools.cc > > > > SOURCEH = ./inc/sddft.h ./inc/isddft.h > > > > OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o > ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o > ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o > ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o > > > > LIBBASE = ./lib/sparc > > > > CLEANFILES = ./lib/sparc > > > > include ${PETSC_DIR}/lib/petsc/conf/variables > > include ${PETSC_DIR}/lib/petsc/conf/rules > > > > sparc: ${OBJSC} chkopts > > ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB} > > ${RM} $(SOURCECPP:%.cc=%.o) > > > > =========================================== > > Error: > > /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g > -I/home/swarnava/petsc/include > -I/home/swarnava/petsc/linux-gnu-intel/include `pwd`/src/main.cc > > /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic error: > cannot open source file "sddft.h" > > #include "sddft.h" > > ^ > > ==================================================== > > > > It's not able to see the header file though I have -I ./inc in > CPPFLAGS. The directory containing makefile has the directory "inc" with > the headers and "src" with the .cc files. > > > > Thank you, > > Swarnava > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Tue Feb 16 07:46:37 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 16 Feb 2021 16:46:37 +0300 Subject: [petsc-users] Using distributed dense matrix/vector operations on a GPU In-Reply-To: <7cb65ca8-1748-2e08-0a04-d61c21c6a40a@ntnu.no> References: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no> <7cb65ca8-1748-2e08-0a04-d61c21c6a40a@ntnu.no> Message-ID: Il giorno mar 16 feb 2021 alle ore 16:30 Roland Richter < roland.richter at ntnu.no> ha scritto: > For MatMatMult the size of the involved matrices is 8k x 8k and 8k x 32k. > Ok, so you have 32k columns to multiply against. Maybe you can get some speedup Howver, if you keep updating the matrix entries on CPU, then using CUDA will make little sense. In any case, you can try and see if you get any speedup > I am not sure where MatScale is called, I never call it explicitly. If > MatDiagonalScale calls MatScale, then the involved matrices have a size of > 8k x 32k. > No, it does not, Are you calling MatAYPX? > Regards, > > Roland > Am 16.02.21 um 14:25 schrieb Stefano Zampini: > > >> >> > the usual size of those matrices is (cumulative, not distributed) at least >> [8192x8192] x [8192x32768] complex entries as lower boundary. Does it still >> make sense to test CUDA for speedup? >> > I don't understand your notation. Are you saying your matrices are 8K x > 8K? or 8K*32K? 
or what? > > >> Thank you, >> >> regards, >> >> Roland >> Am 16.02.21 um 14:14 schrieb Stefano Zampini: >> >> >> >> Il giorno mar 16 feb 2021 alle ore 11:43 Roland Richter < >> roland.richter at ntnu.no> ha scritto: >> >>> Hei, >>> >>> after profiling my program using -log_view, I got the following output >>> (all matrices are dense): >>> >>> *Using 8 OpenMP threads* >>> *Using Petsc Development GIT revision: v3.14.3-583-g5464005aea GIT >>> Date: 2021-01-25 16:01:41 -0600* >>> >>> * Max Max/Min Avg Total* >>> *Time (sec): 5.074e+03 1.000 5.074e+03* >>> *Objects: 2.158e+03 1.000 2.158e+03* >>> *Flop: 5.236e+13 1.000 5.236e+13 5.236e+13* >>> *Flop/sec: 1.032e+10 1.000 1.032e+10 1.032e+10* >>> *MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00* >>> *MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00* >>> *MPI Reductions: 0.000e+00 0.000* >>> >>> *Flop counting convention: 1 flop = 1 real number operation of type >>> (multiply/divide/add/subtract)* >>> * e.g., VecAXPY() for real vectors of length >>> N --> 2N flop* >>> * and VecAXPY() for complex vectors of length >>> N --> 8N flop* >>> >>> *Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages >>> --- -- Message Lengths -- -- Reductions --* >>> * Avg %Total Avg %Total Count >>> %Total Avg %Total Count %Total* >>> * 0: Main Stage: 5.0744e+03 100.0% 5.2359e+13 100.0% 0.000e+00 >>> 0.0% 0.000e+00 0.0% 0.000e+00 0.0%* >>> >>> >>> *------------------------------------------------------------------------------------------------------------------------* >>> *See the 'Profiling' chapter of the users' manual for details on >>> interpreting output.* >>> *Phase summary info:* >>> * Count: number of times phase was executed* >>> * Time and Flop: Max - maximum over all processors* >>> * Ratio - ratio of maximum to minimum over all >>> processors* >>> * Mess: number of messages sent* >>> * AvgLen: average message length (bytes)* >>> * Reduct: number of global reductions* >>> * Global: entire computation* >>> * Stage: stages of a computation. 
Set stages with PetscLogStagePush() >>> and PetscLogStagePop().* >>> * %T - percent time in this phase %F - percent flop in this >>> phase* >>> * %M - percent messages in this phase %L - percent message >>> lengths in this phase* >>> * %R - percent reductions in this phase* >>> * Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time >>> over all processors)* >>> * GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max >>> GPU time over all processors)* >>> * CpuToGpu Count: total number of CPU to GPU copies per processor* >>> * CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per >>> processor)* >>> * GpuToCpu Count: total number of GPU to CPU copies per processor* >>> * GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per >>> processor)* >>> * GPU %F: percent flops on GPU in this event* >>> >>> *------------------------------------------------------------------------------------------------------------------------* >>> *Event Count Time (sec) >>> Flop --- Global --- --- Stage ---- Total >>> GPU - CpuToGpu - - GpuToCpu - GPU* >>> * Max Ratio Max Ratio Max Ratio Mess >>> AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count >>> Size Count Size %F* >>> >>> *---------------------------------------------------------------------------------------------------------------------------------------------------------------* >>> >>> *--- Event Stage 0: Main Stage* >>> >>> *VecSet 37 1.0 1.0354e-04 1.0 0.00e+00 0.0 0.0e+00 >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0.00e+00 0 0.00e+00 0* >>> *VecAssemblyBegin 31 1.0 2.9080e-06 1.0 0.00e+00 0.0 0.0e+00 >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0.00e+00 0 0.00e+00 0* >>> *VecAssemblyEnd 31 1.0 2.3270e-06 1.0 0.00e+00 0.0 0.0e+00 >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0.00e+00 0 0.00e+00 0* >>> *MatCopy 49928 1.0 3.7437e+02 1.0 0.00e+00 0.0 0.0e+00 >>> 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 0 0 >>> 0.00e+00 0 0.00e+00 0* >>> *MatConvert 2080 1.0 5.8492e+00 1.0 0.00e+00 0.0 0.0e+00 >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0.00e+00 0 0.00e+00 0* >>> *MatScale 56162 1.0 6.9348e+02 1.0 1.60e+12 1.0 0.0e+00 >>> 0.0e+00 0.0e+00 14 3 0 0 0 14 3 0 0 0 2303 0 0 >>> 0.00e+00 0 0.00e+00 0* >>> *MatAssemblyBegin 56222 1.0 1.7370e-02 1.0 0.00e+00 0.0 0.0e+00 >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0.00e+00 0 0.00e+00 0* >>> *MatAssemblyEnd 56222 1.0 8.8713e-03 1.0 0.00e+00 0.0 0.0e+00 >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0.00e+00 0 0.00e+00 0* >>> *MatZeroEntries 60363 1.0 3.1011e+02 1.0 0.00e+00 0.0 0.0e+00 >>> 0.0e+00 0.0e+00 6 0 0 0 0 6 0 0 0 0 0 0 0 >>> 0.00e+00 0 0.00e+00 0* >>> *MatAXPY 8320 1.0 1.2254e+02 1.0 5.58e+11 1.0 0.0e+00 >>> 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 4557 0 0 >>> 0.00e+00 0 0.00e+00 0* >>> *MatMatMultSym 4161 1.0 7.1613e-03 1.0 0.00e+00 0.0 0.0e+00 >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 >>> 0.00e+00 0 0.00e+00 0* >>> *MatMatMultNum 4161 1.0 4.0706e+02 1.0 5.02e+13 1.0 0.0e+00 >>> 0.0e+00 0.0e+00 8 96 0 0 0 8 96 0 0 0 123331 0 0 >>> 0.00e+00 0 0.00e+00 0* >>> >>> *---------------------------------------------------------------------------------------------------------------------------------------------------------------* >>> >>> *Memory usage is given in bytes:* >>> >>> *Object Type Creations Destructions Memory Descendants' >>> Mem.* >>> *Reports information only for process 0.* >>> >>> *--- Event Stage 0: Main Stage* >>> >>> * Vector 37 34 1634064 0.* >>> * Matrix 2120 2120 52734663456 0.* >>> * Viewer 1 0 0 
0.* >>> >>> *========================================================================================================================* >>> >>> Apparently, MatMatMultNum and MatScale take the most time (by far) >>> during execution. Therefore, I was wondering if it is possible to move >>> those operations/all matrices and vectors to a GPU or another accelerator. >>> According to https://www.mcs.anl.gov/petsc/features/gpus.html CUDA is >>> only supported for distributed vectors, but not for dense distributed >>> matrices. Are there any updates related to that, or other ways to speed up >>> the involved operations? >>> >> >> You should compute the timings associated with each call, and not >> consider the lump sum. For example, each MatScale takes 6.9348e+02/56162 = >> 0.012347851 seconds on average, I doubt you can get any reasonable speedup >> with CUDA. What are the sizes of these matrices? >> >> >>> Thanks! >>> >>> Regards, >>> >>> Roland >>> >> >> >> -- >> Stefano >> >> > > -- > Stefano > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland.richter at ntnu.no Tue Feb 16 07:55:51 2021 From: roland.richter at ntnu.no (Roland Richter) Date: Tue, 16 Feb 2021 14:55:51 +0100 Subject: [petsc-users] Using distributed dense matrix/vector operations on a GPU In-Reply-To: References: <0a3a3aa5-f3a1-bbe3-55ae-ec5db6aeb892@ntnu.no> <7cb65ca8-1748-2e08-0a04-d61c21c6a40a@ntnu.no> Message-ID: <0acdbac2-c311-3ff0-664c-6be3ce9d885a@ntnu.no> Yes, I call MatAXPY, but the matrix size stays the same. Regards, Roland Am 16.02.21 um 14:46 schrieb Stefano Zampini: > > Il giorno mar 16 feb 2021 alle ore 16:30 Roland Richter > > ha scritto: > > For MatMatMult the size of the involved matrices is? 8k x 8k and > 8k x 32k. > > Ok, so you have 32k columns to multiply against. Maybe you can get > some speedup > Howver, if you keep updating the matrix entries on CPU, then using > CUDA will make little sense. > In any case, you can try and see if you get any speedup? > > I am not sure where MatScale is called, I never call it > explicitly. If MatDiagonalScale calls MatScale, then the involved > matrices have a size of 8k x 32k. > > No, it does not, Are you calling MatAYPX?? > > ? > > Regards, > > Roland > > Am 16.02.21 um 14:25 schrieb Stefano Zampini: >> >> >> ? >> >> the usual size of those matrices is (cumulative, not >> distributed) at least [8192x8192] x [8192x32768] complex >> entries as lower boundary. Does it still make sense to test >> CUDA for speedup? >> >> I don't understand your notation. Are you saying your matrices >> are 8K x 8K? or 8K*32K? or what? >> ? >> >> Thank you, >> >> regards, >> >> Roland >> >> Am 16.02.21 um 14:14 schrieb Stefano Zampini: >>> >>> >>> Il giorno mar 16 feb 2021 alle ore 11:43 Roland Richter >>> > ha >>> scritto: >>> >>> Hei, >>> >>> after profiling my program using -log_view, I got the >>> following output (all matrices are dense): >>> >>> /Using 8 OpenMP threads// >>> //Using Petsc Development GIT revision: >>> v3.14.3-583-g5464005aea? GIT Date: 2021-01-25 16:01:41 >>> -0600// >>> // >>> //???????????????????????? Max?????? Max/Min???? >>> Avg?????? Total// >>> //Time (sec):?????????? 5.074e+03???? 1.000?? 5.074e+03// >>> //Objects:????????????? 2.158e+03???? 1.000?? 2.158e+03// >>> //Flop:???????????????? 5.236e+13???? 1.000?? 5.236e+13? >>> 5.236e+13// >>> //Flop/sec:???????????? 1.032e+10???? 1.000?? 1.032e+10? >>> 1.032e+10// >>> //MPI Messages:???????? 0.000e+00???? 0.000?? 0.000e+00? 
>>> 0.000e+00// >>> //MPI Message Lengths:? 0.000e+00???? 0.000?? 0.000e+00? >>> 0.000e+00// >>> //MPI Reductions:?????? 0.000e+00???? 0.000// >>> // >>> //Flop counting convention: 1 flop = 1 real number >>> operation of type (multiply/divide/add/subtract)// >>> //??????????????????????????? e.g., VecAXPY() for real >>> vectors of length N --> 2N flop// >>> //??????????????????????????? and VecAXPY() for complex >>> vectors of length N --> 8N flop// >>> // >>> //Summary of Stages:?? ----- Time ------? ----- Flop >>> ------? --- Messages ---? -- Message Lengths --? -- >>> Reductions --// >>> //??????????????????????? Avg???? %Total???? Avg???? >>> %Total??? Count?? %Total???? Avg???????? %Total??? >>> Count?? %Total// >>> //?0:????? Main Stage: 5.0744e+03 100.0%? 5.2359e+13 >>> 100.0%? 0.000e+00?? 0.0%? 0.000e+00??????? 0.0%? >>> 0.000e+00?? 0.0%// >>> // >>> //------------------------------------------------------------------------------------------------------------------------// >>> //See the 'Profiling' chapter of the users' manual for >>> details on interpreting output.// >>> //Phase summary info:// >>> //?? Count: number of times phase was executed// >>> //?? Time and Flop: Max - maximum over all processors// >>> //????????????????? Ratio - ratio of maximum to minimum >>> over all processors// >>> //?? Mess: number of messages sent// >>> //?? AvgLen: average message length (bytes)// >>> //?? Reduct: number of global reductions// >>> //?? Global: entire computation// >>> //?? Stage: stages of a computation. Set stages with >>> PetscLogStagePush() and PetscLogStagePop().// >>> //????? %T - percent time in this phase???????? %F - >>> percent flop in this phase// >>> //????? %M - percent messages in this phase???? %L - >>> percent message lengths in this phase// >>> //????? %R - percent reductions in this phase// >>> //?? Total Mflop/s: 10e-6 * (sum of flop over all >>> processors)/(max time over all processors)// >>> //?? GPU Mflop/s: 10e-6 * (sum of flop on GPU over all >>> processors)/(max GPU time over all processors)// >>> //?? CpuToGpu Count: total number of CPU to GPU copies >>> per processor// >>> //?? CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU >>> to GPU copies per processor)// >>> //?? GpuToCpu Count: total number of GPU to CPU copies >>> per processor// >>> //?? GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU >>> to CPU copies per processor)// >>> //?? GPU %F: percent flops on GPU in this event// >>> //------------------------------------------------------------------------------------------------------------------------// >>> //Event??????????????? Count????? Time (sec)???? >>> Flop????????????????????????????? --- Global ---? --- >>> Stage ----? Total?? GPU??? - CpuToGpu -?? - GpuToCpu - GPU// >>> //?????????????????? Max Ratio? Max???? Ratio?? Max? >>> Ratio? Mess?? AvgLen? Reduct? %T %F %M %L %R? %T %F %M >>> %L %R Mflop/s Mflop/s Count?? Size?? Count?? Size? %F// >>> //---------------------------------------------------------------------------------------------------------------------------------------------------------------// >>> // >>> //--- Event Stage 0: Main Stage// >>> // >>> //VecSet??????????????? 37 1.0 1.0354e-04 1.0 0.00e+00 >>> 0.0 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? >>> 0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >>> //VecAssemblyBegin????? 31 1.0 2.9080e-06 1.0 0.00e+00 >>> 0.0 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? >>> 0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >>> //VecAssemblyEnd??????? 
31 1.0 2.3270e-06 1.0 0.00e+00 >>> 0.0 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? >>> 0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >>> //MatCopy??????????? 49928 1.0 3.7437e+02 1.0 0.00e+00 >>> 0.0 0.0e+00 0.0e+00 0.0e+00? 7? 0? 0? 0? 0?? 7? 0? 0? 0? >>> 0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >>> //MatConvert????????? 2080 1.0 5.8492e+00 1.0 0.00e+00 >>> 0.0 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? >>> 0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >>> //MatScale?????????? 56162 1.0 6.9348e+02 1.0 1.60e+12 >>> 1.0 0.0e+00 0.0e+00 0.0e+00 14? 3? 0? 0? 0? 14? 3? 0? 0? >>> 0? 2303?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >>> //MatAssemblyBegin?? 56222 1.0 1.7370e-02 1.0 0.00e+00 >>> 0.0 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? >>> 0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >>> //MatAssemblyEnd???? 56222 1.0 8.8713e-03 1.0 0.00e+00 >>> 0.0 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? >>> 0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >>> //MatZeroEntries???? 60363 1.0 3.1011e+02 1.0 0.00e+00 >>> 0.0 0.0e+00 0.0e+00 0.0e+00? 6? 0? 0? 0? 0?? 6? 0? 0? 0? >>> 0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >>> //MatAXPY???????????? 8320 1.0 1.2254e+02 1.0 5.58e+11 >>> 1.0 0.0e+00 0.0e+00 0.0e+00? 2? 1? 0? 0? 0?? 2? 1? 0? 0? >>> 0? 4557?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >>> //MatMatMultSym?????? 4161 1.0 7.1613e-03 1.0 0.00e+00 >>> 0.0 0.0e+00 0.0e+00 0.0e+00? 0? 0? 0? 0? 0?? 0? 0? 0? 0? >>> 0???? 0?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >>> //MatMatMultNum?????? 4161 1.0 4.0706e+02 1.0 5.02e+13 >>> 1.0 0.0e+00 0.0e+00 0.0e+00? 8 96? 0? 0? 0?? 8 96? 0? 0? >>> 0 123331?????? 0????? 0 0.00e+00??? 0 0.00e+00? 0// >>> //---------------------------------------------------------------------------------------------------------------------------------------------------------------// >>> // >>> //Memory usage is given in bytes:// >>> // >>> //Object Type????????? Creations?? Destructions???? >>> Memory? Descendants' Mem.// >>> //Reports information only for process 0.// >>> // >>> //--- Event Stage 0: Main Stage// >>> // >>> //????????????? Vector??? 37???????????? 34????? >>> 1634064???? 0.// >>> //????????????? Matrix? 2120?????????? 2120? >>> 52734663456???? 0.// >>> //????????????? Viewer???? 1????????????? 0??????????? >>> 0???? 0.// >>> //========================================================================================================================/ >>> >>> Apparently, MatMatMultNum and MatScale take the most >>> time (by far) during execution. Therefore, I was >>> wondering if it is possible to move those operations/all >>> matrices and vectors to a GPU or another accelerator. >>> According to >>> https://www.mcs.anl.gov/petsc/features/gpus.html >>> CUDA >>> is only supported for distributed vectors, but not for >>> dense distributed matrices. Are there any updates >>> related to that, or other ways to speed up the involved >>> operations? >>> >>> >>> You should compute the timings associated with each call, >>> and not consider the lump sum. For example, each MatScale >>> takes 6.9348e+02/56162? = 0.012347851 seconds on average,? I >>> doubt you can get any reasonable speedup with CUDA. What are >>> the sizes of these matrices?? >>> ? >>> >>> Thanks! >>> >>> Regards, >>> >>> Roland >>> >>> >>> >>> -- >>> Stefano >> >> >> >> -- >> Stefano > > > > -- > Stefano -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jacob.fai at gmail.com Tue Feb 16 08:39:05 2021 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Tue, 16 Feb 2021 09:39:05 -0500 Subject: [petsc-users] makefile for building application with petsc In-Reply-To: References: Message-ID: Swarnava, Perhaps try CXXFLAGS instead of CPPFLAGS. Alternatively, you may explicitly declare a %.o: %.cc target and force it to include your CPPFLAGS. Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) Cell: (312) 694-3391 > On Feb 16, 2021, at 08:42, Matthew Knepley wrote: > > On Mon, Feb 15, 2021 at 10:50 PM Barry Smith > wrote: > Swarnava, > > sddft.h is not a PETSc include file, nor is it used by PETSc so I think the issue is not directly to PETSc it is related to where sddft is on the machine and how it is found by your makefile. > > Barry, > > His problem is that he is trying to put extra include flags on the compile line, but it is not working. I am wondering if his make is malfunctioning. > > Thanks, > > Matt > > Barry > > > > > On Feb 15, 2021, at 7:47 PM, Swarnava Ghosh > wrote: > > > > Dear Petsc developers and users, > > > > I am having some issue with building my code with the following makefile. I was earlier able to build this with the same makefile on a different machine. Would you please help me out on this issue? > > > > Contents of makefile: > > ============================================== > > all:sparc > > > > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread > > > > SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc ./src/energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/forces.cc ./src/relaxatoms.cc ./src/multipole.cc ./src/electrostatics.cc ./src/tools.cc > > > > SOURCEH = ./inc/sddft.h ./inc/isddft.h > > > > OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o > > > > LIBBASE = ./lib/sparc > > > > CLEANFILES = ./lib/sparc > > > > include ${PETSC_DIR}/lib/petsc/conf/variables > > include ${PETSC_DIR}/lib/petsc/conf/rules > > > > sparc: ${OBJSC} chkopts > > ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB} > > ${RM} $(SOURCECPP:%.cc=%.o) > > > > =========================================== > > Error: > > /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g -I/home/swarnava/petsc/include -I/home/swarnava/petsc/linux-gnu-intel/include `pwd`/src/main.cc > > /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic error: cannot open source file "sddft.h" > > #include "sddft.h" > > ^ > > ==================================================== > > > > It's not able to see the header file though I have -I ./inc in CPPFLAGS. The directory containing makefile has the directory "inc" with the headers and "src" with the .cc files. > > > > Thank you, > > Swarnava > > > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Feb 16 08:40:23 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 16 Feb 2021 09:40:23 -0500 Subject: [petsc-users] makefile for building application with petsc In-Reply-To: References: Message-ID: On Tue, Feb 16, 2021 at 9:39 AM Jacob Faibussowitsch wrote: > Swarnava, > > Perhaps try CXXFLAGS instead of CPPFLAGS. Alternatively, you may > explicitly declare a %.o: %.cc target and force it to include your CPPFLAGS. > No, do not do either of these things. We just need to figure out why it is not working for you. I will make a small example as soon as possible. Thanks, Matt > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > Cell: (312) 694-3391 > > On Feb 16, 2021, at 08:42, Matthew Knepley wrote: > > On Mon, Feb 15, 2021 at 10:50 PM Barry Smith wrote: > >> Swarnava, >> >> sddft.h is not a PETSc include file, nor is it used by PETSc so I think >> the issue is not directly to PETSc it is related to where sddft is on the >> machine and how it is found by your makefile. >> > > Barry, > > His problem is that he is trying to put extra include flags on the compile > line, but it is not working. I am wondering if his make is malfunctioning. > > Thanks, > > Matt > > >> Barry >> >> >> >> > On Feb 15, 2021, at 7:47 PM, Swarnava Ghosh >> wrote: >> > >> > Dear Petsc developers and users, >> > >> > I am having some issue with building my code with the following >> makefile. I was earlier able to build this with the same makefile on a >> different machine. Would you please help me out on this issue? >> > >> > Contents of makefile: >> > ============================================== >> > all:sparc >> > >> > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ >> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread >> > >> > SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc ./src/ >> energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/ >> poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/ >> forces.cc ./src/relaxatoms.cc ./src/multipole.cc ./src/electrostatics.cc >> ./src/tools.cc >> > >> > SOURCEH = ./inc/sddft.h ./inc/isddft.h >> > >> > OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o >> ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o >> ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o >> ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o >> > >> > LIBBASE = ./lib/sparc >> > >> > CLEANFILES = ./lib/sparc >> > >> > include ${PETSC_DIR}/lib/petsc/conf/variables >> > include ${PETSC_DIR}/lib/petsc/conf/rules >> > >> > sparc: ${OBJSC} chkopts >> > ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB} >> > ${RM} $(SOURCECPP:%.cc=%.o) >> > >> > =========================================== >> > Error: >> > /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g >> -I/home/swarnava/petsc/include >> -I/home/swarnava/petsc/linux-gnu-intel/include `pwd`/src/main.cc >> > /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic >> error: cannot open source file "sddft.h" >> > #include "sddft.h" >> > ^ >> > ==================================================== >> > >> > It's not able to see the header file though I have -I ./inc in >> CPPFLAGS. The directory containing makefile has the directory "inc" with >> the headers and "src" with the .cc files. 
>> > >> > Thank you, >> > Swarnava >> > >> > >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Feb 16 11:09:43 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 16 Feb 2021 11:09:43 -0600 Subject: [petsc-users] makefile for building application with petsc In-Reply-To: References: Message-ID: <6b1bfd4-7d3-ed49-eab1-045e790bfb6@mcs.anl.gov> for CXX - its CXXPPFLAGS > >> > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ > >> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread The link options shouldn't go into preprocessor flags. Also duplicate blas with petsc might cause grief. Best to build petsc with mkl. Satish On Tue, 16 Feb 2021, Matthew Knepley wrote: > On Tue, Feb 16, 2021 at 9:39 AM Jacob Faibussowitsch > wrote: > > > Swarnava, > > > > Perhaps try CXXFLAGS instead of CPPFLAGS. Alternatively, you may > > explicitly declare a %.o: %.cc target and force it to include your CPPFLAGS. > > > > No, do not do either of these things. We just need to figure out why it is > not working for you. I will make a small example as soon as possible. > > Thanks, > > Matt > > > > Best regards, > > > > Jacob Faibussowitsch > > (Jacob Fai - booss - oh - vitch) > > Cell: (312) 694-3391 > > > > On Feb 16, 2021, at 08:42, Matthew Knepley wrote: > > > > On Mon, Feb 15, 2021 at 10:50 PM Barry Smith wrote: > > > >> Swarnava, > >> > >> sddft.h is not a PETSc include file, nor is it used by PETSc so I think > >> the issue is not directly to PETSc it is related to where sddft is on the > >> machine and how it is found by your makefile. > >> > > > > Barry, > > > > His problem is that he is trying to put extra include flags on the compile > > line, but it is not working. I am wondering if his make is malfunctioning. > > > > Thanks, > > > > Matt > > > > > >> Barry > >> > >> > >> > >> > On Feb 15, 2021, at 7:47 PM, Swarnava Ghosh > >> wrote: > >> > > >> > Dear Petsc developers and users, > >> > > >> > I am having some issue with building my code with the following > >> makefile. I was earlier able to build this with the same makefile on a > >> different machine. Would you please help me out on this issue? 
> >> > > >> > Contents of makefile: > >> > ============================================== > >> > all:sparc > >> > > >> > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ > >> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread > >> > > >> > SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc ./src/ > >> energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/ > >> poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/ > >> forces.cc ./src/relaxatoms.cc ./src/multipole.cc ./src/electrostatics.cc > >> ./src/tools.cc > >> > > >> > SOURCEH = ./inc/sddft.h ./inc/isddft.h > >> > > >> > OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o > >> ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o > >> ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o > >> ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o > >> > > >> > LIBBASE = ./lib/sparc > >> > > >> > CLEANFILES = ./lib/sparc > >> > > >> > include ${PETSC_DIR}/lib/petsc/conf/variables > >> > include ${PETSC_DIR}/lib/petsc/conf/rules > >> > > >> > sparc: ${OBJSC} chkopts > >> > ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB} > >> > ${RM} $(SOURCECPP:%.cc=%.o) > >> > > >> > =========================================== > >> > Error: > >> > /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g > >> -I/home/swarnava/petsc/include > >> -I/home/swarnava/petsc/linux-gnu-intel/include `pwd`/src/main.cc > >> > /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic > >> error: cannot open source file "sddft.h" > >> > #include "sddft.h" > >> > ^ > >> > ==================================================== > >> > > >> > It's not able to see the header file though I have -I ./inc in > >> CPPFLAGS. The directory containing makefile has the directory "inc" with > >> the headers and "src" with the .cc files. > >> > > >> > Thank you, > >> > Swarnava > >> > > >> > > >> > >> > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which their > > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > From balay at mcs.anl.gov Tue Feb 16 12:07:35 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 16 Feb 2021 12:07:35 -0600 Subject: [petsc-users] makefile for building application with petsc In-Reply-To: <6b1bfd4-7d3-ed49-eab1-045e790bfb6@mcs.anl.gov> References: <6b1bfd4-7d3-ed49-eab1-045e790bfb6@mcs.anl.gov> Message-ID: <4ff6c811-2823-79d8-1cc-cfe857e8a24c@mcs.anl.gov> BTW: Here is a simple makefile for multiple sources. >>>> balay at sb /home/balay/tmp/prj/src $ ls main.cxx makefile sub.cxx balay at sb /home/balay/tmp/prj/src $ ls ../inc/ mainc.h balay at sb /home/balay/tmp/prj/src $ cat makefile all: main CXXPPFLAGS = -I../inc LDLIBS = -lmv include ${PETSC_DIR}/lib/petsc/conf/variables include ${PETSC_DIR}/lib/petsc/conf/rules include ${PETSC_DIR}/lib/petsc/conf/test main: sub.o <<<<< We don't have a simple makefile for the use case where src, obj, binaries are in different locations. gmakefile.test has some code for it [this requires replacing 'lib/petsc/conf/test' with custom compile targets - as in gmakefile.test]. [and what you have appears to work for this usecase.] 
Satish On Tue, 16 Feb 2021, Satish Balay via petsc-users wrote: > for CXX - its CXXPPFLAGS > > > >> > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ > > >> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread > > The link options shouldn't go into preprocessor flags. > > Also duplicate blas with petsc might cause grief. Best to build petsc with mkl. > > > Satish > > On Tue, 16 Feb 2021, Matthew Knepley wrote: > > > On Tue, Feb 16, 2021 at 9:39 AM Jacob Faibussowitsch > > wrote: > > > > > Swarnava, > > > > > > Perhaps try CXXFLAGS instead of CPPFLAGS. Alternatively, you may > > > explicitly declare a %.o: %.cc target and force it to include your CPPFLAGS. > > > > > > > No, do not do either of these things. We just need to figure out why it is > > not working for you. I will make a small example as soon as possible. > > > > Thanks, > > > > Matt > > > > > > > Best regards, > > > > > > Jacob Faibussowitsch > > > (Jacob Fai - booss - oh - vitch) > > > Cell: (312) 694-3391 > > > > > > On Feb 16, 2021, at 08:42, Matthew Knepley wrote: > > > > > > On Mon, Feb 15, 2021 at 10:50 PM Barry Smith wrote: > > > > > >> Swarnava, > > >> > > >> sddft.h is not a PETSc include file, nor is it used by PETSc so I think > > >> the issue is not directly to PETSc it is related to where sddft is on the > > >> machine and how it is found by your makefile. > > >> > > > > > > Barry, > > > > > > His problem is that he is trying to put extra include flags on the compile > > > line, but it is not working. I am wondering if his make is malfunctioning. > > > > > > Thanks, > > > > > > Matt > > > > > > > > >> Barry > > >> > > >> > > >> > > >> > On Feb 15, 2021, at 7:47 PM, Swarnava Ghosh > > >> wrote: > > >> > > > >> > Dear Petsc developers and users, > > >> > > > >> > I am having some issue with building my code with the following > > >> makefile. I was earlier able to build this with the same makefile on a > > >> different machine. Would you please help me out on this issue? 
> > >> > > > >> > Contents of makefile: > > >> > ============================================== > > >> > all:sparc > > >> > > > >> > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ > > >> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread > > >> > > > >> > SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc ./src/ > > >> energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/ > > >> poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/ > > >> forces.cc ./src/relaxatoms.cc ./src/multipole.cc ./src/electrostatics.cc > > >> ./src/tools.cc > > >> > > > >> > SOURCEH = ./inc/sddft.h ./inc/isddft.h > > >> > > > >> > OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o ./src/energy.o > > >> ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o > > >> ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o > > >> ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o ./src/tools.o > > >> > > > >> > LIBBASE = ./lib/sparc > > >> > > > >> > CLEANFILES = ./lib/sparc > > >> > > > >> > include ${PETSC_DIR}/lib/petsc/conf/variables > > >> > include ${PETSC_DIR}/lib/petsc/conf/rules > > >> > > > >> > sparc: ${OBJSC} chkopts > > >> > ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB} > > >> > ${RM} $(SOURCECPP:%.cc=%.o) > > >> > > > >> > =========================================== > > >> > Error: > > >> > /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c -g > > >> -I/home/swarnava/petsc/include > > >> -I/home/swarnava/petsc/linux-gnu-intel/include `pwd`/src/main.cc > > >> > /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic > > >> error: cannot open source file "sddft.h" > > >> > #include "sddft.h" > > >> > ^ > > >> > ==================================================== > > >> > > > >> > It's not able to see the header file though I have -I ./inc in > > >> CPPFLAGS. The directory containing makefile has the directory "inc" with > > >> the headers and "src" with the .cc files. > > >> > > > >> > Thank you, > > >> > Swarnava > > >> > > > >> > > > >> > > >> > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which their > > > experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > > > > From jroman at dsic.upv.es Tue Feb 16 13:54:26 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 16 Feb 2021 20:54:26 +0100 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: <80BCEEDC-4C1E-4512-AAF5-7B6E718C7D1D@dsic.upv.es> References: <7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev> <119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev> <6EF7889D-DC17-46FC-82A5-9409C41E231D@petsc.dev> <46C744D7-4376-46B3-B5C4-211A4C8C2291@dsic.upv.es> <80BCEEDC-4C1E-4512-AAF5-7B6E718C7D1D@dsic.upv.es> Message-ID: Florian: I have created a MR https://gitlab.com/slepc/slepc/-/merge_requests/149 Let me know if it fits your needs. Jose > El 15 feb 2021, a las 18:44, Jose E. Roman escribi?: > > > >> El 15 feb 2021, a las 14:53, Matthew Knepley escribi?: >> >> On Mon, Feb 15, 2021 at 7:27 AM Jose E. Roman wrote: >> I will think about the viability of adding an interface function to pass the preconditioner matrix. >> >> Regarding the question about the B-orthogonality of computed vectors, in the symmetric solver the B-orthogonality is enforced during the computation, so you have guarantee that the computed vectors satisfy it. 
But if solved as non-symetric, the computed vectors may depart from B-orthogonality, unless the tolerance is very small. >> >> Yes, the vectors I generate are not B-orthogonal. >> >> Jose, do you think there is a way to reformulate what I am doing to use the symmetric solver, even if we only have the action of B? > > Yes, you can do the following: > > ierr = EPSSetOperators(eps,S,NULL);CHKERRQ(ierr); // S is your shell matrix A^{-1}*B > ierr = EPSSetProblemType(eps,EPS_HEP);CHKERRQ(ierr); // symmetric problem though S is not symmetric > ierr = EPSSetFromOptions(eps);CHKERRQ(ierr); > ierr = EPSSetUp(eps);CHKERRQ(ierr); // note explicitly calling setup here > ierr = EPSGetBV(eps,&bv);CHKERRQ(ierr); > ierr = BVSetMatrix(bv,B,PETSC_FALSE);CHKERRQ(ierr); // replace solver's inner product > ierr = EPSSolve(eps);CHKERRQ(ierr); > > I have tried this with test1.c and it works. The computed eigenvectors should be B-orthogonal in this case. > > Jose > > >> >> Thanks, >> >> Matt >> >> Jose >> >> >>> El 14 feb 2021, a las 21:41, Barry Smith escribi?: >>> >>> >>> Florian, >>> >>> I'm sorry I don't know the answers; I can only speculate. There is a STGetShift(). >>> >>> All I was saying is theoretically there could/should be such support in SLEPc. >>> >>> Barry >>> >>> >>>> On Feb 13, 2021, at 6:43 PM, Florian Bruckner wrote: >>>> >>>> Dear Barry, >>>> thank you for your clarification. What I wanted to say is that even if I could reset the KSP operators directly I would require to know which transformation ST applies in order to provide the preconditioning matrix for the correct operator. >>>> The more general solution would be that SLEPc provides the interface to pass the preconditioning matrix for A0 and ST applies the same transformations as for the operator. >>>> >>>> If you write "SLEPc could provide an interface", do you mean someone should implement it, or should it already be possible and I am not using it correctly? >>>> I wrote a small standalone example based on ex9.py from slepc4py, where i tried to use an operator. >>>> >>>> best wishes >>>> Florian >>>> >>>> On Sat, Feb 13, 2021 at 7:15 PM Barry Smith wrote: >>>> >>>> >>>>> On Feb 13, 2021, at 2:47 AM, Pierre Jolivet wrote: >>>>> >>>>> >>>>> >>>>>> On 13 Feb 2021, at 7:25 AM, Florian Bruckner wrote: >>>>>> >>>>>> Dear Jose, Dear Barry, >>>>>> thanks again for your reply. One final question about the B0 orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but they are i*B0 orthogonal? or is there an issue with Matt's approach? >>>>>> For my problem I can show that eigenvalues fulfill an orthogonality relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This should be independent of the solving method, right? >>>>>> >>>>>> Regarding Barry's advice this is what I first tried: >>>>>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD) >>>>>> st = es.getST() >>>>>> ksp = st.getKSP() >>>>>> ksp.setOperators(self.A0, self.P0) >>>>>> >>>>>> But it seems that the provided P0 is not used. Furthermore the interface is maybe a bit confusing if ST performs some transformation. In this case P0 needs to approximate A0^{-1}*B0 and not A0, right? >>>>> >>>>> No, you need to approximate (A0-sigma B0)^-1. If you have a null shift, which looks like it is the case, you end up with A0^-1. >>>> >>>> Just trying to provide more clarity with the terms. 
>>>> >>>> If ST transforms the operator in the KSP to (A0-sigma B0) and you are providing the "sparse matrix from which the preconditioner is to be built" then you need to provide something that approximates (A0-sigma B0). Since the PC will use your matrix to construct a preconditioner that approximates the inverse of (A0-sigma B0), you don't need to directly provide something that approximates (A0-sigma B0)^-1 >>>> >>>> Yes, I would think SLEPc could provide an interface where it manages "the matrix from which to construct the preconditioner" and transforms that matrix just like the true matrix. To do it by hand you simply need to know what A0 and B0 are and which sigma ST has selected and then you can construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and modB0 are your "sparser approximations". >>>> >>>> Barry >>>> >>>> >>>>> >>>>>> Nevertheless I think it would be the best solution if one could provide P0 (approx A0) and SLEPc derives the preconditioner from this. Would this be hard to implement? >>>>> >>>>> This is what Barry?s suggestion is implementing. Don?t know why it doesn?t work with your Python operator though. >>>>> >>>>> Thanks, >>>>> Pierre >>>>> >>>>>> best wishes >>>>>> Florian >>>>>> >>>>>> >>>>>> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith wrote: >>>>>> >>>>>> >>>>>>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner wrote: >>>>>>> >>>>>>> Dear Jose, Dear Matt, >>>>>>> >>>>>>> I needed some time to think about your answers. >>>>>>> If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner. >>>>>>> What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator). >>>>>>> Unfortunately this interface is not available, right? >>>>>> >>>>>> If SLEPc does not provide this directly it is still intended to be trivial to provide the "preconditioner matrix" (that is matrix from which the preconditioner is built). Just get the KSP from the ST object and use KSPSetOperators() to provide the "preconditioner matrix" . >>>>>> >>>>>> Barry >>>>>> >>>>>>> >>>>>>> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly? >>>>>>> >>>>>>> I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. >>>>>>> >>>>>>> Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. >>>>>>> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell? >>>>>>> Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback? 
>>>>>>> >>>>>>> thanks again, >>>>>>> and best wishes >>>>>>> Florian >>>>>>> >>>>>>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman wrote: >>>>>>> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is >>>>>>> >>>>>>> (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is >>>>>>> >>>>>>> A0^{-1}*B0*v=theta*v >>>>>>> >>>>>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. >>>>>>> >>>>>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence. >>>>>>> >>>>>>> Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface. >>>>>>> >>>>>>> The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own. >>>>>>> >>>>>>> As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python. >>>>>>> >>>>>>> If you are using the git repo, I could add the relevant code. >>>>>>> >>>>>>> Jose >>>>>>> >>>>>>> >>>>>>> >>>>>>>> El 8 feb 2021, a las 14:22, Matthew Knepley escribi?: >>>>>>>> >>>>>>>> On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner wrote: >>>>>>>> Dear PETSc / SLEPc Users, >>>>>>>> >>>>>>>> my question is very similar to the one posted here: >>>>>>>> https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html >>>>>>>> >>>>>>>> The eigensystem I would like to solve looks like: >>>>>>>> B0 v = 1/omega A0 v >>>>>>>> B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). >>>>>>>> >>>>>>>> I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this: >>>>>>>> >>>>>>>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD) >>>>>>>> st = es.getST() >>>>>>>> ksp = st.getKSP() >>>>>>>> ksp.setOperators(self.A0, self.P0) >>>>>>>> >>>>>>>> Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). >>>>>>>> By the way, should P0 be an approximation of A0 or does it have to include B0? >>>>>>>> >>>>>>>> Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator? >>>>>>>> >>>>>>>> Jose can correct me if I say something wrong. >>>>>>>> >>>>>>>> When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and >>>>>>>> then handed that to EPS. You can see me do it here: >>>>>>>> >>>>>>>> https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 >>>>>>>> >>>>>>>> I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way. 
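For illustration, a minimal sketch of the shell-operator approach just described: a MATSHELL whose MatMult applies B0 and then an inner KSPSolve with A0, where the sparse P0 is the matrix from which the preconditioner is built. The header, context struct, function names, and the sizes n (local) and N (global) below are illustrative assumptions and are not taken from the linked bamg code.

#include <slepceps.h>   /* pulls in the KSP and Mat interfaces as well */

typedef struct {
  Mat B0;    /* Hermitian operator applied first              */
  KSP ksp;   /* inner solve with A0; PC is built from P0      */
  Vec work;  /* holds B0*x before the solve                   */
} ShellCtx;

static PetscErrorCode ShellMult(Mat S, Vec x, Vec y)
{
  ShellCtx       *ctx;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatShellGetContext(S, &ctx);CHKERRQ(ierr);
  ierr = MatMult(ctx->B0, x, ctx->work);CHKERRQ(ierr);    /* work = B0 * x       */
  ierr = KSPSolve(ctx->ksp, ctx->work, y);CHKERRQ(ierr);  /* y = A0^{-1} * work  */
  PetscFunctionReturn(0);
}

/* setup sketch: assumes A0, B0, P0, eps, ierr and the sizes n, N already exist;
   ctx must stay alive for as long as S is used */
ShellCtx ctx;
Mat      S;
ierr = KSPCreate(PETSC_COMM_WORLD, &ctx.ksp);CHKERRQ(ierr);
ierr = KSPSetOperators(ctx.ksp, A0, P0);CHKERRQ(ierr);    /* P0 only builds the PC */
ierr = KSPSetFromOptions(ctx.ksp);CHKERRQ(ierr);
ierr = MatCreateVecs(B0, NULL, &ctx.work);CHKERRQ(ierr);
ctx.B0 = B0;
ierr = MatCreateShell(PETSC_COMM_WORLD, n, n, N, N, &ctx, &S);CHKERRQ(ierr);
ierr = MatShellSetOperation(S, MATOP_MULT, (void (*)(void))ShellMult);CHKERRQ(ierr);
ierr = EPSSetOperators(eps, S, NULL);CHKERRQ(ierr);       /* standard eigenproblem on S */

With this arrangement EPS sees a standard eigenproblem for S, while the inner KSP and PC remain configurable from the options database.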
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> thanks for any advice >>>>>>>> best wishes >>>>>>>> Florian >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ From swarnava89 at gmail.com Tue Feb 16 18:31:13 2021 From: swarnava89 at gmail.com (Swarnava Ghosh) Date: Tue, 16 Feb 2021 19:31:13 -0500 Subject: [petsc-users] makefile for building application with petsc In-Reply-To: <4ff6c811-2823-79d8-1cc-cfe857e8a24c@mcs.anl.gov> References: <6b1bfd4-7d3-ed49-eab1-045e790bfb6@mcs.anl.gov> <4ff6c811-2823-79d8-1cc-cfe857e8a24c@mcs.anl.gov> Message-ID: Thank you for your responses. The CXXPPFLAGS as mentioned by Satish makes it work. Sincerely, Swarnava On Tue, Feb 16, 2021 at 1:07 PM Satish Balay via petsc-users < petsc-users at mcs.anl.gov> wrote: > BTW: Here is a simple makefile for multiple sources. > > >>>> > balay at sb /home/balay/tmp/prj/src > $ ls > main.cxx makefile sub.cxx > balay at sb /home/balay/tmp/prj/src > $ ls ../inc/ > mainc.h > balay at sb /home/balay/tmp/prj/src > $ cat makefile > all: main > > CXXPPFLAGS = -I../inc > LDLIBS = -lmv > > include ${PETSC_DIR}/lib/petsc/conf/variables > include ${PETSC_DIR}/lib/petsc/conf/rules > include ${PETSC_DIR}/lib/petsc/conf/test > > main: sub.o > <<<<< > > We don't have a simple makefile for the use case where src, obj, binaries > are in different locations. > > gmakefile.test has some code for it [this requires replacing > 'lib/petsc/conf/test' with custom compile targets - as in gmakefile.test]. > > [and what you have appears to work for this usecase.] > > Satish > > On Tue, 16 Feb 2021, Satish Balay via petsc-users wrote: > > > for CXX - its CXXPPFLAGS > > > > > >> > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ > > > >> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core > -lpthread > > > > The link options shouldn't go into preprocessor flags. > > > > Also duplicate blas with petsc might cause grief. Best to build petsc > with mkl. > > > > > > Satish > > > > On Tue, 16 Feb 2021, Matthew Knepley wrote: > > > > > On Tue, Feb 16, 2021 at 9:39 AM Jacob Faibussowitsch < > jacob.fai at gmail.com> > > > wrote: > > > > > > > Swarnava, > > > > > > > > Perhaps try CXXFLAGS instead of CPPFLAGS. Alternatively, you may > > > > explicitly declare a %.o: %.cc target and force it to include your > CPPFLAGS. > > > > > > > > > > No, do not do either of these things. We just need to figure out why > it is > > > not working for you. I will make a small example as soon as possible. 
> > > > > > Thanks, > > > > > > Matt > > > > > > > > > > Best regards, > > > > > > > > Jacob Faibussowitsch > > > > (Jacob Fai - booss - oh - vitch) > > > > Cell: (312) 694-3391 > > > > > > > > On Feb 16, 2021, at 08:42, Matthew Knepley > wrote: > > > > > > > > On Mon, Feb 15, 2021 at 10:50 PM Barry Smith > wrote: > > > > > > > >> Swarnava, > > > >> > > > >> sddft.h is not a PETSc include file, nor is it used by PETSc so I > think > > > >> the issue is not directly to PETSc it is related to where sddft is > on the > > > >> machine and how it is found by your makefile. > > > >> > > > > > > > > Barry, > > > > > > > > His problem is that he is trying to put extra include flags on the > compile > > > > line, but it is not working. I am wondering if his make is > malfunctioning. > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > > > > >> Barry > > > >> > > > >> > > > >> > > > >> > On Feb 15, 2021, at 7:47 PM, Swarnava Ghosh > > > > >> wrote: > > > >> > > > > >> > Dear Petsc developers and users, > > > >> > > > > >> > I am having some issue with building my code with the following > > > >> makefile. I was earlier able to build this with the same makefile > on a > > > >> different machine. Would you please help me out on this issue? > > > >> > > > > >> > Contents of makefile: > > > >> > ============================================== > > > >> > all:sparc > > > >> > > > > >> > CPPFLAGS = -I ./inc -I ${MKLROOT}/include -L ${MKLROOT}/lib/ > > > >> -llapack-addons -lmkl_intel_lp64 -lmkl_sequential -lmkl_core > -lpthread > > > >> > > > > >> > SOURCECPP = ./src/main.cc ./src/initObjs.cc ./src/readfiles.cc > ./src/ > > > >> energy.cc ./src/ExchangeCorrelation.cc ./src/occupation.cc ./src/ > > > >> poisson.cc ./src/chebyshev.cc ./src/scf.cc ./src/mixing.cc ./src/ > > > >> forces.cc ./src/relaxatoms.cc ./src/multipole.cc > ./src/electrostatics.cc > > > >> ./src/tools.cc > > > >> > > > > >> > SOURCEH = ./inc/sddft.h ./inc/isddft.h > > > >> > > > > >> > OBJSC = ./src/main.o ./src/initObjs.o ./src/readfiles.o > ./src/energy.o > > > >> ./src/ExchangeCorrelation.o ./src/occupation.o ./src/poisson.o > > > >> ./src/chebyshev.o ./src/scf.o ./src/mixing.o ./src/forces.o > > > >> ./src/relaxatoms.o ./src/multipole.o ./src/electrostatics.o > ./src/tools.o > > > >> > > > > >> > LIBBASE = ./lib/sparc > > > >> > > > > >> > CLEANFILES = ./lib/sparc > > > >> > > > > >> > include ${PETSC_DIR}/lib/petsc/conf/variables > > > >> > include ${PETSC_DIR}/lib/petsc/conf/rules > > > >> > > > > >> > sparc: ${OBJSC} chkopts > > > >> > ${CLINKER} -Wall -o ${LIBBASE} ${OBJSC} ${PETSC_LIB} > > > >> > ${RM} $(SOURCECPP:%.cc=%.o) > > > >> > > > > >> > =========================================== > > > >> > Error: > > > >> > /home/swarnava/petsc/linux-gnu-intel/bin/mpicxx -o src/main.o -c > -g > > > >> -I/home/swarnava/petsc/include > > > >> -I/home/swarnava/petsc/linux-gnu-intel/include `pwd`/src/main.cc > > > >> > /home/swarnava/Research/Codes/SPARC/src/main.cc(24): catastrophic > > > >> error: cannot open source file "sddft.h" > > > >> > #include "sddft.h" > > > >> > ^ > > > >> > ==================================================== > > > >> > > > > >> > It's not able to see the header file though I have -I ./inc in > > > >> CPPFLAGS. The directory containing makefile has the directory "inc" > with > > > >> the headers and "src" with the .cc files. 
> > > >> > > > > >> > Thank you, > > > >> > Swarnava > > > >> > > > > >> > > > > >> > > > >> > > > > > > > > -- > > > > What most experimenters take for granted before they begin their > > > > experiments is infinitely more interesting than any results to which > their > > > > experiments lead. > > > > -- Norbert Wiener > > > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland.richter at ntnu.no Wed Feb 17 04:11:58 2021 From: roland.richter at ntnu.no (Roland Richter) Date: Wed, 17 Feb 2021 11:11:58 +0100 Subject: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results Message-ID: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no> Hei, when compiling the attached files using the following compilation line //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -DBOOST_ALL_NO_LIB -DBOOST_FILESYSTEM_DYN_LINK -DBOOST_MPI_DYN_LINK -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_SERIALIZATION_DYN_LINK -DUSE_CUDA -I/home/roland/Dokumente/C++-Projekte/armadillo_with_PETSc/include -I/opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/include -I/opt/armadillo/include -isystem /opt/petsc_release/include -isystem /opt/fftw3/include -isystem /opt/boost/include -march=native -fopenmp-simd -DMKL_LP64 -m64 -Wall -Wextra -pedantic -fPIC -flto -O2 -funroll-loops -funroll-all-loops -fstrict-aliasing -mavx -march=native -fopenmp -std=gnu++17 -c -o / and linking them with? //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx? -march=native -fopenmp-simd -DMKL_LP64 -m64 -o bin/armadillo_with_PETSc? -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran /opt/intel/mkl/lib/intel64/libmkl_rt.so /opt/boost/lib/libboost_filesystem.so.1.72.0 /opt/boost/lib/libboost_mpi.so.1.72.0 /opt/boost/lib/libboost_program_options.so.1.72.0 /opt/boost/lib/libboost_serialization.so.1.72.0 /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so /opt/petsc_release/lib/libpetsc.so /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so/ my output is? /Arma and PETSc/MatScale are equal:????????????????????????????????????? ??? ??? 0// //Arma-time for a matrix size of [1024, 8192]:??????????????????????????? ??? ????? 24955// //PETSc-time, pointer for a matrix size of [1024, 8192]:????????????????? ??? 28283// //PETSc-time, MatScale for a matrix size of [1024, 8192]:????????????????? 23138/ but when removing the explicit call to openmp (i.e. removing /-fopenmp/ and //usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so/) my result is /Arma and PETSc/MatScale are equal:????????????????????????????????????? ?????? 1// //Arma-time for a matrix size of [1024, 8192]:??????????????????????????? ??? ???? 24878// //PETSc-time, pointer for a matrix size of [1024, 8192]:???????????????????? 18942// //PETSc-time, MatScale for a matrix size of [1024, 8192]:???????????????? 23350/ even though both times the executable is linked to /??????? libmkl_intel_lp64.so => /opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so (0x00007f9eebd70000)// //??????? libmkl_core.so => /opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_core.so (0x00007f9ee77aa000)// //??????? libmkl_intel_thread.so => /opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so (0x00007f9ee42c3000)// //??????? 
libiomp5.so => /opt/intel/compilers_and_libraries_2020.2.254/linux/compiler/lib/intel64_lin/libiomp5.so (0x00007f9ee3ebd000)// //??????? libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007f9ea98bd000)/ via the petsc-library. Why does the execution time vary by so much, and why does my result change when calling MatScale (i.e. returning wrong results) when explicitly linking to OpenMP? Thanks! Regards, Roland -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: main_short.cpp Type: text/x-c++src Size: 287 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_scaling.cpp Type: text/x-c++src Size: 4144 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: helper_functions.cpp Type: text/x-c++src Size: 2319 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_scaling.hpp Type: text/x-c++hdr Size: 576 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: helper_functions.hpp Type: text/x-c++hdr Size: 646 bytes Desc: not available URL: From jed at jedbrown.org Wed Feb 17 10:10:37 2021 From: jed at jedbrown.org (Jed Brown) Date: Wed, 17 Feb 2021 09:10:37 -0700 Subject: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results In-Reply-To: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no> References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no> Message-ID: <874kiad69u.fsf@jedbrown.org> You're using an MKL linked to Intel's OpenMP. I could imagine there being symbol conflicts causing MKL to compute wrong results if libgomp symbols are picked up. Note that -fopenmp-simd does not require linking -- it just gives the compiler hints about how to vectorize. So you can probably keep using it and just stop passing libgomp.so. Alternatively, you can link MKL to work with libgomp (see the MKL link advisor). Roland Richter writes: > Hei, > > when compiling the attached files using the following compilation line > > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -DBOOST_ALL_NO_LIB > -DBOOST_FILESYSTEM_DYN_LINK -DBOOST_MPI_DYN_LINK > -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_SERIALIZATION_DYN_LINK > -DUSE_CUDA > -I/home/roland/Dokumente/C++-Projekte/armadillo_with_PETSc/include > -I/opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/include > -I/opt/armadillo/include -isystem /opt/petsc_release/include -isystem > /opt/fftw3/include -isystem /opt/boost/include -march=native > -fopenmp-simd -DMKL_LP64 -m64 -Wall -Wextra -pedantic -fPIC -flto -O2 > -funroll-loops -funroll-all-loops -fstrict-aliasing -mavx -march=native > -fopenmp -std=gnu++17 -c -o / > > and linking them with? > > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx? -march=native -fopenmp-simd > -DMKL_LP64 -m64 -o bin/armadillo_with_PETSc? > -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib > /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran > /opt/intel/mkl/lib/intel64/libmkl_rt.so > /opt/boost/lib/libboost_filesystem.so.1.72.0 > /opt/boost/lib/libboost_mpi.so.1.72.0 > /opt/boost/lib/libboost_program_options.so.1.72.0 > /opt/boost/lib/libboost_serialization.so.1.72.0 > /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so > /opt/petsc_release/lib/libpetsc.so > /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so/ > > my output is? 
> > /Arma and PETSc/MatScale are equal:????????????????????????????????????? > ??? ??? 0// > //Arma-time for a matrix size of [1024, > 8192]:??????????????????????????? ??? ????? 24955// > //PETSc-time, pointer for a matrix size of [1024, > 8192]:????????????????? ??? 28283// > //PETSc-time, MatScale for a matrix size of [1024, > 8192]:????????????????? 23138/ > > but when removing the explicit call to openmp (i.e. removing /-fopenmp/ > and //usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so/) my result is > > /Arma and PETSc/MatScale are equal:????????????????????????????????????? > ?????? 1// > //Arma-time for a matrix size of [1024, > 8192]:??????????????????????????? ??? ???? 24878// > //PETSc-time, pointer for a matrix size of [1024, > 8192]:???????????????????? 18942// > //PETSc-time, MatScale for a matrix size of [1024, > 8192]:???????????????? 23350/ > > even though both times the executable is linked to > > /??????? libmkl_intel_lp64.so => > /opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so > (0x00007f9eebd70000)// > //??????? libmkl_core.so => > /opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_core.so > (0x00007f9ee77aa000)// > //??????? libmkl_intel_thread.so => > /opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so > (0x00007f9ee42c3000)// > //??????? libiomp5.so => > /opt/intel/compilers_and_libraries_2020.2.254/linux/compiler/lib/intel64_lin/libiomp5.so > (0x00007f9ee3ebd000)// > //??????? libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007f9ea98bd000)/ > > via the petsc-library. Why does the execution time vary by so much, and > why does my result change when calling MatScale (i.e. returning wrong > results) when explicitly linking to OpenMP? > > Thanks! 
> > Regards, > > Roland > > #include > > int main(int argc, char **args) { > PetscMPIInt rank, size; > PetscInitialize(&argc, &args, (char*) 0, NULL); > > MPI_Comm_size(PETSC_COMM_WORLD, &size); > MPI_Comm_rank(PETSC_COMM_WORLD, &rank); > > test_scaling (1024, 8192, false); > PetscFinalize(); > return 0; > } > #include > > void test_scaling_arma(const arma::cx_mat & in_mat, > arma::cx_mat &out_mat, > const arma::cx_double &scaling_factor) { > out_mat = in_mat; > out_mat *= scaling_factor; > } > > void test_scaling_petsc(const Mat &in_mat, > Mat &out_mat, > const PetscScalar &scaling_factor) { > MatZeroEntries(out_mat); > MatAXPY(out_mat, scaling_factor, in_mat, SAME_NONZERO_PATTERN); > } > > void test_scaling_petsc_pointer(const Mat &in_mat, > Mat &out_mat, > const PetscScalar &scaling_factor) { > const PetscScalar *in_mat_ptr; > PetscScalar *out_mat_ptr; > MatDenseGetArrayRead (in_mat, &in_mat_ptr); > MatDenseGetArrayWrite (out_mat, &out_mat_ptr); > PetscInt r_0, r_1; > MatGetLocalSize (out_mat, &r_0, &r_1); > #pragma omp parallel for > for(int i = 0; i < r_0 * r_1; ++i) > *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor); > > MatDenseRestoreArrayRead (in_mat, &in_mat_ptr); > MatDenseRestoreArrayWrite(out_mat, &out_mat_ptr); > } > > void test_scaling(const size_t matrix_size_rows, const size_t matrix_size_cols, const bool print_matrices) { > PetscMPIInt rank, size; > > MPI_Comm_size(PETSC_COMM_WORLD, &size); > MPI_Comm_rank(PETSC_COMM_WORLD, &rank); > > arma::cx_mat in_mat = arma::zeros(matrix_size_rows, matrix_size_cols), > out_mat = arma::zeros(matrix_size_rows, matrix_size_cols); > arma::cx_rowvec matrix_vec = arma::conv_to::from(arma::linspace(0, matrix_size_cols - 1, matrix_size_cols)); > in_mat.each_row([&](arma::cx_rowvec &a){ > a = matrix_vec; > }); > > Mat petsc_in_mat, petsc_out_mat; > arma::cx_mat petsc_out_comparison_mat = arma::zeros(matrix_size_rows, matrix_size_cols); > MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, matrix_size_rows, matrix_size_cols, NULL, &petsc_in_mat); > MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, matrix_size_rows, matrix_size_cols, NULL, &petsc_out_mat); > > create_data_in_PETSc_from_scratch (in_mat, petsc_in_mat); > MatZeroEntries(petsc_out_mat); > > const std::complex scaling_factor{1. 
/ matrix_size_cols, 0.}; > > test_scaling_arma (in_mat, out_mat, scaling_factor); > test_scaling_petsc (petsc_in_mat, petsc_out_mat, scaling_factor); > > //Benchmark > auto t1 = std::chrono::high_resolution_clock::now(); > for(size_t i = 0; i < bench_rounds; ++i) { > test_scaling_arma (in_mat, out_mat, scaling_factor); > } > auto t2 = std::chrono::high_resolution_clock::now(); > > auto t3 = std::chrono::high_resolution_clock::now(); > for(size_t i = 0; i < bench_rounds; ++i) { > test_scaling_petsc_pointer (petsc_in_mat, petsc_out_mat, scaling_factor); > } > auto t4 = std::chrono::high_resolution_clock::now(); > > auto t5 = std::chrono::high_resolution_clock::now(); > for(size_t i = 0; i < bench_rounds; ++i) { > test_scaling_petsc (petsc_in_mat, petsc_out_mat, scaling_factor); > } > auto t6 = std::chrono::high_resolution_clock::now(); > > > retrieve_data_from_PETSc (petsc_out_mat, petsc_out_comparison_mat, matrix_size_cols, matrix_size_rows); > > if(print_matrices && rank == 0) { > std::cout << "In-matrix, ARMA:\n" << in_mat > << "\n\nOut-matrix, ARMA:\n" << out_mat > << "\n\nComparison-out-matrix, ARMA:\n" << petsc_out_comparison_mat > << "\n\nDifference: \n" << arma::abs(petsc_out_comparison_mat - out_mat) > <<'\n'; > } > if(rank == 0) { > std::cout << "Arma and PETSc/MatScale are equal:\t\t\t\t\t" << arma::approx_equal(out_mat, petsc_out_comparison_mat, "reldiff", 1e-8) << '\n'; > std::cout << "Arma-time for a matrix size of [" > << matrix_size_rows << ", " > << matrix_size_cols << "]:\t\t\t\t" > << std::chrono::duration_cast( t2 - t1 ).count() << '\n'; > std::cout << "PETSc-time, pointer for a matrix size of [" > << matrix_size_rows << ", " > << matrix_size_cols << "]:\t\t\t" > << std::chrono::duration_cast( t4 - t3 ).count() << '\n'; > std::cout << "PETSc-time, MatScale for a matrix size of [" > << matrix_size_rows << ", " > << matrix_size_cols << "]:\t\t\t" > << std::chrono::duration_cast( t6 - t5 ).count() << '\n'; > } > MatDestroy (&petsc_in_mat); > MatDestroy (&petsc_out_mat); > } > #include > > void retrieve_data_from_PETSc(const Mat petsc_mat, arma::cx_mat &out_data, > const arma::uword Ntime, const arma::uword Nradius) { > PetscMPIInt size; > MPI_Comm_size(PETSC_COMM_WORLD, &size); > if(out_data.n_rows != Ntime && out_data.n_cols != Nradius) { > out_data = arma::zeros(Ntime, Nradius); > } > Mat local_mat; > arma::Col vector_indices_radius = arma::linspace>(0, Nradius - 1, Nradius); > arma::Col vector_indices_time = arma::linspace>(0, Ntime - 1, Ntime); > //MatCreateRedundantMatrix(petsc_mat, Ntime * Nradius, MPI_COMM_NULL, MAT_INITIAL_MATRIX, &local_mat); > MatCreateRedundantMatrix(petsc_mat, size, MPI_COMM_NULL, MAT_INITIAL_MATRIX, &local_mat); > MatAssemblyBegin(local_mat, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(local_mat, MAT_FINAL_ASSEMBLY); > MatGetValues(local_mat, Nradius, vector_indices_radius.memptr(), Ntime, vector_indices_time.memptr(), out_data.memptr()); > MatDestroy(&local_mat); > out_data = out_data.st(); > } > > void store_data_in_PETSc(const arma::cx_mat &in_data, Mat &petsc_mat) { > const arma::uword Ntime = in_data.n_cols; > const arma::uword Nradius = in_data.n_rows; > arma::Col vector_indices_radius = arma::linspace>(0, Nradius - 1, Nradius); > arma::Col vector_indices_time = arma::linspace>(0, Ntime - 1, Ntime); > MatZeroEntries(petsc_mat); > MatAssemblyBegin(petsc_mat, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(petsc_mat, MAT_FINAL_ASSEMBLY); > arma::cx_mat local_mat = in_data.st(); > MatSetValues(petsc_mat, Nradius, vector_indices_radius.memptr(), Ntime, 
vector_indices_time.memptr(), local_mat.memptr(), INSERT_VALUES); > MatAssemblyBegin(petsc_mat, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(petsc_mat, MAT_FINAL_ASSEMBLY); > } > > void create_data_in_PETSc_from_scratch(const arma::cx_mat &in_data, Mat &petsc_mat) { > const arma::uword Ntime = in_data.n_cols; > const arma::uword Nradius = in_data.n_rows; > MatZeroEntries(petsc_mat); > MatAssemblyBegin(petsc_mat, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(petsc_mat, MAT_FINAL_ASSEMBLY); > for(int i = 0; i < (int)Ntime; ++i){ > for(int j = 0; j < (int)Nradius; ++j) { > MatSetValue(petsc_mat, j, i, i, INSERT_VALUES); > } > } > MatAssemblyBegin(petsc_mat, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(petsc_mat, MAT_FINAL_ASSEMBLY); > } > #ifndef TEST_SCALING_HPP > #define TEST_SCALING_HPP > > #include > > void test_scaling_arma(const arma::cx_mat & in_mat, > arma::cx_mat &out_mat, > const arma::cx_double &scaling_factor); > > void test_scaling_petsc(const Mat &in_mat, > Mat &out_mat, > const PetscScalar &scaling_factor); > > void test_scaling_petsc_pointer(const Mat &in_mat, > Mat &out_mat, > const PetscScalar &scaling_factor); > > void test_scaling(const size_t matrix_size_rows, const size_t matrix_size_cols, const bool print_matrices); > > #endif // TEST_SCALING_HPP > #ifndef HELPER_FUNCTIONS_HPP > #define HELPER_FUNCTIONS_HPP > > #include > > #include > #include > #include > #include > #include > #include > #include > #include > > #include > > constexpr int bench_rounds = 1000; > > void retrieve_data_from_PETSc(const Mat petsc_mat, arma::cx_mat &out_data, > const arma::uword Ntime, const arma::uword Nradius); > > void store_data_in_PETSc(const arma::cx_mat &in_data, Mat &petsc_mat); > > void create_data_in_PETSc_from_scratch(const arma::cx_mat &in_data, Mat &petsc_mat); > > #endif // HELPER_FUNCTIONS_HPP From roland.richter at ntnu.no Wed Feb 17 10:23:48 2021 From: roland.richter at ntnu.no (Roland Richter) Date: Wed, 17 Feb 2021 17:23:48 +0100 Subject: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results In-Reply-To: <874kiad69u.fsf@jedbrown.org> References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no> <874kiad69u.fsf@jedbrown.org> Message-ID: <3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no> Hei, I replaced the linking line with //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx? -march=native -fopenmp-simd -DMKL_LP64 -m64 CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o bin/armadillo_with_PETSc? -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran? -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl /opt/boost/lib/libboost_filesystem.so.1.72.0 /opt/boost/lib/libboost_mpi.so.1.72.0 /opt/boost/lib/libboost_program_options.so.1.72.0 /opt/boost/lib/libboost_serialization.so.1.72.0 /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so /opt/petsc_release/lib/libpetsc.so /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so / and now the results are correct. Nevertheless, when comparing the loop in line 26-28 in file test_scaling.cpp /#pragma omp parallel for// //??? for(int i = 0; i < r_0 * r_1; ++i)// //??? ??? *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/ the version without /#pragma omp parallel/ for is significantly faster (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there still such a big difference? Thanks! Am 17.02.21 um 17:10 schrieb Jed Brown: > You're using an MKL linked to Intel's OpenMP. 
I could imagine there being symbol conflicts causing MKL to compute wrong results if libgomp symbols are picked up. > > Note that -fopenmp-simd does not require linking -- it just gives the compiler hints about how to vectorize. So you can probably keep using it and just stop passing libgomp.so. Alternatively, you can link MKL to work with libgomp (see the MKL link advisor). > > Roland Richter writes: > >> Hei, >> >> when compiling the attached files using the following compilation line >> >> //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -DBOOST_ALL_NO_LIB >> -DBOOST_FILESYSTEM_DYN_LINK -DBOOST_MPI_DYN_LINK >> -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_SERIALIZATION_DYN_LINK >> -DUSE_CUDA >> -I/home/roland/Dokumente/C++-Projekte/armadillo_with_PETSc/include >> -I/opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/include >> -I/opt/armadillo/include -isystem /opt/petsc_release/include -isystem >> /opt/fftw3/include -isystem /opt/boost/include -march=native >> -fopenmp-simd -DMKL_LP64 -m64 -Wall -Wextra -pedantic -fPIC -flto -O2 >> -funroll-loops -funroll-all-loops -fstrict-aliasing -mavx -march=native >> -fopenmp -std=gnu++17 -c -o / >> >> and linking them with? >> >> //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx? -march=native -fopenmp-simd >> -DMKL_LP64 -m64 -o bin/armadillo_with_PETSc? >> -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib >> /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran >> /opt/intel/mkl/lib/intel64/libmkl_rt.so >> /opt/boost/lib/libboost_filesystem.so.1.72.0 >> /opt/boost/lib/libboost_mpi.so.1.72.0 >> /opt/boost/lib/libboost_program_options.so.1.72.0 >> /opt/boost/lib/libboost_serialization.so.1.72.0 >> /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so >> /opt/petsc_release/lib/libpetsc.so >> /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so/ >> >> my output is? >> >> /Arma and PETSc/MatScale are equal:????????????????????????????????????? >> ??? ??? 0// >> //Arma-time for a matrix size of [1024, >> 8192]:??????????????????????????? ??? ????? 24955// >> //PETSc-time, pointer for a matrix size of [1024, >> 8192]:????????????????? ??? 28283// >> //PETSc-time, MatScale for a matrix size of [1024, >> 8192]:????????????????? 23138/ >> >> but when removing the explicit call to openmp (i.e. removing /-fopenmp/ >> and //usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so/) my result is >> >> /Arma and PETSc/MatScale are equal:????????????????????????????????????? >> ?????? 1// >> //Arma-time for a matrix size of [1024, >> 8192]:??????????????????????????? ??? ???? 24878// >> //PETSc-time, pointer for a matrix size of [1024, >> 8192]:???????????????????? 18942// >> //PETSc-time, MatScale for a matrix size of [1024, >> 8192]:???????????????? 23350/ >> >> even though both times the executable is linked to >> >> /??????? libmkl_intel_lp64.so => >> /opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so >> (0x00007f9eebd70000)// >> //??????? libmkl_core.so => >> /opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_core.so >> (0x00007f9ee77aa000)// >> //??????? libmkl_intel_thread.so => >> /opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so >> (0x00007f9ee42c3000)// >> //??????? libiomp5.so => >> /opt/intel/compilers_and_libraries_2020.2.254/linux/compiler/lib/intel64_lin/libiomp5.so >> (0x00007f9ee3ebd000)// >> //??????? libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007f9ea98bd000)/ >> >> via the petsc-library. 
Why does the execution time vary by so much, and >> why does my result change when calling MatScale (i.e. returning wrong >> results) when explicitly linking to OpenMP? >> >> Thanks! >> >> Regards, >> >> Roland >> >> #include >> >> int main(int argc, char **args) { >> PetscMPIInt rank, size; >> PetscInitialize(&argc, &args, (char*) 0, NULL); >> >> MPI_Comm_size(PETSC_COMM_WORLD, &size); >> MPI_Comm_rank(PETSC_COMM_WORLD, &rank); >> >> test_scaling (1024, 8192, false); >> PetscFinalize(); >> return 0; >> } >> #include >> >> void test_scaling_arma(const arma::cx_mat & in_mat, >> arma::cx_mat &out_mat, >> const arma::cx_double &scaling_factor) { >> out_mat = in_mat; >> out_mat *= scaling_factor; >> } >> >> void test_scaling_petsc(const Mat &in_mat, >> Mat &out_mat, >> const PetscScalar &scaling_factor) { >> MatZeroEntries(out_mat); >> MatAXPY(out_mat, scaling_factor, in_mat, SAME_NONZERO_PATTERN); >> } >> >> void test_scaling_petsc_pointer(const Mat &in_mat, >> Mat &out_mat, >> const PetscScalar &scaling_factor) { >> const PetscScalar *in_mat_ptr; >> PetscScalar *out_mat_ptr; >> MatDenseGetArrayRead (in_mat, &in_mat_ptr); >> MatDenseGetArrayWrite (out_mat, &out_mat_ptr); >> PetscInt r_0, r_1; >> MatGetLocalSize (out_mat, &r_0, &r_1); >> #pragma omp parallel for >> for(int i = 0; i < r_0 * r_1; ++i) >> *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor); >> >> MatDenseRestoreArrayRead (in_mat, &in_mat_ptr); >> MatDenseRestoreArrayWrite(out_mat, &out_mat_ptr); >> } >> >> void test_scaling(const size_t matrix_size_rows, const size_t matrix_size_cols, const bool print_matrices) { >> PetscMPIInt rank, size; >> >> MPI_Comm_size(PETSC_COMM_WORLD, &size); >> MPI_Comm_rank(PETSC_COMM_WORLD, &rank); >> >> arma::cx_mat in_mat = arma::zeros(matrix_size_rows, matrix_size_cols), >> out_mat = arma::zeros(matrix_size_rows, matrix_size_cols); >> arma::cx_rowvec matrix_vec = arma::conv_to::from(arma::linspace(0, matrix_size_cols - 1, matrix_size_cols)); >> in_mat.each_row([&](arma::cx_rowvec &a){ >> a = matrix_vec; >> }); >> >> Mat petsc_in_mat, petsc_out_mat; >> arma::cx_mat petsc_out_comparison_mat = arma::zeros(matrix_size_rows, matrix_size_cols); >> MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, matrix_size_rows, matrix_size_cols, NULL, &petsc_in_mat); >> MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, matrix_size_rows, matrix_size_cols, NULL, &petsc_out_mat); >> >> create_data_in_PETSc_from_scratch (in_mat, petsc_in_mat); >> MatZeroEntries(petsc_out_mat); >> >> const std::complex scaling_factor{1. 
/ matrix_size_cols, 0.}; >> >> test_scaling_arma (in_mat, out_mat, scaling_factor); >> test_scaling_petsc (petsc_in_mat, petsc_out_mat, scaling_factor); >> >> //Benchmark >> auto t1 = std::chrono::high_resolution_clock::now(); >> for(size_t i = 0; i < bench_rounds; ++i) { >> test_scaling_arma (in_mat, out_mat, scaling_factor); >> } >> auto t2 = std::chrono::high_resolution_clock::now(); >> >> auto t3 = std::chrono::high_resolution_clock::now(); >> for(size_t i = 0; i < bench_rounds; ++i) { >> test_scaling_petsc_pointer (petsc_in_mat, petsc_out_mat, scaling_factor); >> } >> auto t4 = std::chrono::high_resolution_clock::now(); >> >> auto t5 = std::chrono::high_resolution_clock::now(); >> for(size_t i = 0; i < bench_rounds; ++i) { >> test_scaling_petsc (petsc_in_mat, petsc_out_mat, scaling_factor); >> } >> auto t6 = std::chrono::high_resolution_clock::now(); >> >> >> retrieve_data_from_PETSc (petsc_out_mat, petsc_out_comparison_mat, matrix_size_cols, matrix_size_rows); >> >> if(print_matrices && rank == 0) { >> std::cout << "In-matrix, ARMA:\n" << in_mat >> << "\n\nOut-matrix, ARMA:\n" << out_mat >> << "\n\nComparison-out-matrix, ARMA:\n" << petsc_out_comparison_mat >> << "\n\nDifference: \n" << arma::abs(petsc_out_comparison_mat - out_mat) >> <<'\n'; >> } >> if(rank == 0) { >> std::cout << "Arma and PETSc/MatScale are equal:\t\t\t\t\t" << arma::approx_equal(out_mat, petsc_out_comparison_mat, "reldiff", 1e-8) << '\n'; >> std::cout << "Arma-time for a matrix size of [" >> << matrix_size_rows << ", " >> << matrix_size_cols << "]:\t\t\t\t" >> << std::chrono::duration_cast( t2 - t1 ).count() << '\n'; >> std::cout << "PETSc-time, pointer for a matrix size of [" >> << matrix_size_rows << ", " >> << matrix_size_cols << "]:\t\t\t" >> << std::chrono::duration_cast( t4 - t3 ).count() << '\n'; >> std::cout << "PETSc-time, MatScale for a matrix size of [" >> << matrix_size_rows << ", " >> << matrix_size_cols << "]:\t\t\t" >> << std::chrono::duration_cast( t6 - t5 ).count() << '\n'; >> } >> MatDestroy (&petsc_in_mat); >> MatDestroy (&petsc_out_mat); >> } >> #include >> >> void retrieve_data_from_PETSc(const Mat petsc_mat, arma::cx_mat &out_data, >> const arma::uword Ntime, const arma::uword Nradius) { >> PetscMPIInt size; >> MPI_Comm_size(PETSC_COMM_WORLD, &size); >> if(out_data.n_rows != Ntime && out_data.n_cols != Nradius) { >> out_data = arma::zeros(Ntime, Nradius); >> } >> Mat local_mat; >> arma::Col vector_indices_radius = arma::linspace>(0, Nradius - 1, Nradius); >> arma::Col vector_indices_time = arma::linspace>(0, Ntime - 1, Ntime); >> //MatCreateRedundantMatrix(petsc_mat, Ntime * Nradius, MPI_COMM_NULL, MAT_INITIAL_MATRIX, &local_mat); >> MatCreateRedundantMatrix(petsc_mat, size, MPI_COMM_NULL, MAT_INITIAL_MATRIX, &local_mat); >> MatAssemblyBegin(local_mat, MAT_FINAL_ASSEMBLY); >> MatAssemblyEnd(local_mat, MAT_FINAL_ASSEMBLY); >> MatGetValues(local_mat, Nradius, vector_indices_radius.memptr(), Ntime, vector_indices_time.memptr(), out_data.memptr()); >> MatDestroy(&local_mat); >> out_data = out_data.st(); >> } >> >> void store_data_in_PETSc(const arma::cx_mat &in_data, Mat &petsc_mat) { >> const arma::uword Ntime = in_data.n_cols; >> const arma::uword Nradius = in_data.n_rows; >> arma::Col vector_indices_radius = arma::linspace>(0, Nradius - 1, Nradius); >> arma::Col vector_indices_time = arma::linspace>(0, Ntime - 1, Ntime); >> MatZeroEntries(petsc_mat); >> MatAssemblyBegin(petsc_mat, MAT_FINAL_ASSEMBLY); >> MatAssemblyEnd(petsc_mat, MAT_FINAL_ASSEMBLY); >> arma::cx_mat local_mat = in_data.st(); 
>> MatSetValues(petsc_mat, Nradius, vector_indices_radius.memptr(), Ntime, vector_indices_time.memptr(), local_mat.memptr(), INSERT_VALUES); >> MatAssemblyBegin(petsc_mat, MAT_FINAL_ASSEMBLY); >> MatAssemblyEnd(petsc_mat, MAT_FINAL_ASSEMBLY); >> } >> >> void create_data_in_PETSc_from_scratch(const arma::cx_mat &in_data, Mat &petsc_mat) { >> const arma::uword Ntime = in_data.n_cols; >> const arma::uword Nradius = in_data.n_rows; >> MatZeroEntries(petsc_mat); >> MatAssemblyBegin(petsc_mat, MAT_FINAL_ASSEMBLY); >> MatAssemblyEnd(petsc_mat, MAT_FINAL_ASSEMBLY); >> for(int i = 0; i < (int)Ntime; ++i){ >> for(int j = 0; j < (int)Nradius; ++j) { >> MatSetValue(petsc_mat, j, i, i, INSERT_VALUES); >> } >> } >> MatAssemblyBegin(petsc_mat, MAT_FINAL_ASSEMBLY); >> MatAssemblyEnd(petsc_mat, MAT_FINAL_ASSEMBLY); >> } >> #ifndef TEST_SCALING_HPP >> #define TEST_SCALING_HPP >> >> #include >> >> void test_scaling_arma(const arma::cx_mat & in_mat, >> arma::cx_mat &out_mat, >> const arma::cx_double &scaling_factor); >> >> void test_scaling_petsc(const Mat &in_mat, >> Mat &out_mat, >> const PetscScalar &scaling_factor); >> >> void test_scaling_petsc_pointer(const Mat &in_mat, >> Mat &out_mat, >> const PetscScalar &scaling_factor); >> >> void test_scaling(const size_t matrix_size_rows, const size_t matrix_size_cols, const bool print_matrices); >> >> #endif // TEST_SCALING_HPP >> #ifndef HELPER_FUNCTIONS_HPP >> #define HELPER_FUNCTIONS_HPP >> >> #include >> >> #include >> #include >> #include >> #include >> #include >> #include >> #include >> #include >> >> #include >> >> constexpr int bench_rounds = 1000; >> >> void retrieve_data_from_PETSc(const Mat petsc_mat, arma::cx_mat &out_data, >> const arma::uword Ntime, const arma::uword Nradius); >> >> void store_data_in_PETSc(const arma::cx_mat &in_data, Mat &petsc_mat); >> >> void create_data_in_PETSc_from_scratch(const arma::cx_mat &in_data, Mat &petsc_mat); >> >> #endif // HELPER_FUNCTIONS_HPP -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Feb 17 10:49:49 2021 From: jed at jedbrown.org (Jed Brown) Date: Wed, 17 Feb 2021 09:49:49 -0700 Subject: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results In-Reply-To: <3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no> References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no> <874kiad69u.fsf@jedbrown.org> <3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no> Message-ID: <87y2fmbpw2.fsf@jedbrown.org> Roland Richter writes: > Hei, > > I replaced the linking line with > > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx? -march=native -fopenmp-simd > -DMKL_LP64 -m64 > CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o > bin/armadillo_with_PETSc? > -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib > /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran? > -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 > -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl > /opt/boost/lib/libboost_filesystem.so.1.72.0 > /opt/boost/lib/libboost_mpi.so.1.72.0 > /opt/boost/lib/libboost_program_options.so.1.72.0 > /opt/boost/lib/libboost_serialization.so.1.72.0 > /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so > /opt/petsc_release/lib/libpetsc.so > /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so > / > > and now the results are correct. Nevertheless, when comparing the loop > in line 26-28 in file test_scaling.cpp > > /#pragma omp parallel for// > //??? for(int i = 0; i < r_0 * r_1; ++i)// > //??? ??? 
*(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/ > > the version without /#pragma omp parallel/ for is significantly faster > (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there > still such a big difference? Sounds like you're using a profile to attribute time? Each `omp parallel` region incurs a cost ranging from about a microsecond to 10 or more microseconds depending on architecture, number of threads, and OpenMP implementation. Your loop (for double precision) operates at around 8 entries per clock cycle (depending on architecture) if the operands are in cache so the loop size r_0 * r_1 should be at least 10000 just to pay off the cost of `omp parallel`. From roland.richter at ntnu.no Wed Feb 17 11:11:04 2021 From: roland.richter at ntnu.no (Roland Richter) Date: Wed, 17 Feb 2021 17:11:04 +0000 Subject: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results In-Reply-To: <87y2fmbpw2.fsf@jedbrown.org> References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no> <874kiad69u.fsf@jedbrown.org> <3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no>, <87y2fmbpw2.fsf@jedbrown.org> Message-ID: <641b1bcbfd2741d58cb8d21960a720ca@ntnu.no> My PetscScalar is complex double (i.e. even higher penalty), but my matrix has a size of 8kk elements, so that should not an issue. Regards, Roland ________________________________ Von: Jed Brown Gesendet: Mittwoch, 17. Februar 2021 17:49:49 An: Roland Richter; PETSc Betreff: Re: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results Roland Richter writes: > Hei, > > I replaced the linking line with > > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -march=native -fopenmp-simd > -DMKL_LP64 -m64 > CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o > bin/armadillo_with_PETSc > -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib > /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran > -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 > -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl > /opt/boost/lib/libboost_filesystem.so.1.72.0 > /opt/boost/lib/libboost_mpi.so.1.72.0 > /opt/boost/lib/libboost_program_options.so.1.72.0 > /opt/boost/lib/libboost_serialization.so.1.72.0 > /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so > /opt/petsc_release/lib/libpetsc.so > /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so > / > > and now the results are correct. Nevertheless, when comparing the loop > in line 26-28 in file test_scaling.cpp > > /#pragma omp parallel for// > // for(int i = 0; i < r_0 * r_1; ++i)// > // *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/ > > the version without /#pragma omp parallel/ for is significantly faster > (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there > still such a big difference? Sounds like you're using a profile to attribute time? Each `omp parallel` region incurs a cost ranging from about a microsecond to 10 or more microseconds depending on architecture, number of threads, and OpenMP implementation. Your loop (for double precision) operates at around 8 entries per clock cycle (depending on architecture) if the operands are in cache so the loop size r_0 * r_1 should be at least 10000 just to pay off the cost of `omp parallel`. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Wed Feb 17 11:51:43 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 17 Feb 2021 12:51:43 -0500 Subject: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results In-Reply-To: <641b1bcbfd2741d58cb8d21960a720ca@ntnu.no> References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no> <874kiad69u.fsf@jedbrown.org> <3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no> <87y2fmbpw2.fsf@jedbrown.org> <641b1bcbfd2741d58cb8d21960a720ca@ntnu.no> Message-ID: Jed, is it possible that this is an oversubscription penalty from bad OpenMP settings? Thanks, Matt On Wed, Feb 17, 2021 at 12:11 PM Roland Richter wrote: > My PetscScalar is complex double (i.e. even higher penalty), but my matrix > has a size of 8kk elements, so that should not an issue. > Regards, > Roland > ------------------------------ > *Von:* Jed Brown > *Gesendet:* Mittwoch, 17. Februar 2021 17:49:49 > *An:* Roland Richter; PETSc > *Betreff:* Re: [petsc-users] Explicit linking to OpenMP results in > performance drop and wrong results > > Roland Richter writes: > > > Hei, > > > > I replaced the linking line with > > > > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -march=native -fopenmp-simd > > -DMKL_LP64 -m64 > > CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o > > bin/armadillo_with_PETSc > > -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib > > /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran > > -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 > > -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl > > /opt/boost/lib/libboost_filesystem.so.1.72.0 > > /opt/boost/lib/libboost_mpi.so.1.72.0 > > /opt/boost/lib/libboost_program_options.so.1.72.0 > > /opt/boost/lib/libboost_serialization.so.1.72.0 > > /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so > > /opt/petsc_release/lib/libpetsc.so > > /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so > > / > > > > and now the results are correct. Nevertheless, when comparing the loop > > in line 26-28 in file test_scaling.cpp > > > > /#pragma omp parallel for// > > // for(int i = 0; i < r_0 * r_1; ++i)// > > // *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/ > > > > the version without /#pragma omp parallel/ for is significantly faster > > (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there > > still such a big difference? > > Sounds like you're using a profile to attribute time? Each `omp parallel` > region incurs a cost ranging from about a microsecond to 10 or more > microseconds depending on architecture, number of threads, and OpenMP > implementation. Your loop (for double precision) operates at around 8 > entries per clock cycle (depending on architecture) if the operands are in > cache so the loop size r_0 * r_1 should be at least 10000 just to pay off > the cost of `omp parallel`. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Wed Feb 17 11:56:04 2021 From: jed at jedbrown.org (Jed Brown) Date: Wed, 17 Feb 2021 10:56:04 -0700 Subject: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results In-Reply-To: References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no> <874kiad69u.fsf@jedbrown.org> <3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no> <87y2fmbpw2.fsf@jedbrown.org> <641b1bcbfd2741d58cb8d21960a720ca@ntnu.no> Message-ID: <87v9aqbmtn.fsf@jedbrown.org> It's entirely possible, especially if libgomp is being mixed with libiomp. Roland hasn't show us the compilation line (just linker), because `omp parallel` shouldn't do anything with just -fopenmp-simd and no -fopenmp. Matthew Knepley writes: > Jed, is it possible that this is an oversubscription penalty from bad > OpenMP settings? cuneiform> > > Thanks, > > Matt > > On Wed, Feb 17, 2021 at 12:11 PM Roland Richter > wrote: > >> My PetscScalar is complex double (i.e. even higher penalty), but my matrix >> has a size of 8kk elements, so that should not an issue. >> Regards, >> Roland >> ------------------------------ >> *Von:* Jed Brown >> *Gesendet:* Mittwoch, 17. Februar 2021 17:49:49 >> *An:* Roland Richter; PETSc >> *Betreff:* Re: [petsc-users] Explicit linking to OpenMP results in >> performance drop and wrong results >> >> Roland Richter writes: >> >> > Hei, >> > >> > I replaced the linking line with >> > >> > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -march=native -fopenmp-simd >> > -DMKL_LP64 -m64 >> > CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o >> > bin/armadillo_with_PETSc >> > -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib >> > /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran >> > -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 >> > -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl >> > /opt/boost/lib/libboost_filesystem.so.1.72.0 >> > /opt/boost/lib/libboost_mpi.so.1.72.0 >> > /opt/boost/lib/libboost_program_options.so.1.72.0 >> > /opt/boost/lib/libboost_serialization.so.1.72.0 >> > /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so >> > /opt/petsc_release/lib/libpetsc.so >> > /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so >> > / >> > >> > and now the results are correct. Nevertheless, when comparing the loop >> > in line 26-28 in file test_scaling.cpp >> > >> > /#pragma omp parallel for// >> > // for(int i = 0; i < r_0 * r_1; ++i)// >> > // *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/ >> > >> > the version without /#pragma omp parallel/ for is significantly faster >> > (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there >> > still such a big difference? >> >> Sounds like you're using a profile to attribute time? Each `omp parallel` >> region incurs a cost ranging from about a microsecond to 10 or more >> microseconds depending on architecture, number of threads, and OpenMP >> implementation. Your loop (for double precision) operates at around 8 >> entries per clock cycle (depending on architecture) if the operands are in >> cache so the loop size r_0 * r_1 should be at least 10000 just to pay off >> the cost of `omp parallel`. >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From roland.richter at ntnu.no Wed Feb 17 13:28:50 2021 From: roland.richter at ntnu.no (Roland Richter) Date: Wed, 17 Feb 2021 20:28:50 +0100 Subject: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results In-Reply-To: <87v9aqbmtn.fsf@jedbrown.org> References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no> <874kiad69u.fsf@jedbrown.org> <3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no> <87y2fmbpw2.fsf@jedbrown.org> <641b1bcbfd2741d58cb8d21960a720ca@ntnu.no> <87v9aqbmtn.fsf@jedbrown.org> Message-ID: <0d840f08-c390-9cf7-d833-7a4f3efb37b5@ntnu.no> Hei, the compilation line is (as shown below) //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -DBOOST_ALL_NO_LIB -DBOOST_FILESYSTEM_DYN_LINK -DBOOST_MPI_DYN_LINK -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_SERIALIZATION_DYN_LINK -DUSE_CUDA -I/home/roland/Dokumente/C++-Projekte/armadillo_with_PETSc/include -I/opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/include -I/opt/armadillo/include -isystem /opt/petsc_release/include -isystem /opt/fftw3/include -isystem /opt/boost/include -march=native -fopenmp-simd -DMKL_LP64 -m64 -Wall -Wextra -pedantic -fPIC -flto -O2 -funroll-loops -funroll-all-loops -fstrict-aliasing -mavx -march=native -fopenmp -std=gnu++17 -c -o / Regards, Roland // Am 17.02.2021 um 18:56 schrieb Jed Brown: > It's entirely possible, especially if libgomp is being mixed with libiomp. > > Roland hasn't show us the compilation line (just linker), because `omp parallel` shouldn't do anything with just -fopenmp-simd and no -fopenmp. > > Matthew Knepley writes: > >> Jed, is it possible that this is an oversubscription penalty from bad >> OpenMP settings? > cuneiform> >> >> Thanks, >> >> Matt >> >> On Wed, Feb 17, 2021 at 12:11 PM Roland Richter >> wrote: >> >>> My PetscScalar is complex double (i.e. even higher penalty), but my matrix >>> has a size of 8kk elements, so that should not an issue. >>> Regards, >>> Roland >>> ------------------------------ >>> *Von:* Jed Brown >>> *Gesendet:* Mittwoch, 17. Februar 2021 17:49:49 >>> *An:* Roland Richter; PETSc >>> *Betreff:* Re: [petsc-users] Explicit linking to OpenMP results in >>> performance drop and wrong results >>> >>> Roland Richter writes: >>> >>>> Hei, >>>> >>>> I replaced the linking line with >>>> >>>> //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -march=native -fopenmp-simd >>>> -DMKL_LP64 -m64 >>>> CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o >>>> bin/armadillo_with_PETSc >>>> -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib >>>> /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran >>>> -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 >>>> -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl >>>> /opt/boost/lib/libboost_filesystem.so.1.72.0 >>>> /opt/boost/lib/libboost_mpi.so.1.72.0 >>>> /opt/boost/lib/libboost_program_options.so.1.72.0 >>>> /opt/boost/lib/libboost_serialization.so.1.72.0 >>>> /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so >>>> /opt/petsc_release/lib/libpetsc.so >>>> /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so >>>> / >>>> >>>> and now the results are correct. Nevertheless, when comparing the loop >>>> in line 26-28 in file test_scaling.cpp >>>> >>>> /#pragma omp parallel for// >>>> // for(int i = 0; i < r_0 * r_1; ++i)// >>>> // *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/ >>>> >>>> the version without /#pragma omp parallel/ for is significantly faster >>>> (i.e. 
18 s vs 28 s) compared to the version with /omp./ Why is there >>>> still such a big difference? >>> Sounds like you're using a profile to attribute time? Each `omp parallel` >>> region incurs a cost ranging from about a microsecond to 10 or more >>> microseconds depending on architecture, number of threads, and OpenMP >>> implementation. Your loop (for double precision) operates at around 8 >>> entries per clock cycle (depending on architecture) if the operands are in >>> cache so the loop size r_0 * r_1 should be at least 10000 just to pay off >>> the cost of `omp parallel`. >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Feb 17 14:14:43 2021 From: jed at jedbrown.org (Jed Brown) Date: Wed, 17 Feb 2021 13:14:43 -0700 Subject: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results In-Reply-To: <0d840f08-c390-9cf7-d833-7a4f3efb37b5@ntnu.no> References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no> <874kiad69u.fsf@jedbrown.org> <3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no> <87y2fmbpw2.fsf@jedbrown.org> <641b1bcbfd2741d58cb8d21960a720ca@ntnu.no> <87v9aqbmtn.fsf@jedbrown.org> <0d840f08-c390-9cf7-d833-7a4f3efb37b5@ntnu.no> Message-ID: <87h7mabgek.fsf@jedbrown.org> Roland Richter writes: > Hei, > > the compilation line is (as shown below) > > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -DBOOST_ALL_NO_LIB > -DBOOST_FILESYSTEM_DYN_LINK -DBOOST_MPI_DYN_LINK > -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_SERIALIZATION_DYN_LINK > -DUSE_CUDA > -I/home/roland/Dokumente/C++-Projekte/armadillo_with_PETSc/include > -I/opt/intel/compilers_and_libraries_2020.2.254/linux/mkl/include > -I/opt/armadillo/include -isystem /opt/petsc_release/include -isystem > /opt/fftw3/include -isystem /opt/boost/include -march=native > -fopenmp-simd -DMKL_LP64 -m64 -Wall -Wextra -pedantic -fPIC -flto -O2 > -funroll-loops -funroll-all-loops -fstrict-aliasing -mavx -march=native > -fopenmp -std=gnu++17 -c -o / -fopenmp implies -fopenmp-simd so you don't need both. You have -fopenmp here so it'll use threading, and likely default to libgomp (depending on what compiler is behind the mpicxx wrapper). From zhaog6 at lsec.cc.ac.cn Wed Feb 17 18:47:44 2021 From: zhaog6 at lsec.cc.ac.cn (=?UTF-8?B?6LW15Yia?=) Date: Thu, 18 Feb 2021 08:47:44 +0800 (GMT+08:00) Subject: [petsc-users] An issue about pipelined CG and Gropp's CG Message-ID: <7bb9aad8.144.177b29b6517.Coremail.zhaog6@lsec.cc.ac.cn> Dear PETSc team, I am interested in pipelined CG (-ksp_type pipecg) and Gropp's CG (-ksp_type groppcg), it is expected that this iterative method with pipelined has advantages over traditional CG in the case of multiple processes. So I'd like to ask for Poisson problem, how many computing nodes do I need to show the advantages of pipelined CG or Gropp's CG over CG (No preconditioner is used)? Currently, I can only use up to 32 nodes (36 cores per nodes) at most on my cluster, but both "pipecg" and "groppcg" seem to be no advantage over "cg" when I solve Poisson equations with homogeneous Dirichlet BC in [0, 1]^2 (remain 20K~60K DOFs per process). I guess the reason would be too few computing nodes. 
Because I am calling PETSc via other numerical software, if need, I would mail related performance information to you by using command line options suggested by PETSc. Thank you. Thanks, Gang From bsmith at petsc.dev Wed Feb 17 19:17:17 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 17 Feb 2021 19:17:17 -0600 Subject: [petsc-users] An issue about pipelined CG and Gropp's CG In-Reply-To: <7bb9aad8.144.177b29b6517.Coremail.zhaog6@lsec.cc.ac.cn> References: <7bb9aad8.144.177b29b6517.Coremail.zhaog6@lsec.cc.ac.cn> Message-ID: > On Feb 17, 2021, at 6:47 PM, ?? wrote: > > Dear PETSc team, > > I am interested in pipelined CG (-ksp_type pipecg) and Gropp's CG (-ksp_type groppcg), it is expected that this iterative method with pipelined has advantages over traditional CG in the case of multiple processes. So I'd like to ask for Poisson problem, how many computing nodes do I need to show the advantages of pipelined CG or Gropp's CG over CG (No preconditioner is used)? > > Currently, I can only use up to 32 nodes (36 cores per nodes) at most on my cluster, but both "pipecg" and "groppcg" seem to be no advantage over "cg" when I solve Poisson equations with homogeneous Dirichlet BC in [0, 1]^2 (remain 20K~60K DOFs per process). I guess the reason would be too few computing nodes. 900 cores (assuming they are not memory bandwidth bound) might be enough to see some differences but the differences are likely so small compared to other parallel issues that affect performance that you see no consistently measurable difference. Run with -log_view three cases, no pipeline and the two pipelines and send the output. By studying where the time is spent in the different regions of the code with this output one may be able to say something about the pipeline affect. Barry > > Because I am calling PETSc via other numerical software, if need, I would mail related performance information to you by using command line options suggested by PETSc. Thank you. > > > Thanks, > Gang From heepark at sandia.gov Wed Feb 17 19:23:37 2021 From: heepark at sandia.gov (Park, Heeho) Date: Thu, 18 Feb 2021 01:23:37 +0000 Subject: [petsc-users] insufficient virtual memory? Message-ID: Hi PETSc developers, Have you seen this error message? forrtl: severe (41): insufficient virtual memory We are running about 36 million degrees of freedom ( ~ 2.56 GB) and it is failing with the error message on our HPC systems. Ironically, it runs on our laptop (super slow.) type: seqbaij rows=46251272, cols=46251272 total: nonzeros=323046210, allocated nonzeros=323046210 total number of mallocs used during MatSetValues calls=0 block size is 1 Does anyone have experience encountering this problem? Thanks, Heeho Daniel Park ! ------------------------------------ ! Sandia National Laboratories Org: 08844, R&D Work: 505-844-1319 ! ------------------------------------ ! -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 17 19:27:29 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 17 Feb 2021 20:27:29 -0500 Subject: [petsc-users] insufficient virtual memory? In-Reply-To: References: Message-ID: On Wed, Feb 17, 2021 at 8:23 PM Park, Heeho via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi PETSc developers, > > > > Have you seen this error message? > > > > forrtl: severe (41): insufficient virtual memory > I believe this is an OS setting, independent of the code itself, which explains the difference you see between machines. 
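One quick check from inside the job itself is which limit is actually in force; here is a minimal POSIX sketch (not PETSc-specific, and batch systems differ in which limits they impose) that prints the per-process address-space limit, the usual culprit behind "insufficient virtual memory" on compute nodes:

  #include <stdio.h>
  #include <sys/resource.h>

  int main(void)
  {
    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) == 0) {
      if (rl.rlim_cur == RLIM_INFINITY)
        printf("address-space limit: unlimited\n");
      else
        printf("address-space limit: %llu MB\n",
               (unsigned long long)(rl.rlim_cur >> 20));
    }
    return 0;
  }

Running the same check in the job script and on the laptop would confirm whether the environment, rather than the code, is the difference.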
This also suggests asking the local sysadmin for your cluster. Thanks, Matt > > > We are running about 36 million degrees of freedom ( ~ 2.56 GB) and it is > failing with the error message on our HPC systems. > > Ironically, it runs on our laptop (super slow.) > > > > type: seqbaij > > rows=46251272, cols=46251272 > > total: nonzeros=323046210, allocated nonzeros=323046210 > > total number of mallocs used during MatSetValues calls=0 > > block size is 1 > > > > Does anyone have experience encountering this problem? > > > > Thanks, > > > > Heeho Daniel Park > > ! ------------------------------------ ! > Sandia National Laboratories > > Org: 08844, R&D > > Work: 505-844-1319 > ! ------------------------------------ ! > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From heepark at sandia.gov Wed Feb 17 19:32:19 2021 From: heepark at sandia.gov (Park, Heeho) Date: Thu, 18 Feb 2021 01:32:19 +0000 Subject: [petsc-users] [EXTERNAL] Re: insufficient virtual memory? In-Reply-To: References: Message-ID: <1BF5C245-9FD9-4E77-8817-7AC61A094755@sandia.gov> That makes sense. That was my suspicion too. I will contact the sys admin. Thanks. - Heeho Daniel Park From: Matthew Knepley Date: Wednesday, February 17, 2021 at 5:28 PM To: "Park, Heeho" Cc: "petsc-users at mcs.anl.gov" Subject: [EXTERNAL] Re: [petsc-users] insufficient virtual memory? On Wed, Feb 17, 2021 at 8:23 PM Park, Heeho via petsc-users > wrote: Hi PETSc developers, Have you seen this error message? forrtl: severe (41): insufficient virtual memory I believe this is an OS setting, independent of the code itself, which explains the difference you see between machines. This also suggests asking the local sysadmin for your cluster. Thanks, Matt We are running about 36 million degrees of freedom ( ~ 2.56 GB) and it is failing with the error message on our HPC systems. Ironically, it runs on our laptop (super slow.) type: seqbaij rows=46251272, cols=46251272 total: nonzeros=323046210, allocated nonzeros=323046210 total number of mallocs used during MatSetValues calls=0 block size is 1 Does anyone have experience encountering this problem? Thanks, Heeho Daniel Park ! ------------------------------------ ! Sandia National Laboratories Org: 08844, R&D Work: 505-844-1319 ! ------------------------------------ ! -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhaog6 at lsec.cc.ac.cn Wed Feb 17 20:31:14 2021 From: zhaog6 at lsec.cc.ac.cn (=?UTF-8?B?6LW15Yia?=) Date: Thu, 18 Feb 2021 10:31:14 +0800 (GMT+08:00) Subject: [petsc-users] An issue about pipelined CG and Gropp's CG In-Reply-To: References: <7bb9aad8.144.177b29b6517.Coremail.zhaog6@lsec.cc.ac.cn> Message-ID: <68432bab.55d.177b2fa288a.Coremail.zhaog6@lsec.cc.ac.cn> Dear Barry, Thank you for your prompt reply. I run ~16M DOFs on 32 nodes (36 cores per node), but CG seems to be faster than pipelined CG and Gropp's CG, I'm puzzled and haven't figured out why. Put the performance output into attachment, please check it. 
Thanks, Gang > -----????----- > ???: "Barry Smith" > ????: 2021-02-18 09:17:17 (???) > ???: "??" > ??: PETSc > ??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG > > > > > On Feb 17, 2021, at 6:47 PM, ?? wrote: > > > > Dear PETSc team, > > > > I am interested in pipelined CG (-ksp_type pipecg) and Gropp's CG (-ksp_type groppcg), it is expected that this iterative method with pipelined has advantages over traditional CG in the case of multiple processes. So I'd like to ask for Poisson problem, how many computing nodes do I need to show the advantages of pipelined CG or Gropp's CG over CG (No preconditioner is used)? > > > > Currently, I can only use up to 32 nodes (36 cores per nodes) at most on my cluster, but both "pipecg" and "groppcg" seem to be no advantage over "cg" when I solve Poisson equations with homogeneous Dirichlet BC in [0, 1]^2 (remain 20K~60K DOFs per process). I guess the reason would be too few computing nodes. > > 900 cores (assuming they are not memory bandwidth bound) might be enough to see some differences but the differences are likely so small compared to other parallel issues that affect performance that you see no consistently measurable difference. > > Run with -log_view three cases, no pipeline and the two pipelines and send the output. By studying where the time is spent in the different regions of the code with this output one may be able to say something about the pipeline affect. > > Barry > > > > > > Because I am calling PETSc via other numerical software, if need, I would mail related performance information to you by using command line options suggested by PETSc. Thank you. > > > > > > Thanks, > > Gang -------------- next part -------------- A non-text attachment was scrubbed... Name: cg.out Type: application/octet-stream Size: 14940 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: groppcg.out Type: application/octet-stream Size: 15189 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pipecg.out Type: application/octet-stream Size: 15063 bytes Desc: not available URL: From bsmith at petsc.dev Wed Feb 17 20:52:11 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 17 Feb 2021 20:52:11 -0600 Subject: [petsc-users] An issue about pipelined CG and Gropp's CG In-Reply-To: <68432bab.55d.177b2fa288a.Coremail.zhaog6@lsec.cc.ac.cn> References: <7bb9aad8.144.177b29b6517.Coremail.zhaog6@lsec.cc.ac.cn> <68432bab.55d.177b2fa288a.Coremail.zhaog6@lsec.cc.ac.cn> Message-ID: <9BFA8477-4B4C-440D-9CB0-2B22352EFD77@petsc.dev> First please see https://www.mcs.anl.gov/petsc/documentation/faq.html#pipelined and verify that the MPI you are using satisfies the requirements and you have appropriate MPI environmental variables set (if needed). Then please add a stage around the actual computation to get a more useful summary. Organize your code like so ... KSPSetUp() PetscLogStagePush(a stage you created) KSPSolve() PetscLogStagePop() ... It is unclear where much of the time of your code is being spent, by adding the stage we'll have a clear picture of the time in the actual solver. There are examples of using PetscLogStagePush() in the source. With the new -log_view files you generate with these two changes we can get a handle on where the time is being spent and why the pipelining is or is not helping. Barry > On Feb 17, 2021, at 8:31 PM, ?? wrote: > > Dear Barry, > > Thank you for your prompt reply. 
I run ~16M DOFs on 32 nodes (36 cores per node), but CG seems to be faster than pipelined CG and Gropp's CG, I'm puzzled and haven't figured out why. Put the performance output into attachment, please check it. > > > > Thanks, > Gang > > > > -----????----- > > ???: "Barry Smith" > > ????: 2021-02-18 09:17:17 (???) > > ???: "??" > > ??: PETSc > > ??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG > > > > > > > > > On Feb 17, 2021, at 6:47 PM, ?? wrote: > > > > > > Dear PETSc team, > > > > > > I am interested in pipelined CG (-ksp_type pipecg) and Gropp's CG (-ksp_type groppcg), it is expected that this iterative method with pipelined has advantages over traditional CG in the case of multiple processes. So I'd like to ask for Poisson problem, how many computing nodes do I need to show the advantages of pipelined CG or Gropp's CG over CG (No preconditioner is used)? > > > > > > Currently, I can only use up to 32 nodes (36 cores per nodes) at most on my cluster, but both "pipecg" and "groppcg" seem to be no advantage over "cg" when I solve Poisson equations with homogeneous Dirichlet BC in [0, 1]^2 (remain 20K~60K DOFs per process). I guess the reason would be too few computing nodes. > > > > 900 cores (assuming they are not memory bandwidth bound) might be enough to see some differences but the differences are likely so small compared to other parallel issues that affect performance that you see no consistently measurable difference. > > > > Run with -log_view three cases, no pipeline and the two pipelines and send the output. By studying where the time is spent in the different regions of the code with this output one may be able to say something about the pipeline affect. > > > > Barry > > > > > > > > > > Because I am calling PETSc via other numerical software, if need, I would mail related performance information to you by using command line options suggested by PETSc. Thank you. > > > > > > > > > Thanks, > > > Gang > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Feb 17 20:57:28 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 17 Feb 2021 20:57:28 -0600 Subject: [petsc-users] insufficient virtual memory? In-Reply-To: References: Message-ID: <263912AE-3062-43E3-BBF3-7B3E4703AB0C@petsc.dev> PETSc gets almost all its memory using the C malloc system calls so it is unlikely that this Fortran error message comes from PETSc code. My guess is that you have some Fortran arrays declared somewhere in your code that are large and require memory that is not available. Barry > On Feb 17, 2021, at 7:23 PM, Park, Heeho via petsc-users wrote: > > Hi PETSc developers, > > Have you seen this error message? > > forrtl: severe (41): insufficient virtual memory > > We are running about 36 million degrees of freedom ( ~ 2.56 GB) and it is failing with the error message on our HPC systems. > Ironically, it runs on our laptop (super slow.) > > type: seqbaij > rows=46251272, cols=46251272 > total: nonzeros=323046210, allocated nonzeros=323046210 > total number of mallocs used during MatSetValues calls=0 > block size is 1 > > Does anyone have experience encountering this problem? > > Thanks, > > Heeho Daniel Park > > ! ------------------------------------ ! > Sandia National Laboratories > Org: 08844, R&D > Work: 505-844-1319 > ! ------------------------------------ ! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From zhaog6 at lsec.cc.ac.cn Wed Feb 17 22:31:42 2021 From: zhaog6 at lsec.cc.ac.cn (=?UTF-8?B?6LW15Yia?=) Date: Thu, 18 Feb 2021 12:31:42 +0800 (GMT+08:00) Subject: [petsc-users] An issue about pipelined CG and Gropp's CG In-Reply-To: <9BFA8477-4B4C-440D-9CB0-2B22352EFD77@petsc.dev> References: <7bb9aad8.144.177b29b6517.Coremail.zhaog6@lsec.cc.ac.cn> <68432bab.55d.177b2fa288a.Coremail.zhaog6@lsec.cc.ac.cn> <9BFA8477-4B4C-440D-9CB0-2B22352EFD77@petsc.dev> Message-ID: <1b8b517f.a06.177b3687191.Coremail.zhaog6@lsec.cc.ac.cn> Dear Barry, Thank you. For MPI, MVAPICH-2.3.5 is used on my cluster by default, I add PetscLogStagePush("Calling KSPSolve()...") and PetscLogStagePop(). I am using other numerical software and have called PETSc only when solving linear system through PETSc interface supported by the software, but I'm not sure if I have added it correctly. I put the result and info into attachment, please check it. Thanks, Gang -----????----- ???:"Barry Smith" ????:2021-02-18 10:52:11 (???) ???: "??" ??: PETSc ??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG First please see https://www.mcs.anl.gov/petsc/documentation/faq.html#pipelined and verify that the MPI you are using satisfies the requirements and you have appropriate MPI environmental variables set (if needed). Then please add a stage around the actual computation to get a more useful summary. Organize your code like so ... KSPSetUp() PetscLogStagePush(a stage you created) KSPSolve() PetscLogStagePop() ... It is unclear where much of the time of your code is being spent, by adding the stage we'll have a clear picture of the time in the actual solver. There are examples of using PetscLogStagePush() in the source. With the new -log_view files you generate with these two changes we can get a handle on where the time is being spent and why the pipelining is or is not helping. Barry On Feb 17, 2021, at 8:31 PM, ?? wrote: Dear Barry, Thank you for your prompt reply. I run ~16M DOFs on 32 nodes (36 cores per node), but CG seems to be faster than pipelined CG and Gropp's CG, I'm puzzled and haven't figured out why. Put the performance output into attachment, please check it. Thanks, Gang > -----????----- > ???: "Barry Smith" > ????: 2021-02-18 09:17:17 (???) > ???: "??" > ??: PETSc > ??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG > > > > > On Feb 17, 2021, at 6:47 PM, ?? wrote: > > > > Dear PETSc team, > > > > I am interested in pipelined CG (-ksp_type pipecg) and Gropp's CG (-ksp_type groppcg), it is expected that this iterative method with pipelined has advantages over traditional CG in the case of multiple processes. So I'd like to ask for Poisson problem, how many computing nodes do I need to show the advantages of pipelined CG or Gropp's CG over CG (No preconditioner is used)? > > > > Currently, I can only use up to 32 nodes (36 cores per nodes) at most on my cluster, but both "pipecg" and "groppcg" seem to be no advantage over "cg" when I solve Poisson equations with homogeneous Dirichlet BC in [0, 1]^2 (remain 20K~60K DOFs per process). I guess the reason would be too few computing nodes. > > 900 cores (assuming they are not memory bandwidth bound) might be enough to see some differences but the differences are likely so small compared to other parallel issues that affect performance that you see no consistently measurable difference. > > Run with -log_view three cases, no pipeline and the two pipelines and send the output. 
By studying where the time is spent in the different regions of the code with this output one may be able to say something about the pipeline affect. > > Barry > > > > > > Because I am calling PETSc via other numerical software, if need, I would mail related performance information to you by using command line options suggested by PETSc. Thank you. > > > > > > Thanks, > > Gang -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cg.out Type: application/octet-stream Size: 14587 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: groppcg.out Type: application/octet-stream Size: 14835 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pipecg.out Type: application/octet-stream Size: 14712 bytes Desc: not available URL: From bsmith at petsc.dev Wed Feb 17 23:09:43 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 17 Feb 2021 23:09:43 -0600 Subject: [petsc-users] An issue about pipelined CG and Gropp's CG In-Reply-To: <1b8b517f.a06.177b3687191.Coremail.zhaog6@lsec.cc.ac.cn> References: <7bb9aad8.144.177b29b6517.Coremail.zhaog6@lsec.cc.ac.cn> <68432bab.55d.177b2fa288a.Coremail.zhaog6@lsec.cc.ac.cn> <9BFA8477-4B4C-440D-9CB0-2B22352EFD77@petsc.dev> <1b8b517f.a06.177b3687191.Coremail.zhaog6@lsec.cc.ac.cn> Message-ID: Here are the important operations from the -log_view (use a fixed sized font for easy reading). No pipeline ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ MatMult 5398 1.0 9.4707e+0012.6 1.05e+09 1.1 3.6e+07 6.9e+02 0.0e+00 3 52100100 0 10 52100100 0 124335 VecTDot 10796 1.0 1.4993e+01 8.3 3.23e+08 1.1 0.0e+00 0.0e+00 1.1e+04 16 16 0 0 67 55 16 0 0 67 24172 VecNorm 5399 1.0 6.2343e+00 4.4 1.61e+08 1.1 0.0e+00 0.0e+00 5.4e+03 10 8 0 0 33 33 8 0 0 33 29073 VecAXPY 10796 1.0 1.1721e-01 1.4 3.23e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 16 0 0 0 1 16 0 0 0 3092074 VecAYPX 5397 1.0 5.4340e-02 1.4 1.61e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 3334231 VecScatterBegin 5398 1.0 5.4152e-02 3.3 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 5398 1.0 8.6881e+00489.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 6 0 0 0 0 0 KSPSolve 1 1.0 1.7389e+01 1.0 2.02e+09 1.1 3.6e+07 6.9e+02 1.6e+04 29100100100100 100100100100100 130242 Gropp pipeline MatMult 5399 1.0 9.5593e+0011.7 1.05e+09 1.1 3.6e+07 6.9e+02 0.0e+00 3 45100100 0 7 45100100 0 123207 VecNorm 1 1.0 8.8549e-0417.4 2.99e+04 1.1 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 4 0 0 0 0 20 37912 VecAXPY 16194 1.0 1.6522e-01 1.4 4.84e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 21 0 0 0 0 21 0 0 0 3290407 VecAYPX 10794 1.0 1.9903e-01 1.5 3.23e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 14 0 0 0 1 14 0 0 0 1820606 VecScatterBegin 5399 1.0 6.2281e-02 3.6 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 5399 1.0 8.7194e+00380.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 4 0 0 0 0 0 VecReduceArith 16195 1.0 2.2674e-01 3.7 4.84e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 21 0 0 0 0 21 0 0 0 2397678 VecReduceBegin 10797 1.0 3.4089e-02 2.0 0.00e+00 0.0 
0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecReduceEnd 10797 1.0 2.6197e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 37 0 0 0 0 91 0 0 0 0 0 SFBcastOpBegin 5399 1.0 6.0051e-02 4.1 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00 0 0100100 0 0 0100100 0 0 SFBcastOpEnd 5399 1.0 8.7167e+00440.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 4 0 0 0 0 0 KSPSolve 1 1.0 2.7477e+01 1.0 2.34e+09 1.1 3.6e+07 6.9e+02 1.0e+00 41100100100 4 100100100100 20 95623 pipeline cg MatMult 5400 1.0 1.5915e+00 1.8 1.05e+09 1.1 3.6e+07 6.9e+02 0.0e+00 2 37100100 0 6 37100100 0 740161 VecAXPY 21592 1.0 2.3194e-01 1.4 6.45e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 23 0 0 0 1 23 0 0 0 3125164 VecAYPX 21588 1.0 5.5059e-01 1.7 6.45e+08 1.1 0.0e+00 0.0e+00 0.0e+00 1 23 0 0 0 2 23 0 0 0 1316272 VecScatterBegin 5400 1.0 7.0132e-02 3.7 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 5400 1.0 6.5329e-0122.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 VecReduceArith 16197 1.0 3.1135e-01 4.7 4.84e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 17 0 0 0 1 17 0 0 0 1746339 VecReduceBegin 5400 1.0 3.1471e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecReduceEnd 5400 1.0 1.7226e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 28 0 0 0 0 90 0 0 0 0 0 SFBcastOpBegin 5400 1.0 6.6228e-02 4.1 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00 0 0100100 0 0 0100100 0 0 SFBcastOpEnd 5400 1.0 6.5000e-0124.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 KSPSolve 1 1.0 1.8893e+01 1.0 2.82e+09 1.1 3.6e+07 6.9e+02 0.0e+00 32100100100 0 100100100100 0 167860 With pipelined methods the TDot and Vec norm are replaced with VecReduceArith, VecReduceBegin, and VecReduceEnd. The important numbers are the %T in the stage. In particular look at VecTDot and VecNorm and compare to VecReduceEnd in the pipeline methods. Note that both pipelined methods, especially the gropp method spend an enormous time in VecReduceEnd and hence end up taking more time than the non-pipelined method. So basically any advantage the pipeline methods may have is lost waiting for the previous reduction operation to arrive. I do not know why, if it is the MPI implementation or something else. If you are serious about understanding pipeline methods for Krylov methods you will need to dig deep into the details of the machine hardware and MPI software. It is not a trivial subject with easy answers. I would say that the PETSc community are not experts on the topic, you will need to read in detail the publications on pipelined methods and consult with the authors on technical, machine specific details. There is a difference between the academic "pipelining as a theoretical construct" and actually dramatic improvement on real machines while using pipelining. One small implementation detail can dramatically change performance so theoretical papers alone are not the complete story. Barry ------------------------------------------------------------------------------------------------------------------------ > On Feb 17, 2021, at 10:31 PM, ?? wrote: > > Dear Barry, > > > > Thank you. For MPI, MVAPICH-2.3.5 is used on my cluster by default, I add PetscLogStagePush("Calling KSPSolve()...") and PetscLogStagePop(). I am using other numerical software and have called PETSc only when solving linear system through PETSc interface supported by the software, but I'm not sure if I have added it correctly. I put the result and info into attachment, please check it. 
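For reference, the VecReduceBegin/VecReduceEnd events in the tables above are PETSc's split-phase reductions, which are the mechanism the pipelined solvers rely on. A minimal sketch of the pattern (assuming existing vectors x, y, z inside a PETSc routine; this is not code from the thread): PetscCommSplitReductionBegin() posts the nonblocking reduction so that independent work can overlap it, and the *End calls wait for the result. If the MPI library makes no asynchronous progress, essentially all of that waiting is charged to VecReduceEnd, which is what the logs above show.

  PetscScalar    dot1, dot2;
  PetscReal      nrm;
  PetscErrorCode ierr;

  ierr = VecDotBegin(x, y, &dot1);CHKERRQ(ierr);
  ierr = VecDotBegin(x, z, &dot2);CHKERRQ(ierr);
  ierr = VecNormBegin(x, NORM_2, &nrm);CHKERRQ(ierr);
  ierr = PetscCommSplitReductionBegin(PetscObjectComm((PetscObject)x));CHKERRQ(ierr); /* start the reduction */
  /* ... independent work, e.g. a MatMult, overlaps the reduction here ... */
  ierr = VecDotEnd(x, y, &dot1);CHKERRQ(ierr);
  ierr = VecDotEnd(x, z, &dot2);CHKERRQ(ierr);
  ierr = VecNormEnd(x, NORM_2, &nrm);CHKERRQ(ierr);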
> > > > > > Thanks, > > Gang > > > > -----????----- > ???:"Barry Smith" > ????:2021-02-18 10:52:11 (???) > ???: "??" > ??: PETSc > ??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG > > > First please see https://www.mcs.anl.gov/petsc/documentation/faq.html#pipelined and verify that the MPI you are using satisfies the requirements and you have appropriate MPI environmental variables set (if needed). > > > Then please add a stage around the actual computation to get a more useful summary. > > Organize your code like so > > ... > KSPSetUp() > PetscLogStagePush(a stage you created) > KSPSolve() > PetscLogStagePop() > ... > > It is unclear where much of the time of your code is being spent, by adding the stage we'll have a clear picture of the time in the actual solver. There are examples of using PetscLogStagePush() in the source. > > With the new -log_view files you generate with these two changes we can get a handle on where the time is being spent and why the pipelining is or is not helping. > > Barry > >> On Feb 17, 2021, at 8:31 PM, ?? > wrote: >> >> Dear Barry, >> >> Thank you for your prompt reply. I run ~16M DOFs on 32 nodes (36 cores per node), but CG seems to be faster than pipelined CG and Gropp's CG, I'm puzzled and haven't figured out why. Put the performance output into attachment, please check it. >> >> >> >> Thanks, >> Gang >> >> >> > -----????----- >> > ???: "Barry Smith" > >> > ????: 2021-02-18 09:17:17 (???) >> > ???: "??" > >> > ??: PETSc > >> > ??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG >> > >> > >> > >> > > On Feb 17, 2021, at 6:47 PM, ?? > wrote: >> > > >> > > Dear PETSc team, >> > > >> > > I am interested in pipelined CG (-ksp_type pipecg) and Gropp's CG (-ksp_type groppcg), it is expected that this iterative method with pipelined has advantages over traditional CG in the case of multiple processes. So I'd like to ask for Poisson problem, how many computing nodes do I need to show the advantages of pipelined CG or Gropp's CG over CG (No preconditioner is used)? >> > > >> > > Currently, I can only use up to 32 nodes (36 cores per nodes) at most on my cluster, but both "pipecg" and "groppcg" seem to be no advantage over "cg" when I solve Poisson equations with homogeneous Dirichlet BC in [0, 1]^2 (remain 20K~60K DOFs per process). I guess the reason would be too few computing nodes. >> > >> > 900 cores (assuming they are not memory bandwidth bound) might be enough to see some differences but the differences are likely so small compared to other parallel issues that affect performance that you see no consistently measurable difference. >> > >> > Run with -log_view three cases, no pipeline and the two pipelines and send the output. By studying where the time is spent in the different regions of the code with this output one may be able to say something about the pipeline affect. >> > >> > Barry >> > >> > >> > > >> > > Because I am calling PETSc via other numerical software, if need, I would mail related performance information to you by using command line options suggested by PETSc. Thank you. >> > > >> > > >> > > Thanks, >> > > Gang >> >>>> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From zhaog6 at lsec.cc.ac.cn Thu Feb 18 00:38:22 2021 From: zhaog6 at lsec.cc.ac.cn (=?UTF-8?B?6LW15Yia?=) Date: Thu, 18 Feb 2021 14:38:22 +0800 (GMT+08:00) Subject: [petsc-users] An issue about pipelined CG and Gropp's CG In-Reply-To: References: <7bb9aad8.144.177b29b6517.Coremail.zhaog6@lsec.cc.ac.cn> <68432bab.55d.177b2fa288a.Coremail.zhaog6@lsec.cc.ac.cn> <9BFA8477-4B4C-440D-9CB0-2B22352EFD77@petsc.dev> <1b8b517f.a06.177b3687191.Coremail.zhaog6@lsec.cc.ac.cn> Message-ID: <662101ed.d68.177b3dc6a83.Coremail.zhaog6@lsec.cc.ac.cn> Thank you a lot for your analysis and suggestions, I quite agree with your opinion for the difference of theoretical and actual. I'll try to change into MPICH-3.4 rather than MVAPICH-2.3.5 I've used before. Thanks, Gang -----????----- ???:"Barry Smith" ????:2021-02-18 13:09:43 (???) ???: "??" ??: PETSc ??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG Here are the important operations from the -log_view (use a fixed sized font for easy reading). No pipeline ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ MatMult 5398 1.0 9.4707e+0012.6 1.05e+09 1.1 3.6e+07 6.9e+02 0.0e+00 3 52100100 0 10 52100100 0 124335 VecTDot 10796 1.0 1.4993e+01 8.3 3.23e+08 1.1 0.0e+00 0.0e+00 1.1e+04 16 16 0 0 67 55 16 0 0 67 24172 VecNorm 5399 1.0 6.2343e+00 4.4 1.61e+08 1.1 0.0e+00 0.0e+00 5.4e+03 10 8 0 0 33 33 8 0 0 33 29073 VecAXPY 10796 1.0 1.1721e-01 1.4 3.23e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 16 0 0 0 1 16 0 0 0 3092074 VecAYPX 5397 1.0 5.4340e-02 1.4 1.61e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 3334231 VecScatterBegin 5398 1.0 5.4152e-02 3.3 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 5398 1.0 8.6881e+00489.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 6 0 0 0 0 0 KSPSolve 1 1.0 1.7389e+01 1.0 2.02e+09 1.1 3.6e+07 6.9e+02 1.6e+04 29100100100100 100100100100100 130242 Gropp pipeline MatMult 5399 1.0 9.5593e+0011.7 1.05e+09 1.1 3.6e+07 6.9e+02 0.0e+00 3 45100100 0 7 45100100 0 123207 VecNorm 1 1.0 8.8549e-0417.4 2.99e+04 1.1 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 4 0 0 0 0 20 37912 VecAXPY 16194 1.0 1.6522e-01 1.4 4.84e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 21 0 0 0 0 21 0 0 0 3290407 VecAYPX 10794 1.0 1.9903e-01 1.5 3.23e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 14 0 0 0 1 14 0 0 0 1820606 VecScatterBegin 5399 1.0 6.2281e-02 3.6 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 5399 1.0 8.7194e+00380.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 4 0 0 0 0 0 VecReduceArith 16195 1.0 2.2674e-01 3.7 4.84e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 21 0 0 0 0 21 0 0 0 2397678 VecReduceBegin 10797 1.0 3.4089e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecReduceEnd 10797 1.0 2.6197e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 37 0 0 0 0 91 0 0 0 0 0 SFBcastOpBegin 5399 1.0 6.0051e-02 4.1 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00 0 0100100 0 0 0100100 0 0 SFBcastOpEnd 5399 1.0 8.7167e+00440.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 4 0 0 0 0 0 KSPSolve 1 1.0 2.7477e+01 1.0 2.34e+09 1.1 3.6e+07 6.9e+02 1.0e+00 41100100100 4 100100100100 20 95623 pipeline cg MatMult 5400 1.0 1.5915e+00 1.8 1.05e+09 1.1 3.6e+07 6.9e+02 0.0e+00 2 37100100 0 6 37100100 0 740161 
VecAXPY 21592 1.0 2.3194e-01 1.4 6.45e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 23 0 0 0 1 23 0 0 0 3125164 VecAYPX 21588 1.0 5.5059e-01 1.7 6.45e+08 1.1 0.0e+00 0.0e+00 0.0e+00 1 23 0 0 0 2 23 0 0 0 1316272 VecScatterBegin 5400 1.0 7.0132e-02 3.7 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 5400 1.0 6.5329e-0122.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 VecReduceArith 16197 1.0 3.1135e-01 4.7 4.84e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 17 0 0 0 1 17 0 0 0 1746339 VecReduceBegin 5400 1.0 3.1471e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecReduceEnd 5400 1.0 1.7226e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 28 0 0 0 0 90 0 0 0 0 0 SFBcastOpBegin 5400 1.0 6.6228e-02 4.1 0.00e+00 0.0 3.6e+07 6.9e+02 0.0e+00 0 0100100 0 0 0100100 0 0 SFBcastOpEnd 5400 1.0 6.5000e-0124.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 KSPSolve 1 1.0 1.8893e+01 1.0 2.82e+09 1.1 3.6e+07 6.9e+02 0.0e+00 32100100100 0 100100100100 0 167860 With pipelined methods the TDot and Vec norm are replaced with VecReduceArith, VecReduceBegin, and VecReduceEnd. The important numbers are the %T in the stage. In particular look at VecTDot and VecNorm and compare to VecReduceEnd in the pipeline methods. Note that both pipelined methods, especially the gropp method spend an enormous time in VecReduceEnd and hence end up taking more time than the non-pipelined method. So basically any advantage the pipeline methods may have is lost waiting for the previous reduction operation to arrive. I do not know why, if it is the MPI implementation or something else. If you are serious about understanding pipeline methods for Krylov methods you will need to dig deep into the details of the machine hardware and MPI software. It is not a trivial subject with easy answers. I would say that the PETSc community are not experts on the topic, you will need to read in detail the publications on pipelined methods and consult with the authors on technical, machine specific details. There is a difference between the academic "pipelining as a theoretical construct" and actually dramatic improvement on real machines while using pipelining. One small implementation detail can dramatically change performance so theoretical papers alone are not the complete story. Barry ------------------------------------------------------------------------------------------------------------------------ On Feb 17, 2021, at 10:31 PM, ?? wrote: Dear Barry, Thank you. For MPI, MVAPICH-2.3.5 is used on my cluster by default, I add PetscLogStagePush("Calling KSPSolve()...") and PetscLogStagePop(). I am using other numerical software and have called PETSc only when solving linear system through PETSc interface supported by the software, but I'm not sure if I have added it correctly. I put the result and info into attachment, please check it. Thanks, Gang -----????----- ???:"Barry Smith" ????:2021-02-18 10:52:11 (???) ???: "??" ??: PETSc ??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG First please see https://www.mcs.anl.gov/petsc/documentation/faq.html#pipelined and verify that the MPI you are using satisfies the requirements and you have appropriate MPI environmental variables set (if needed). Then please add a stage around the actual computation to get a more useful summary. Organize your code like so ... KSPSetUp() PetscLogStagePush(a stage you created) KSPSolve() PetscLogStagePop() ... 
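Spelled out slightly more, as a minimal sketch assuming ksp, b and x already exist (note that PetscLogStagePush() takes a stage handle obtained from PetscLogStageRegister(), not a string):

  PetscLogStage  stage;
  PetscErrorCode ierr;

  ierr = KSPSetUp(ksp);CHKERRQ(ierr);
  ierr = PetscLogStageRegister("KSPSolve stage", &stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);

With this bracketing, the per-stage columns of -log_view isolate the solve itself from setup and from whatever the calling software does outside it.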
It is unclear where much of the time of your code is being spent, by adding the stage we'll have a clear picture of the time in the actual solver. There are examples of using PetscLogStagePush() in the source. With the new -log_view files you generate with these two changes we can get a handle on where the time is being spent and why the pipelining is or is not helping. Barry On Feb 17, 2021, at 8:31 PM, ?? wrote: Dear Barry, Thank you for your prompt reply. I run ~16M DOFs on 32 nodes (36 cores per node), but CG seems to be faster than pipelined CG and Gropp's CG, I'm puzzled and haven't figured out why. Put the performance output into attachment, please check it. Thanks, Gang > -----????----- > ???: "Barry Smith" > ????: 2021-02-18 09:17:17 (???) > ???: "??" > ??: PETSc > ??: Re: [petsc-users] An issue about pipelined CG and Gropp's CG > > > > > On Feb 17, 2021, at 6:47 PM, ?? wrote: > > > > Dear PETSc team, > > > > I am interested in pipelined CG (-ksp_type pipecg) and Gropp's CG (-ksp_type groppcg), it is expected that this iterative method with pipelined has advantages over traditional CG in the case of multiple processes. So I'd like to ask for Poisson problem, how many computing nodes do I need to show the advantages of pipelined CG or Gropp's CG over CG (No preconditioner is used)? > > > > Currently, I can only use up to 32 nodes (36 cores per nodes) at most on my cluster, but both "pipecg" and "groppcg" seem to be no advantage over "cg" when I solve Poisson equations with homogeneous Dirichlet BC in [0, 1]^2 (remain 20K~60K DOFs per process). I guess the reason would be too few computing nodes. > > 900 cores (assuming they are not memory bandwidth bound) might be enough to see some differences but the differences are likely so small compared to other parallel issues that affect performance that you see no consistently measurable difference. > > Run with -log_view three cases, no pipeline and the two pipelines and send the output. By studying where the time is spent in the different regions of the code with this output one may be able to say something about the pipeline affect. > > Barry > > > > > > Because I am calling PETSc via other numerical software, if need, I would mail related performance information to you by using command line options suggested by PETSc. Thank you. > > > > > > Thanks, > > Gang -------------- next part -------------- An HTML attachment was scrubbed... URL: From roland.richter at ntnu.no Thu Feb 18 02:09:35 2021 From: roland.richter at ntnu.no (Roland Richter) Date: Thu, 18 Feb 2021 09:09:35 +0100 Subject: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results In-Reply-To: References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no> <874kiad69u.fsf@jedbrown.org> <3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no> <87y2fmbpw2.fsf@jedbrown.org> <641b1bcbfd2741d58cb8d21960a720ca@ntnu.no> Message-ID: <50b0f197-f515-0f5b-8132-04ea5dbb6814@ntnu.no> Hei, that was the reason for increased run times. When removing #pragma omp parallel for, my loop took ~18 seconds. When changing it to #pragma omp parallel for num_threads(2) or #pragma omp parallel for num_threads(4) (on a i7-6700), the loop took ~16 s, but when increasing it to #pragma omp parallel for num_threads(8), the loop took 28 s. Regards, Roland Am 17.02.21 um 18:51 schrieb Matthew Knepley: > Jed, is it possible that this is an oversubscription penalty from bad > OpenMP settings? cuneiform> > > ? Thanks, > > ? ? 
?Matt > > On Wed, Feb 17, 2021 at 12:11 PM Roland Richter > > wrote: > > My PetscScalar is complex double (i.e. even higher penalty), but > my matrix has a size of 8kk elements, so that should not an issue. > Regards, > Roland > ------------------------------------------------------------------------ > *Von:* Jed Brown > > *Gesendet:* Mittwoch, 17. Februar 2021 17:49:49 > *An:* Roland Richter; PETSc > *Betreff:* Re: [petsc-users] Explicit linking to OpenMP results in > performance drop and wrong results > ? > Roland Richter > writes: > > > Hei, > > > > I replaced the linking line with > > > > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx? -march=native -fopenmp-simd > > -DMKL_LP64 -m64 > > CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o > > bin/armadillo_with_PETSc? > > -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib > > /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran? > > -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 > > -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl > > /opt/boost/lib/libboost_filesystem.so.1.72.0 > > /opt/boost/lib/libboost_mpi.so.1.72.0 > > /opt/boost/lib/libboost_program_options.so.1.72.0 > > /opt/boost/lib/libboost_serialization.so.1.72.0 > > /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so > > /opt/petsc_release/lib/libpetsc.so > > /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so > > / > > > > and now the results are correct. Nevertheless, when comparing > the loop > > in line 26-28 in file test_scaling.cpp > > > > /#pragma omp parallel for// > > //??? for(int i = 0; i < r_0 * r_1; ++i)// > > //??? ??? *(out_mat_ptr + i) = (*(in_mat_ptr + i) * > scaling_factor);/ > > > > the version without /#pragma omp parallel/ for is significantly > faster > > (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there > > still such a big difference? > > Sounds like you're using a profile to attribute time? Each `omp > parallel` region incurs a cost ranging from about a microsecond to > 10 or more microseconds depending on architecture, number of > threads, and OpenMP implementation. Your loop (for double > precision) operates at around 8 entries per clock cycle (depending > on architecture) if the operands are in cache so the loop size r_0 > * r_1 should be at least 10000 just to pay off the cost of `omp > parallel`. > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From e0425375 at gmail.com Thu Feb 18 06:00:58 2021 From: e0425375 at gmail.com (Florian Bruckner) Date: Thu, 18 Feb 2021 13:00:58 +0100 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: References: <7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev> <119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev> <6EF7889D-DC17-46FC-82A5-9409C41E231D@petsc.dev> <46C744D7-4376-46B3-B5C4-211A4C8C2291@dsic.upv.es> <80BCEEDC-4C1E-4512-AAF5-7B6E718C7D1D@dsic.upv.es> Message-ID: Dear Jose, thanks for your work. I just looked over the code, but I didn't have time to implement our solver, yet. If I understand the code correctly, it allows to set a precond-matrix which should approximate A-sigma*B. I will try to get our code running in the next few weeks. 
From user perspective it would maybe simplify things if approximations for A as well as B are given, since this would hide the internal ST transformations. best wishes Florian On Tue, Feb 16, 2021 at 8:54 PM Jose E. Roman wrote: > Florian: I have created a MR > https://gitlab.com/slepc/slepc/-/merge_requests/149 > Let me know if it fits your needs. > > Jose > > > > El 15 feb 2021, a las 18:44, Jose E. Roman > escribi?: > > > > > > > >> El 15 feb 2021, a las 14:53, Matthew Knepley > escribi?: > >> > >> On Mon, Feb 15, 2021 at 7:27 AM Jose E. Roman > wrote: > >> I will think about the viability of adding an interface function to > pass the preconditioner matrix. > >> > >> Regarding the question about the B-orthogonality of computed vectors, > in the symmetric solver the B-orthogonality is enforced during the > computation, so you have guarantee that the computed vectors satisfy it. > But if solved as non-symetric, the computed vectors may depart from > B-orthogonality, unless the tolerance is very small. > >> > >> Yes, the vectors I generate are not B-orthogonal. > >> > >> Jose, do you think there is a way to reformulate what I am doing to use > the symmetric solver, even if we only have the action of B? > > > > Yes, you can do the following: > > > > ierr = EPSSetOperators(eps,S,NULL);CHKERRQ(ierr); // S is your shell > matrix A^{-1}*B > > ierr = EPSSetProblemType(eps,EPS_HEP);CHKERRQ(ierr); // symmetric > problem though S is not symmetric > > ierr = EPSSetFromOptions(eps);CHKERRQ(ierr); > > ierr = EPSSetUp(eps);CHKERRQ(ierr); // note explicitly calling setup > here > > ierr = EPSGetBV(eps,&bv);CHKERRQ(ierr); > > ierr = BVSetMatrix(bv,B,PETSC_FALSE);CHKERRQ(ierr); // replace > solver's inner product > > ierr = EPSSolve(eps);CHKERRQ(ierr); > > > > I have tried this with test1.c and it works. The computed eigenvectors > should be B-orthogonal in this case. > > > > Jose > > > > > >> > >> Thanks, > >> > >> Matt > >> > >> Jose > >> > >> > >>> El 14 feb 2021, a las 21:41, Barry Smith escribi?: > >>> > >>> > >>> Florian, > >>> > >>> I'm sorry I don't know the answers; I can only speculate. There is a > STGetShift(). > >>> > >>> All I was saying is theoretically there could/should be such support > in SLEPc. > >>> > >>> Barry > >>> > >>> > >>>> On Feb 13, 2021, at 6:43 PM, Florian Bruckner > wrote: > >>>> > >>>> Dear Barry, > >>>> thank you for your clarification. What I wanted to say is that even > if I could reset the KSP operators directly I would require to know which > transformation ST applies in order to provide the preconditioning matrix > for the correct operator. > >>>> The more general solution would be that SLEPc provides the interface > to pass the preconditioning matrix for A0 and ST applies the same > transformations as for the operator. > >>>> > >>>> If you write "SLEPc could provide an interface", do you mean someone > should implement it, or should it already be possible and I am not using it > correctly? > >>>> I wrote a small standalone example based on ex9.py from slepc4py, > where i tried to use an operator. > >>>> > >>>> best wishes > >>>> Florian > >>>> > >>>> On Sat, Feb 13, 2021 at 7:15 PM Barry Smith wrote: > >>>> > >>>> > >>>>> On Feb 13, 2021, at 2:47 AM, Pierre Jolivet wrote: > >>>>> > >>>>> > >>>>> > >>>>>> On 13 Feb 2021, at 7:25 AM, Florian Bruckner > wrote: > >>>>>> > >>>>>> Dear Jose, Dear Barry, > >>>>>> thanks again for your reply. One final question about the B0 > orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but > they are i*B0 orthogonal? 
or is there an issue with Matt's approach? > >>>>>> For my problem I can show that eigenvalues fulfill an orthogonality > relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This > should be independent of the solving method, right? > >>>>>> > >>>>>> Regarding Barry's advice this is what I first tried: > >>>>>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD) > >>>>>> st = es.getST() > >>>>>> ksp = st.getKSP() > >>>>>> ksp.setOperators(self.A0, self.P0) > >>>>>> > >>>>>> But it seems that the provided P0 is not used. Furthermore the > interface is maybe a bit confusing if ST performs some transformation. In > this case P0 needs to approximate A0^{-1}*B0 and not A0, right? > >>>>> > >>>>> No, you need to approximate (A0-sigma B0)^-1. If you have a null > shift, which looks like it is the case, you end up with A0^-1. > >>>> > >>>> Just trying to provide more clarity with the terms. > >>>> > >>>> If ST transforms the operator in the KSP to (A0-sigma B0) and you are > providing the "sparse matrix from which the preconditioner is to be built" > then you need to provide something that approximates (A0-sigma B0). Since > the PC will use your matrix to construct a preconditioner that approximates > the inverse of (A0-sigma B0), you don't need to directly provide something > that approximates (A0-sigma B0)^-1 > >>>> > >>>> Yes, I would think SLEPc could provide an interface where it manages > "the matrix from which to construct the preconditioner" and transforms that > matrix just like the true matrix. To do it by hand you simply need to know > what A0 and B0 are and which sigma ST has selected and then you can > construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and > modB0 are your "sparser approximations". > >>>> > >>>> Barry > >>>> > >>>> > >>>>> > >>>>>> Nevertheless I think it would be the best solution if one could > provide P0 (approx A0) and SLEPc derives the preconditioner from this. > Would this be hard to implement? > >>>>> > >>>>> This is what Barry?s suggestion is implementing. Don?t know why it > doesn?t work with your Python operator though. > >>>>> > >>>>> Thanks, > >>>>> Pierre > >>>>> > >>>>>> best wishes > >>>>>> Florian > >>>>>> > >>>>>> > >>>>>> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith > wrote: > >>>>>> > >>>>>> > >>>>>>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner > wrote: > >>>>>>> > >>>>>>> Dear Jose, Dear Matt, > >>>>>>> > >>>>>>> I needed some time to think about your answers. > >>>>>>> If I understand correctly, the eigenmode solver internally uses > A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP > solver and a corresponding preconditioner. > >>>>>>> What I would need is an interface to provide not only the system > Matrix A0 (which is an operator), but also a preconditioning matrix (sparse > approximation of the operator). > >>>>>>> Unfortunately this interface is not available, right? > >>>>>> > >>>>>> If SLEPc does not provide this directly it is still intended to > be trivial to provide the "preconditioner matrix" (that is matrix from > which the preconditioner is built). Just get the KSP from the ST object and > use KSPSetOperators() to provide the "preconditioner matrix" . > >>>>>> > >>>>>> Barry > >>>>>> > >>>>>>> > >>>>>>> Matt directly creates A0^{-1}*B0 as a matshell operator. The > operator uses a KSP with a proper PC internally. SLEPc would directly get > A0^{-1}*B0 and solve a standard eigenvalue problem with this modified > operator. Did I understand this correctly? 
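That matches what Matt describes elsewhere in the thread. As a minimal C sketch of such a shell operator, with placeholder names only, assuming A0 is applied through a KSP (preconditioned with the sparse P0) and B0 is available as a Mat, with one work vector kept in the user context:

  typedef struct {
    KSP kspA0;   /* solver for A0, preconditioned with the sparse P0 */
    Mat B0;
    Vec work;
  } ShellCtx;

  static PetscErrorCode MatMult_AinvB(Mat S, Vec x, Vec y)
  {
    ShellCtx       *ctx;
    PetscErrorCode  ierr;

    PetscFunctionBeginUser;
    ierr = MatShellGetContext(S, &ctx);CHKERRQ(ierr);
    ierr = MatMult(ctx->B0, x, ctx->work);CHKERRQ(ierr);      /* work = B0 * x      */
    ierr = KSPSolve(ctx->kspA0, ctx->work, y);CHKERRQ(ierr);  /* y = A0^{-1} * work */
    PetscFunctionReturn(0);
  }

  /* later, with a ShellCtx ctx filled in and n/N the local/global sizes of A0: */
  Mat S;
  ierr = MatCreateShell(PETSC_COMM_WORLD, n, n, N, N, &ctx, &S);CHKERRQ(ierr);
  ierr = MatShellSetOperation(S, MATOP_MULT, (void (*)(void))MatMult_AinvB);CHKERRQ(ierr);

S can then be passed to EPSSetOperators() exactly as in the EPSSetOperators/BVSetMatrix snippet quoted earlier in the thread.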
> >>>>>>> > >>>>>>> I have two further points, which I did not mention yet: the matrix > B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am > using Firedrake to set up the PETSc system matrices A0, i*B0 (which is > real). Then I convert them into ScipyLinearOperators and use > scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. > Minv=A0^-1 is also solving within scipy using a preconditioned gmres. > Advantage of this setup is that the imaginary B0 can be handled efficiently > and also the post-processing of the eigenvectors (which requires complex > arithmetics) is simplified. > >>>>>>> > >>>>>>> Nevertheless I think that the mixing of PETSc and Scipy looks too > complicated and is not very flexible. > >>>>>>> If I would use Matt's approach, could I then simply switch between > multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to > the use of matshell? > >>>>>>> Is there a solution for the imaginary B0, or do I have to use the > non-hermitian methods? Is this a large performance drawback? > >>>>>>> > >>>>>>> thanks again, > >>>>>>> and best wishes > >>>>>>> Florian > >>>>>>> > >>>>>>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman > wrote: > >>>>>>> The problem can be written as A0*v=omega*B0*v and you want the > eigenvalues omega closest to zero. If the matrices were explicitly > available, you would do shift-and-invert with target=0, that is > >>>>>>> > >>>>>>> (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is > >>>>>>> > >>>>>>> A0^{-1}*B0*v=theta*v > >>>>>>> > >>>>>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. > >>>>>>> > >>>>>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of > EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? > EPS_SMALLEST_REAL will give slow convergence. > >>>>>>> > >>>>>>> Florian: I would not recommend setting the KSP matrices directly, > it may produce strange side-effects. We should have an interface function > to pass this matrix. Currently there is STPrecondSetMatForPC() but it has > two problems: (1) it is intended for STPRECOND, so cannot be used with > Krylov-Schur, and (2) it is not currently available in the python interface. > >>>>>>> > >>>>>>> The approach used by Matt is a workaround that does not use ST, so > you can handle linear solves with a KSP of your own. > >>>>>>> > >>>>>>> As an alternative, since your problem is symmetric, you could try > LOBPCG, assuming that the leftmost eigenvalues are those that you want > (e.g. if all eigenvalues are non-negative). In that case you could use > STPrecondSetMatForPC(), but the remaining issue is calling it from python. > >>>>>>> > >>>>>>> If you are using the git repo, I could add the relevant code. > >>>>>>> > >>>>>>> Jose > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> El 8 feb 2021, a las 14:22, Matthew Knepley > escribi?: > >>>>>>>> > >>>>>>>> On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner < > e0425375 at gmail.com> wrote: > >>>>>>>> Dear PETSc / SLEPc Users, > >>>>>>>> > >>>>>>>> my question is very similar to the one posted here: > >>>>>>>> > https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html > >>>>>>>> > >>>>>>>> The eigensystem I would like to solve looks like: > >>>>>>>> B0 v = 1/omega A0 v > >>>>>>>> B0 and A0 are both hermitian, A0 is positive definite, but only > given as a linear operator (matshell). I am looking for the largest > eigenvalues (=smallest omega). 
> >>>>>>>> > >>>>>>>> I also have a sparse approximation P0 of the A0 operator, which i > would like to use as precondtioner, using something like this: > >>>>>>>> > >>>>>>>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD) > >>>>>>>> st = es.getST() > >>>>>>>> ksp = st.getKSP() > >>>>>>>> ksp.setOperators(self.A0, self.P0) > >>>>>>>> > >>>>>>>> Unfortunately PETSc still complains that it cannot create a > preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but > A0.type == 'python'). > >>>>>>>> By the way, should P0 be an approximation of A0 or does it have > to include B0? > >>>>>>>> > >>>>>>>> Right now I am using the krylov-schur method. Are there any > alternatives if A0 is only given as an operator? > >>>>>>>> > >>>>>>>> Jose can correct me if I say something wrong. > >>>>>>>> > >>>>>>>> When I did this, I made a shell operator for the action of > A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 > preconditioning matrix, and > >>>>>>>> then handed that to EPS. You can see me do it here: > >>>>>>>> > >>>>>>>> > https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 > >>>>>>>> > >>>>>>>> I had a hard time getting the embedded solver to work the way I > wanted, but maybe that is the better way. > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> > >>>>>>>> Matt > >>>>>>>> > >>>>>>>> thanks for any advice > >>>>>>>> best wishes > >>>>>>>> Florian > >>>>>>>> > >>>>>>>> > >>>>>>>> -- > >>>>>>>> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > >>>>>>>> -- Norbert Wiener > >>>>>>>> > >>>>>>>> https://www.cse.buffalo.edu/~knepley/ > >>>>>>> > >>>>>> > >>>>> > >>>> > >>>> > >>> > >> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > >> -- Norbert Wiener > >> > >> https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Feb 18 06:10:17 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 18 Feb 2021 07:10:17 -0500 Subject: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results In-Reply-To: <50b0f197-f515-0f5b-8132-04ea5dbb6814@ntnu.no> References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no> <874kiad69u.fsf@jedbrown.org> <3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no> <87y2fmbpw2.fsf@jedbrown.org> <641b1bcbfd2741d58cb8d21960a720ca@ntnu.no> <50b0f197-f515-0f5b-8132-04ea5dbb6814@ntnu.no> Message-ID: On Thu, Feb 18, 2021 at 3:09 AM Roland Richter wrote: > Hei, > > that was the reason for increased run times. When removing #pragma omp > parallel for, my loop took ~18 seconds. When changing it to #pragma omp > parallel for num_threads(2) or #pragma omp parallel for num_threads(4) (on > a i7-6700), the loop took ~16 s, but when increasing it to #pragma omp > parallel for num_threads(8), the loop took 28 s. > > Editorial: This is a reason I think OpenMP is inappropriate as a tool for parallel computing (many people disagree). It makes resource management difficult for the user and impossible for a library. Thanks, Matt > Regards, > > Roland > Am 17.02.21 um 18:51 schrieb Matthew Knepley: > > Jed, is it possible that this is an oversubscription penalty from bad > OpenMP settings? 
cuneiform> > > Thanks, > > Matt > > On Wed, Feb 17, 2021 at 12:11 PM Roland Richter > wrote: > >> My PetscScalar is complex double (i.e. even higher penalty), but my >> matrix has a size of 8kk elements, so that should not an issue. >> Regards, >> Roland >> ------------------------------ >> *Von:* Jed Brown >> *Gesendet:* Mittwoch, 17. Februar 2021 17:49:49 >> *An:* Roland Richter; PETSc >> *Betreff:* Re: [petsc-users] Explicit linking to OpenMP results in >> performance drop and wrong results >> >> Roland Richter writes: >> >> > Hei, >> > >> > I replaced the linking line with >> > >> > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -march=native -fopenmp-simd >> > -DMKL_LP64 -m64 >> > CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o >> > bin/armadillo_with_PETSc >> > -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib >> > /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran >> > -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 >> > -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl >> > /opt/boost/lib/libboost_filesystem.so.1.72.0 >> > /opt/boost/lib/libboost_mpi.so.1.72.0 >> > /opt/boost/lib/libboost_program_options.so.1.72.0 >> > /opt/boost/lib/libboost_serialization.so.1.72.0 >> > /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so >> > /opt/petsc_release/lib/libpetsc.so >> > /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so >> > / >> > >> > and now the results are correct. Nevertheless, when comparing the loop >> > in line 26-28 in file test_scaling.cpp >> > >> > /#pragma omp parallel for// >> > // for(int i = 0; i < r_0 * r_1; ++i)// >> > // *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/ >> > >> > the version without /#pragma omp parallel/ for is significantly faster >> > (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there >> > still such a big difference? >> >> Sounds like you're using a profile to attribute time? Each `omp parallel` >> region incurs a cost ranging from about a microsecond to 10 or more >> microseconds depending on architecture, number of threads, and OpenMP >> implementation. Your loop (for double precision) operates at around 8 >> entries per clock cycle (depending on architecture) if the operands are in >> cache so the loop size r_0 * r_1 should be at least 10000 just to pay off >> the cost of `omp parallel`. >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Thu Feb 18 06:25:32 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 18 Feb 2021 13:25:32 +0100 Subject: [petsc-users] using preconditioner with SLEPc In-Reply-To: References: <7C5B30FE-C539-4A14-B442-B1C91618E4AC@petsc.dev> <119944FD-4F1E-4B2F-A39D-65ADDB12BB5F@petsc.dev> <6EF7889D-DC17-46FC-82A5-9409C41E231D@petsc.dev> <46C744D7-4376-46B3-B5C4-211A4C8C2291@dsic.upv.es> <80BCEEDC-4C1E-4512-AAF5-7B6E718C7D1D@dsic.upv.es> Message-ID: Yes, it is an approximation of A-sigma*B, doing a different thing would be too disruptive. I will merge into master what is currently in the merge request. 
Jose > El 18 feb 2021, a las 13:00, Florian Bruckner escribi?: > > Dear Jose, > thanks for your work. I just looked over the code, but I didn't have time to implement our solver, yet. > If I understand the code correctly, it allows to set a precond-matrix which should approximate A-sigma*B. > > I will try to get our code running in the next few weeks. From user perspective it would maybe simplify things if approximations for A as well as B are given, since this would hide the internal ST transformations. > > best wishes > Florian > > On Tue, Feb 16, 2021 at 8:54 PM Jose E. Roman wrote: > Florian: I have created a MR https://gitlab.com/slepc/slepc/-/merge_requests/149 > Let me know if it fits your needs. > > Jose > > > > El 15 feb 2021, a las 18:44, Jose E. Roman escribi?: > > > > > > > >> El 15 feb 2021, a las 14:53, Matthew Knepley escribi?: > >> > >> On Mon, Feb 15, 2021 at 7:27 AM Jose E. Roman wrote: > >> I will think about the viability of adding an interface function to pass the preconditioner matrix. > >> > >> Regarding the question about the B-orthogonality of computed vectors, in the symmetric solver the B-orthogonality is enforced during the computation, so you have guarantee that the computed vectors satisfy it. But if solved as non-symetric, the computed vectors may depart from B-orthogonality, unless the tolerance is very small. > >> > >> Yes, the vectors I generate are not B-orthogonal. > >> > >> Jose, do you think there is a way to reformulate what I am doing to use the symmetric solver, even if we only have the action of B? > > > > Yes, you can do the following: > > > > ierr = EPSSetOperators(eps,S,NULL);CHKERRQ(ierr); // S is your shell matrix A^{-1}*B > > ierr = EPSSetProblemType(eps,EPS_HEP);CHKERRQ(ierr); // symmetric problem though S is not symmetric > > ierr = EPSSetFromOptions(eps);CHKERRQ(ierr); > > ierr = EPSSetUp(eps);CHKERRQ(ierr); // note explicitly calling setup here > > ierr = EPSGetBV(eps,&bv);CHKERRQ(ierr); > > ierr = BVSetMatrix(bv,B,PETSC_FALSE);CHKERRQ(ierr); // replace solver's inner product > > ierr = EPSSolve(eps);CHKERRQ(ierr); > > > > I have tried this with test1.c and it works. The computed eigenvectors should be B-orthogonal in this case. > > > > Jose > > > > > >> > >> Thanks, > >> > >> Matt > >> > >> Jose > >> > >> > >>> El 14 feb 2021, a las 21:41, Barry Smith escribi?: > >>> > >>> > >>> Florian, > >>> > >>> I'm sorry I don't know the answers; I can only speculate. There is a STGetShift(). > >>> > >>> All I was saying is theoretically there could/should be such support in SLEPc. > >>> > >>> Barry > >>> > >>> > >>>> On Feb 13, 2021, at 6:43 PM, Florian Bruckner wrote: > >>>> > >>>> Dear Barry, > >>>> thank you for your clarification. What I wanted to say is that even if I could reset the KSP operators directly I would require to know which transformation ST applies in order to provide the preconditioning matrix for the correct operator. > >>>> The more general solution would be that SLEPc provides the interface to pass the preconditioning matrix for A0 and ST applies the same transformations as for the operator. > >>>> > >>>> If you write "SLEPc could provide an interface", do you mean someone should implement it, or should it already be possible and I am not using it correctly? > >>>> I wrote a small standalone example based on ex9.py from slepc4py, where i tried to use an operator. 
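Once EPSSolve has returned, the B-orthogonality Jose describes can be checked directly by forming x_i^H B0 x_j over the converged pairs. A small sketch, with eps and B0 as in the snippets above and complex scalars assumed; the variable names are placeholders:

  PetscInt    i,j,nconv;
  PetscScalar dot;
  Vec         xi,xj,w;

  ierr = MatCreateVecs(B0,&xi,NULL);CHKERRQ(ierr);
  ierr = VecDuplicate(xi,&xj);CHKERRQ(ierr);
  ierr = VecDuplicate(xi,&w);CHKERRQ(ierr);
  ierr = EPSGetConverged(eps,&nconv);CHKERRQ(ierr);
  for (i=0; i<nconv; i++) {
    ierr = EPSGetEigenvector(eps,i,xi,NULL);CHKERRQ(ierr);
    for (j=0; j<=i; j++) {
      ierr = EPSGetEigenvector(eps,j,xj,NULL);CHKERRQ(ierr);
      ierr = MatMult(B0,xj,w);CHKERRQ(ierr);
      ierr = VecDot(w,xi,&dot);CHKERRQ(ierr);   /* dot = xi^H B0 xj */
      ierr = PetscPrintf(PETSC_COMM_WORLD,"(%D,%D) %g + %g i\n",i,j,
                         (double)PetscRealPart(dot),(double)PetscImaginaryPart(dot));CHKERRQ(ierr);
    }
  }
  ierr = VecDestroy(&xi);CHKERRQ(ierr);
  ierr = VecDestroy(&xj);CHKERRQ(ierr);
  ierr = VecDestroy(&w);CHKERRQ(ierr);

With the EPS_HEP/BVSetMatrix combination the off-diagonal entries should be at the level of the solver tolerance; with a plain non-Hermitian solve they may not be.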
> >>>> > >>>> best wishes > >>>> Florian > >>>> > >>>> On Sat, Feb 13, 2021 at 7:15 PM Barry Smith wrote: > >>>> > >>>> > >>>>> On Feb 13, 2021, at 2:47 AM, Pierre Jolivet wrote: > >>>>> > >>>>> > >>>>> > >>>>>> On 13 Feb 2021, at 7:25 AM, Florian Bruckner wrote: > >>>>>> > >>>>>> Dear Jose, Dear Barry, > >>>>>> thanks again for your reply. One final question about the B0 orthogonality. Do you mean that eigenvectors are not B0 orthogonal, but they are i*B0 orthogonal? or is there an issue with Matt's approach? > >>>>>> For my problem I can show that eigenvalues fulfill an orthogonality relation (phi_i, A0 phi_j ) = omega_i (phi_i, B0 phi_j) = delta_ij. This should be independent of the solving method, right? > >>>>>> > >>>>>> Regarding Barry's advice this is what I first tried: > >>>>>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD) > >>>>>> st = es.getST() > >>>>>> ksp = st.getKSP() > >>>>>> ksp.setOperators(self.A0, self.P0) > >>>>>> > >>>>>> But it seems that the provided P0 is not used. Furthermore the interface is maybe a bit confusing if ST performs some transformation. In this case P0 needs to approximate A0^{-1}*B0 and not A0, right? > >>>>> > >>>>> No, you need to approximate (A0-sigma B0)^-1. If you have a null shift, which looks like it is the case, you end up with A0^-1. > >>>> > >>>> Just trying to provide more clarity with the terms. > >>>> > >>>> If ST transforms the operator in the KSP to (A0-sigma B0) and you are providing the "sparse matrix from which the preconditioner is to be built" then you need to provide something that approximates (A0-sigma B0). Since the PC will use your matrix to construct a preconditioner that approximates the inverse of (A0-sigma B0), you don't need to directly provide something that approximates (A0-sigma B0)^-1 > >>>> > >>>> Yes, I would think SLEPc could provide an interface where it manages "the matrix from which to construct the preconditioner" and transforms that matrix just like the true matrix. To do it by hand you simply need to know what A0 and B0 are and which sigma ST has selected and then you can construct your modA0 - sigma modB0 and pass it to the KSP. Where modA0 and modB0 are your "sparser approximations". > >>>> > >>>> Barry > >>>> > >>>> > >>>>> > >>>>>> Nevertheless I think it would be the best solution if one could provide P0 (approx A0) and SLEPc derives the preconditioner from this. Would this be hard to implement? > >>>>> > >>>>> This is what Barry?s suggestion is implementing. Don?t know why it doesn?t work with your Python operator though. > >>>>> > >>>>> Thanks, > >>>>> Pierre > >>>>> > >>>>>> best wishes > >>>>>> Florian > >>>>>> > >>>>>> > >>>>>> On Sat, Feb 13, 2021 at 4:19 AM Barry Smith wrote: > >>>>>> > >>>>>> > >>>>>>> On Feb 12, 2021, at 2:32 AM, Florian Bruckner wrote: > >>>>>>> > >>>>>>> Dear Jose, Dear Matt, > >>>>>>> > >>>>>>> I needed some time to think about your answers. > >>>>>>> If I understand correctly, the eigenmode solver internally uses A0^{-1}*B0, which is normally handled by the ST object, which creates a KSP solver and a corresponding preconditioner. > >>>>>>> What I would need is an interface to provide not only the system Matrix A0 (which is an operator), but also a preconditioning matrix (sparse approximation of the operator). > >>>>>>> Unfortunately this interface is not available, right? > >>>>>> > >>>>>> If SLEPc does not provide this directly it is still intended to be trivial to provide the "preconditioner matrix" (that is matrix from which the preconditioner is built). 
Just get the KSP from the ST object and use KSPSetOperators() to provide the "preconditioner matrix" . > >>>>>> > >>>>>> Barry > >>>>>> > >>>>>>> > >>>>>>> Matt directly creates A0^{-1}*B0 as a matshell operator. The operator uses a KSP with a proper PC internally. SLEPc would directly get A0^{-1}*B0 and solve a standard eigenvalue problem with this modified operator. Did I understand this correctly? > >>>>>>> > >>>>>>> I have two further points, which I did not mention yet: the matrix B0 is Hermitian, but it is (purely) imaginary (B0.real=0). Right now, I am using Firedrake to set up the PETSc system matrices A0, i*B0 (which is real). Then I convert them into ScipyLinearOperators and use scipy.sparse.eigsh(B0, b=A0, Minv=Minv) to calculate the eigenvalues. Minv=A0^-1 is also solving within scipy using a preconditioned gmres. Advantage of this setup is that the imaginary B0 can be handled efficiently and also the post-processing of the eigenvectors (which requires complex arithmetics) is simplified. > >>>>>>> > >>>>>>> Nevertheless I think that the mixing of PETSc and Scipy looks too complicated and is not very flexible. > >>>>>>> If I would use Matt's approach, could I then simply switch between multiple standard eigenvalue methods (e.g. LOBPCG)? or is it limited due to the use of matshell? > >>>>>>> Is there a solution for the imaginary B0, or do I have to use the non-hermitian methods? Is this a large performance drawback? > >>>>>>> > >>>>>>> thanks again, > >>>>>>> and best wishes > >>>>>>> Florian > >>>>>>> > >>>>>>> On Mon, Feb 8, 2021 at 3:37 PM Jose E. Roman wrote: > >>>>>>> The problem can be written as A0*v=omega*B0*v and you want the eigenvalues omega closest to zero. If the matrices were explicitly available, you would do shift-and-invert with target=0, that is > >>>>>>> > >>>>>>> (A0-sigma*B0)^{-1}*B0*v=theta*v for sigma=0, that is > >>>>>>> > >>>>>>> A0^{-1}*B0*v=theta*v > >>>>>>> > >>>>>>> and you compute EPS_LARGEST_MAGNITUDE eigenvalues theta=1/omega. > >>>>>>> > >>>>>>> Matt: I guess you should have EPS_LARGEST_MAGNITUDE instead of EPS_SMALLEST_REAL in your code. Are you getting the eigenvalues you need? EPS_SMALLEST_REAL will give slow convergence. > >>>>>>> > >>>>>>> Florian: I would not recommend setting the KSP matrices directly, it may produce strange side-effects. We should have an interface function to pass this matrix. Currently there is STPrecondSetMatForPC() but it has two problems: (1) it is intended for STPRECOND, so cannot be used with Krylov-Schur, and (2) it is not currently available in the python interface. > >>>>>>> > >>>>>>> The approach used by Matt is a workaround that does not use ST, so you can handle linear solves with a KSP of your own. > >>>>>>> > >>>>>>> As an alternative, since your problem is symmetric, you could try LOBPCG, assuming that the leftmost eigenvalues are those that you want (e.g. if all eigenvalues are non-negative). In that case you could use STPrecondSetMatForPC(), but the remaining issue is calling it from python. > >>>>>>> > >>>>>>> If you are using the git repo, I could add the relevant code. 
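Sketched in C (where the python binding issue does not arise), the LOBPCG alternative mentioned here would look roughly as follows. A0, B0 and the sparse P0 are the same objects as before, whether the leftmost eigenvalues are really the wanted ones has to be checked for the problem at hand, and the definiteness assumptions behind EPS_GHEP should be verified as well:

  EPS eps;
  ST  st;

  ierr = EPSCreate(PETSC_COMM_WORLD,&eps);CHKERRQ(ierr);
  ierr = EPSSetOperators(eps,A0,B0);CHKERRQ(ierr);          /* A0 x = omega B0 x */
  ierr = EPSSetProblemType(eps,EPS_GHEP);CHKERRQ(ierr);     /* check definiteness for LOBPCG */
  ierr = EPSSetType(eps,EPSLOBPCG);CHKERRQ(ierr);
  ierr = EPSSetWhichEigenpairs(eps,EPS_SMALLEST_REAL);CHKERRQ(ierr);
  ierr = EPSGetST(eps,&st);CHKERRQ(ierr);
  ierr = STSetType(st,STPRECOND);CHKERRQ(ierr);
  ierr = STPrecondSetMatForPC(st,P0);CHKERRQ(ierr);         /* PC built from the sparse P0 */
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
  ierr = EPSSolve(eps);CHKERRQ(ierr);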
> >>>>>>> > >>>>>>> Jose > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> El 8 feb 2021, a las 14:22, Matthew Knepley escribi?: > >>>>>>>> > >>>>>>>> On Mon, Feb 8, 2021 at 7:04 AM Florian Bruckner wrote: > >>>>>>>> Dear PETSc / SLEPc Users, > >>>>>>>> > >>>>>>>> my question is very similar to the one posted here: > >>>>>>>> https://lists.mcs.anl.gov/pipermail/petsc-users/2018-August/035878.html > >>>>>>>> > >>>>>>>> The eigensystem I would like to solve looks like: > >>>>>>>> B0 v = 1/omega A0 v > >>>>>>>> B0 and A0 are both hermitian, A0 is positive definite, but only given as a linear operator (matshell). I am looking for the largest eigenvalues (=smallest omega). > >>>>>>>> > >>>>>>>> I also have a sparse approximation P0 of the A0 operator, which i would like to use as precondtioner, using something like this: > >>>>>>>> > >>>>>>>> es = SLEPc.EPS().create(comm=fd.COMM_WORLD) > >>>>>>>> st = es.getST() > >>>>>>>> ksp = st.getKSP() > >>>>>>>> ksp.setOperators(self.A0, self.P0) > >>>>>>>> > >>>>>>>> Unfortunately PETSc still complains that it cannot create a preconditioner for a type 'python' matrix although P0.type == 'seqaij' (but A0.type == 'python'). > >>>>>>>> By the way, should P0 be an approximation of A0 or does it have to include B0? > >>>>>>>> > >>>>>>>> Right now I am using the krylov-schur method. Are there any alternatives if A0 is only given as an operator? > >>>>>>>> > >>>>>>>> Jose can correct me if I say something wrong. > >>>>>>>> > >>>>>>>> When I did this, I made a shell operator for the action of A0^{-1} B0 which has a KSPSolve() in it, so you can use your P0 preconditioning matrix, and > >>>>>>>> then handed that to EPS. You can see me do it here: > >>>>>>>> > >>>>>>>> https://gitlab.com/knepley/bamg/-/blob/master/src/coarse/bamgCoarseSpace.c#L123 > >>>>>>>> > >>>>>>>> I had a hard time getting the embedded solver to work the way I wanted, but maybe that is the better way. > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> > >>>>>>>> Matt > >>>>>>>> > >>>>>>>> thanks for any advice > >>>>>>>> best wishes > >>>>>>>> Florian > >>>>>>>> > >>>>>>>> > >>>>>>>> -- > >>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > >>>>>>>> -- Norbert Wiener > >>>>>>>> > >>>>>>>> https://www.cse.buffalo.edu/~knepley/ > >>>>>>> > >>>>>> > >>>>> > >>>> > >>>> > >>> > >> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > >> -- Norbert Wiener > >> > >> https://www.cse.buffalo.edu/~knepley/ > From jed at jedbrown.org Thu Feb 18 10:01:18 2021 From: jed at jedbrown.org (Jed Brown) Date: Thu, 18 Feb 2021 09:01:18 -0700 Subject: [petsc-users] RSE and Postdoc openings at CU Boulder Message-ID: <87pn0x9xgx.fsf@jedbrown.org> My research group has openings for a Research Software Engineer and a Postdoc. Details and application links below; feel free to email me with questions. 
## Research Software Engineer CU Boulder?s PSAAP Multidisciplinary Simulation Center for Micromorphic Multiphysics Porous and Particulate Materials Simulations Within Exascale Computing Workflows has an opening for a *Research Software Engineer* to co-lead development of robust, extensible open source software for extreme-scale simulation of large-deformation composite poro-elasto-visco-plastic media across a broad range of regimes with experimental validation and coordination with micromorphic multiscale models. Successful applicants will have strong written and verbal communication skills, and interest in working with an interdisciplinary team to apply the following to real-world problems: * collaborative software development and devops (Git, continuous integration, etc.); * maintainable, high-performance programming techniques for CPUs and GPUs; * finite element and material-point discretizations; * computational mechanics/inelasticity; * parallel algebraic solvers such as PETSc; and * scalable data-intensive computing. This position can start immediately and is remote-friendly, especially during the pandemic. https://jobs.colorado.edu/jobs/JobDetail/Research-Associate/28703 ## Postdoc We also have an immediate opening for a *Postdoc* to conduct research in collaboration with the DOE Exascale Computing Project?s co-design Center for Efficient Exascale Discretization (CEED) on the development of robust and efficient methods for high order/compatible PDE discretization and multilevel solvers, including deployment in open source libraries. The project is especially interested in strategies to provide performance portability on emerging architectures and novel parallelization techniques to improve time to solution in the strong scaling limit. The methods will be applied in a variety of applications areas including sustainable energy and geophysics. Successful applicants will have strong written and verbal communication skills to collaborate with a distributed inter-disciplinary team and disseminate results via publications and presentations, as well as an interest in research and development of high-quality community software infrastructure in areas including, but not limited to: * element-based PDE discretization; * high-performance computing on emerging architectures, including CPU and GPUs; * scalable algebraic solvers; * applications in fluid and solid mechanics; and * data-intensive PDE workflows. This position can start immediately and is remote-friendly, especially during the pandemic. https://jobs.colorado.edu/jobs/JobDetail/PostDoctoral-Associate/28691 The University of Colorado Boulder is committed to building a culturally diverse community of faculty, staff, and students dedicated to contributing to an inclusive campus environment. We are an Equal Opportunity employer. We offer a competitive salary and a comprehensive benefits package. From heepark at sandia.gov Thu Feb 18 12:35:20 2021 From: heepark at sandia.gov (Park, Heeho) Date: Thu, 18 Feb 2021 18:35:20 +0000 Subject: [petsc-users] [EXTERNAL] Re: insufficient virtual memory? In-Reply-To: <263912AE-3062-43E3-BBF3-7B3E4703AB0C@petsc.dev> References: <263912AE-3062-43E3-BBF3-7B3E4703AB0C@petsc.dev> Message-ID: Thank you Barry. I will look further into it. - Heeho Daniel Park From: Barry Smith Date: Wednesday, February 17, 2021 at 6:57 PM To: "Park, Heeho" Cc: "petsc-users at mcs.anl.gov" Subject: [EXTERNAL] Re: [petsc-users] insufficient virtual memory? 
PETSc gets almost all its memory using the C malloc system calls so it is unlikely that this Fortran error message comes from PETSc code. My guess is that you have some Fortran arrays declared somewhere in your code that are large and require memory that is not available. Barry On Feb 17, 2021, at 7:23 PM, Park, Heeho via petsc-users > wrote: Hi PETSc developers, Have you seen this error message? forrtl: severe (41): insufficient virtual memory We are running about 36 million degrees of freedom ( ~ 2.56 GB) and it is failing with the error message on our HPC systems. Ironically, it runs on our laptop (super slow.) type: seqbaij rows=46251272, cols=46251272 total: nonzeros=323046210, allocated nonzeros=323046210 total number of mallocs used during MatSetValues calls=0 block size is 1 Does anyone have experience encountering this problem? Thanks, Heeho Daniel Park ! ------------------------------------ ! Sandia National Laboratories Org: 08844, R&D Work: 505-844-1319 ! ------------------------------------ ! -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Feb 18 18:15:31 2021 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 18 Feb 2021 18:15:31 -0600 Subject: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results In-Reply-To: References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no> <874kiad69u.fsf@jedbrown.org> <3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no> <87y2fmbpw2.fsf@jedbrown.org> <641b1bcbfd2741d58cb8d21960a720ca@ntnu.no> <50b0f197-f515-0f5b-8132-04ea5dbb6814@ntnu.no> Message-ID: > On Feb 18, 2021, at 6:10 AM, Matthew Knepley wrote: > > On Thu, Feb 18, 2021 at 3:09 AM Roland Richter > wrote: > Hei, > > that was the reason for increased run times. When removing #pragma omp parallel for, my loop took ~18 seconds. When changing it to #pragma omp parallel for num_threads(2) or #pragma omp parallel for num_threads(4) (on a i7-6700), the loop took ~16 s, but when increasing it to #pragma omp parallel for num_threads(8), the loop took 28 s. > > > Editorial: This is a reason I think OpenMP is inappropriate as a tool for parallel computing (many people disagree). It makes resource management > difficult for the user and impossible for a library. It is possible to control these things properly with modern OpenMP APIs but, like MPI implementations, this can require some mucking around a beginner would not know about and the default settings can be terrible. MPI implementations are not better, their default bindings are generally horrible. Barry > > Thanks, > > Matt > Regards, > > Roland > > Am 17.02.21 um 18:51 schrieb Matthew Knepley: >> Jed, is it possible that this is an oversubscription penalty from bad OpenMP settings? >> >> Thanks, >> >> Matt >> >> On Wed, Feb 17, 2021 at 12:11 PM Roland Richter > wrote: >> My PetscScalar is complex double (i.e. even higher penalty), but my matrix has a size of 8kk elements, so that should not an issue. >> Regards, >> Roland >> Von: Jed Brown > >> Gesendet: Mittwoch, 17. 
Februar 2021 17:49:49 >> An: Roland Richter; PETSc >> Betreff: Re: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results >> >> Roland Richter > writes: >> >> > Hei, >> > >> > I replaced the linking line with >> > >> > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -march=native -fopenmp-simd >> > -DMKL_LP64 -m64 >> > CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o >> > bin/armadillo_with_PETSc >> > -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib >> > /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran >> > -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 >> > -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl >> > /opt/boost/lib/libboost_filesystem.so.1.72.0 >> > /opt/boost/lib/libboost_mpi.so.1.72.0 >> > /opt/boost/lib/libboost_program_options.so.1.72.0 >> > /opt/boost/lib/libboost_serialization.so.1.72.0 >> > /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so >> > /opt/petsc_release/lib/libpetsc.so >> > /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so >> > / >> > >> > and now the results are correct. Nevertheless, when comparing the loop >> > in line 26-28 in file test_scaling.cpp >> > >> > /#pragma omp parallel for// >> > // for(int i = 0; i < r_0 * r_1; ++i)// >> > // *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/ >> > >> > the version without /#pragma omp parallel/ for is significantly faster >> > (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there >> > still such a big difference? >> >> Sounds like you're using a profile to attribute time? Each `omp parallel` region incurs a cost ranging from about a microsecond to 10 or more microseconds depending on architecture, number of threads, and OpenMP implementation. Your loop (for double precision) operates at around 8 entries per clock cycle (depending on architecture) if the operands are in cache so the loop size r_0 * r_1 should be at least 10000 just to pay off the cost of `omp parallel`. >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Feb 18 20:03:21 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 18 Feb 2021 21:03:21 -0500 Subject: [petsc-users] Explicit linking to OpenMP results in performance drop and wrong results In-Reply-To: References: <2f6eaf68-aa54-b766-d4e5-3053225cdb6a@ntnu.no> <874kiad69u.fsf@jedbrown.org> <3fed0724-87b2-26bf-6c79-94c484c23937@ntnu.no> <87y2fmbpw2.fsf@jedbrown.org> <641b1bcbfd2741d58cb8d21960a720ca@ntnu.no> <50b0f197-f515-0f5b-8132-04ea5dbb6814@ntnu.no> Message-ID: On Thu, Feb 18, 2021 at 7:15 PM Barry Smith wrote: > > > On Feb 18, 2021, at 6:10 AM, Matthew Knepley wrote: > > On Thu, Feb 18, 2021 at 3:09 AM Roland Richter > wrote: > >> Hei, >> >> that was the reason for increased run times. When removing #pragma omp >> parallel for, my loop took ~18 seconds. 
When changing it to #pragma omp >> parallel for num_threads(2) or #pragma omp parallel for num_threads(4) (on >> a i7-6700), the loop took ~16 s, but when increasing it to #pragma omp >> parallel for num_threads(8), the loop took 28 s. >> >> Editorial: This is a reason I think OpenMP is inappropriate as a tool > for parallel computing (many people disagree). It makes resource management > difficult for the user and impossible for a library. > > > It is possible to control these things properly with modern OpenMP APIs > but, like MPI implementations, this can require some mucking around a > beginner would not know about and the default settings can be terrible. MPI > implementations are not better, their default bindings are generally > horrible. > MPI allows the library to understand what resources are available and used. Last time we looked at it, OpenMP does not have such a context object that gets passed into the library (comm). The user could construct one, but then the "usability" of OpenMP fades away. Matt > Barry > > > Thanks, > > Matt > >> Regards, >> >> Roland >> Am 17.02.21 um 18:51 schrieb Matthew Knepley: >> >> Jed, is it possible that this is an oversubscription penalty from bad >> OpenMP settings? > cuneiform> >> >> Thanks, >> >> Matt >> >> On Wed, Feb 17, 2021 at 12:11 PM Roland Richter >> wrote: >> >>> My PetscScalar is complex double (i.e. even higher penalty), but my >>> matrix has a size of 8kk elements, so that should not an issue. >>> Regards, >>> Roland >>> ------------------------------ >>> *Von:* Jed Brown >>> *Gesendet:* Mittwoch, 17. Februar 2021 17:49:49 >>> *An:* Roland Richter; PETSc >>> *Betreff:* Re: [petsc-users] Explicit linking to OpenMP results in >>> performance drop and wrong results >>> >>> Roland Richter writes: >>> >>> > Hei, >>> > >>> > I replaced the linking line with >>> > >>> > //usr/lib64/mpi/gcc/openmpi3/bin/mpicxx -march=native -fopenmp-simd >>> > -DMKL_LP64 -m64 >>> > CMakeFiles/armadillo_with_PETSc.dir/Unity/unity_0_cxx.cxx.o -o >>> > bin/armadillo_with_PETSc >>> > -Wl,-rpath,/opt/boost/lib:/opt/fftw3/lib64:/opt/petsc_release/lib >>> > /usr/lib64/libgsl.so /usr/lib64/libgslcblas.so -lgfortran >>> > -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 >>> > -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl >>> > /opt/boost/lib/libboost_filesystem.so.1.72.0 >>> > /opt/boost/lib/libboost_mpi.so.1.72.0 >>> > /opt/boost/lib/libboost_program_options.so.1.72.0 >>> > /opt/boost/lib/libboost_serialization.so.1.72.0 >>> > /opt/fftw3/lib64/libfftw3.so /opt/fftw3/lib64/libfftw3_mpi.so >>> > /opt/petsc_release/lib/libpetsc.so >>> > /usr/lib64/gcc/x86_64-suse-linux/9/libgomp.so >>> > / >>> > >>> > and now the results are correct. Nevertheless, when comparing the loop >>> > in line 26-28 in file test_scaling.cpp >>> > >>> > /#pragma omp parallel for// >>> > // for(int i = 0; i < r_0 * r_1; ++i)// >>> > // *(out_mat_ptr + i) = (*(in_mat_ptr + i) * scaling_factor);/ >>> > >>> > the version without /#pragma omp parallel/ for is significantly faster >>> > (i.e. 18 s vs 28 s) compared to the version with /omp./ Why is there >>> > still such a big difference? >>> >>> Sounds like you're using a profile to attribute time? Each `omp >>> parallel` region incurs a cost ranging from about a microsecond to 10 or >>> more microseconds depending on architecture, number of threads, and OpenMP >>> implementation. 
Your loop (for double precision) operates at around 8 >>> entries per clock cycle (depending on architecture) if the operands are in >>> cache so the loop size r_0 * r_1 should be at least 10000 just to pay off >>> the cost of `omp parallel`. >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From fabien.vergnet at sorbonne-universite.fr Mon Feb 22 05:28:44 2021 From: fabien.vergnet at sorbonne-universite.fr (Fabien Vergnet) Date: Mon, 22 Feb 2021 12:28:44 +0100 Subject: [petsc-users] Get the vertices composing the cells of a DMPlex Submesh Message-ID: <10AAF11E-FED2-4B5C-B417-966805DA9C90@sorbonne-universite.fr> Dear PETSc community, Thank you for your amazing work. I discovered PETSc recently and I need your help for a project. As a training, I would like to assemble the Finite Element Matrix for the Poisson problem on a part of my mesh. So, I create a DMPlex from a .msh file and I create a Submesh with DMPlexCreateSubmesh(dm, label, 1, PETSC_TRUE, &subdm); In order to assemble my Finite Element Matrix, I need to iterate over the cells of the mesh (which are triangles) and identify the vertices composing each cell. My question is the following : how can I get, for each cell, the vertices composing the cell ? I have tried to uninterpolate the subdm with DMPlexUninterpolate (with the objective to get the vertices from DMPlexGetCone) but it does not seem to work for a Submesh since I get the following error: ---------- [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Invalid argument [0]PETSC ERROR: Not for partially interpolated meshes [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.14.4, Feb 03, 2021 [0]PETSC ERROR: Configure options --prefix=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS= FFLAGS= CXXFLAGS= --with-cc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpicc --with-cxx=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpic++ --with-fc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openblas-0.3.13-6b3u6zc5j4hvauqh3ldcwnf7lm2o4vyl/lib/libopenblas.so --with-x=0 --with-clanguage=C --with-scalapack=0 --with-cuda=0 --with-metis=1 --with-metis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/metis-5.1.0-lm3k7dh2vslghqtqc6dvcpnc54bfpqq2 --with-hypre=1 --with-hypre-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hypre-2.20.0-4pyxhku65wb5lmh2fpflhmjmow2pbjg7 --with-parmetis=1 --with-parmetis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/parmetis-4.0.3-pfiam4ccxkyqpapejcdjtlyr6cyz7irc --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0 --with-gmp=0 --with-libpng=0 --with-giflib=0 --with-mpfr=0 --with-netcdf=0 --with-pnetcdf=0 --with-moab=0 --with-random123=0 --with-exodusii=0 --with-cgns=0 --with-memkind=0 --with-p4est=0 --with-saws=0 --with-yaml=0 --with-libjpeg=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/include --with-superlu_dist-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse=0 --with-ptscotch=0 --with-hdf5-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/include --with-hdf5-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/lib/libhdf5.so --with-hdf5=1 --with-zlib-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/include --with-zlib-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/lib/libz.so --with-zlib=1 [0]PETSC ERROR: #1 DMPlexUninterpolate() line 1514 in /tmp/vergnet/spack-stage/spack-stage-petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk/spack-src/src/dm/impls/plex/plexinterpolate.c [0]PETSC ERROR: #2 assemble_mass() line 59 in /users/home/vergnet/codes/cilia/cilia/cpp/assembling.hpp ---------- Attached are a minimal working example, a mesh file and a makefile. Any ideas or suggestions are more than welcome ! Regards, Fabien -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: mwe.cpp Type: application/applefile Size: 67 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mesh.msh Type: application/applefile Size: 68 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: makefile Type: application/applefile Size: 68 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Mon Feb 22 05:48:59 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Mon, 22 Feb 2021 14:48:59 +0300 Subject: [petsc-users] Get the vertices composing the cells of a DMPlex Submesh In-Reply-To: <10AAF11E-FED2-4B5C-B417-966805DA9C90@sorbonne-universite.fr> References: <10AAF11E-FED2-4B5C-B417-966805DA9C90@sorbonne-universite.fr> Message-ID: The plex way is to use the transitive closure https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexGetTransitiveClosure.html and filter out points you are not interested in. See for example https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plex.c#L828 and https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plex.c#L833 Il giorno lun 22 feb 2021 alle ore 14:28 Fabien Vergnet < fabien.vergnet at sorbonne-universite.fr> ha scritto: > Dear PETSc community, > > Thank you for your amazing work. I discovered PETSc recently and I need > your help for a project. > > As a training, I would like to assemble the Finite Element Matrix for the > Poisson problem on a part of my mesh. So, I create a DMPlex from a .msh > file and I create a Submesh with > > DMPlexCreateSubmesh(dm, label, 1, PETSC_TRUE, &subdm); > > In order to assemble my Finite Element Matrix, I need to iterate over the > cells of the mesh (which are triangles) and identify the vertices > composing each cell. > > My question is the following : how can I get, for each cell, the vertices > composing the cell ? > > I have tried to uninterpolate the subdm with DMPlexUninterpolate (with > the objective to get the vertices from DMPlexGetCone) but it does not > seem to work for a Submesh since I get the following error: > > ---------- > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: Not for partially interpolated meshes > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.14.4, Feb 03, 2021 > [0]PETSC ERROR: Configure options > --prefix=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk > --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 > CFLAGS= FFLAGS= CXXFLAGS= > --with-cc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpicc > --with-cxx=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpic++ > --with-fc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpif90 > --with-precision=double --with-scalar-type=real --with-shared-libraries=1 > --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= > CXXOPTFLAGS= > --with-blaslapack-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openblas-0.3.13-6b3u6zc5j4hvauqh3ldcwnf7lm2o4vyl/lib/libopenblas.so > --with-x=0 --with-clanguage=C --with-scalapack=0 --with-cuda=0 > --with-metis=1 > --with-metis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/metis-5.1.0-lm3k7dh2vslghqtqc6dvcpnc54bfpqq2 > --with-hypre=1 > --with-hypre-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hypre-2.20.0-4pyxhku65wb5lmh2fpflhmjmow2pbjg7 > --with-parmetis=1 > --with-parmetis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/parmetis-4.0.3-pfiam4ccxkyqpapejcdjtlyr6cyz7irc > --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0 > --with-gmp=0 --with-libpng=0 --with-giflib=0 --with-mpfr=0 --with-netcdf=0 > --with-pnetcdf=0 --with-moab=0 --with-random123=0 --with-exodusii=0 > --with-cgns=0 --with-memkind=0 --with-p4est=0 --with-saws=0 --with-yaml=0 > --with-libjpeg=0 --with-cxx-dialect=C++11 > --with-superlu_dist-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/include > --with-superlu_dist-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/lib/libsuperlu_dist.a > --with-superlu_dist=1 --with-suitesparse=0 --with-ptscotch=0 > --with-hdf5-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/include > --with-hdf5-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/lib/libhdf5.so > --with-hdf5=1 > --with-zlib-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/include > --with-zlib-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/lib/libz.so > --with-zlib=1 > [0]PETSC ERROR: #1 DMPlexUninterpolate() line 1514 in > /tmp/vergnet/spack-stage/spack-stage-petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk/spack-src/src/dm/impls/plex/plexinterpolate.c > [0]PETSC ERROR: #2 assemble_mass() line 59 in > /users/home/vergnet/codes/cilia/cilia/cpp/assembling.hpp > > ---------- > > Attached are a minimal working example, a mesh file and a makefile. > > Any ideas or suggestions are more than welcome ! > > Regards, > Fabien > > > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fabien.vergnet at sorbonne-universite.fr Mon Feb 22 06:09:12 2021 From: fabien.vergnet at sorbonne-universite.fr (Fabien Vergnet) Date: Mon, 22 Feb 2021 13:09:12 +0100 Subject: [petsc-users] Get the vertices composing the cells of a DMPlex Submesh In-Reply-To: References: <10AAF11E-FED2-4B5C-B417-966805DA9C90@sorbonne-universite.fr> Message-ID: <4496AC11-FCF5-4799-9F04-453F695AF04E@sorbonne-universite.fr> Hi Stefano, Thank you for your response. Could you explain more what the output of DMPlexGetTransitiveClosure is ? For example for cell 0 I get the following array of size 14 for the closure: 0 0 370 -2 369 -2 368 -2 309 0 306 0 312 0 but I do not understand the order of the points (center of the cell, middle of the edges, vertices ?). Also what the orientation means for each point ? Regards, Fabien > Le 22 f?vr. 2021 ? 12:48, Stefano Zampini a ?crit : > > The plex way is to use the transitive closure https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexGetTransitiveClosure.html and filter out points you are not interested in. > See for example https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plex.c#L828 and https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plex.c#L833 > Il giorno lun 22 feb 2021 alle ore 14:28 Fabien Vergnet > ha scritto: > Dear PETSc community, > > Thank you for your amazing work. I discovered PETSc recently and I need your help for a project. > > As a training, I would like to assemble the Finite Element Matrix for the Poisson problem on a part of my mesh. So, I create a DMPlex from a .msh file and I create a Submesh with > > DMPlexCreateSubmesh(dm, label, 1, PETSC_TRUE, &subdm); > > In order to assemble my Finite Element Matrix, I need to iterate over the cells of the mesh (which are triangles) and identify the vertices composing each cell. > > My question is the following : how can I get, for each cell, the vertices composing the cell ? > > I have tried to uninterpolate the subdm with DMPlexUninterpolate (with the objective to get the vertices from DMPlexGetCone) but it does not seem to work for a Submesh since I get the following error: > > ---------- > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: Not for partially interpolated meshes > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.14.4, Feb 03, 2021 > [0]PETSC ERROR: Configure options --prefix=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS= FFLAGS= CXXFLAGS= --with-cc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpicc --with-cxx=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpic++ --with-fc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openblas-0.3.13-6b3u6zc5j4hvauqh3ldcwnf7lm2o4vyl/lib/libopenblas.so --with-x=0 --with-clanguage=C --with-scalapack=0 --with-cuda=0 --with-metis=1 --with-metis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/metis-5.1.0-lm3k7dh2vslghqtqc6dvcpnc54bfpqq2 --with-hypre=1 --with-hypre-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hypre-2.20.0-4pyxhku65wb5lmh2fpflhmjmow2pbjg7 --with-parmetis=1 --with-parmetis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/parmetis-4.0.3-pfiam4ccxkyqpapejcdjtlyr6cyz7irc --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0 --with-gmp=0 --with-libpng=0 --with-giflib=0 --with-mpfr=0 --with-netcdf=0 --with-pnetcdf=0 --with-moab=0 --with-random123=0 --with-exodusii=0 --with-cgns=0 --with-memkind=0 --with-p4est=0 --with-saws=0 --with-yaml=0 --with-libjpeg=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/include --with-superlu_dist-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse=0 --with-ptscotch=0 --with-hdf5-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/include --with-hdf5-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/lib/libhdf5.so --with-hdf5=1 --with-zlib-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/include --with-zlib-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/lib/libz.so --with-zlib=1 > [0]PETSC ERROR: #1 DMPlexUninterpolate() line 1514 in /tmp/vergnet/spack-stage/spack-stage-petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk/spack-src/src/dm/impls/plex/plexinterpolate.c > [0]PETSC ERROR: #2 assemble_mass() line 59 in /users/home/vergnet/codes/cilia/cilia/cpp/assembling.hpp > > ---------- > > Attached are a minimal working example, a mesh file and a makefile. > > Any ideas or suggestions are more than welcome ! > > Regards, > Fabien > > > > > > -- > Stefano -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stefano.zampini at gmail.com Mon Feb 22 06:23:08 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Mon, 22 Feb 2021 15:23:08 +0300 Subject: [petsc-users] Get the vertices composing the cells of a DMPlex Submesh In-Reply-To: <4496AC11-FCF5-4799-9F04-453F695AF04E@sorbonne-universite.fr> References: <10AAF11E-FED2-4B5C-B417-966805DA9C90@sorbonne-universite.fr> <4496AC11-FCF5-4799-9F04-453F695AF04E@sorbonne-universite.fr> Message-ID: >From your output, it seems a triangle cell. Reading the documentation of DMPlexGetTransitiveClosure I have already pointed you to: the output is interleaved between points (cell, edges and vertices in this case) and their relative orientation wrt the point of higher dimension. Reading your output 0 0 -> cell number 0 (relative orientation is 0) then the edges (all edges are points locally numbered from eStart to eEnd, where DMPlexGetDepthStratum(dm,1,&eStart,&eEnd)) 370 -2 369 -2 368 -2 -> cell 0 is made up by points 370, 369 and 368 (traversed in this specific order), and each edge must be traversed -2 (from second endpoint to first) then the vertices (start and end of local per process numbering via DMPlexGetDepthStratum(dm,0,&vStart,&vEnd)) 309 0 306 0 312 0 -> vertices have no orientation. Il giorno lun 22 feb 2021 alle ore 15:09 Fabien Vergnet < fabien.vergnet at sorbonne-universite.fr> ha scritto: > Hi Stefano, > > Thank you for your response. > > Could you explain more what the output of DMPlexGetTransitiveClosure is ? > For example for cell 0 I get the following array of size 14 for the closure: > > 0 0 370 -2 369 -2 368 -2 309 0 306 0 312 0 > > but I do not understand the order of the points (center of the cell, > middle of the edges, vertices ?). Also what the orientation means for each > point ? > > Regards, > Fabien > > Le 22 f?vr. 2021 ? 12:48, Stefano Zampini a > ?crit : > > The plex way is to use the transitive closure > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexGetTransitiveClosure.html > and filter out points you are not interested in. > See for example > https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plex.c#L828 > and > https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plex.c#L833 > > Il giorno lun 22 feb 2021 alle ore 14:28 Fabien Vergnet < > fabien.vergnet at sorbonne-universite.fr> ha scritto: > >> Dear PETSc community, >> >> Thank you for your amazing work. I discovered PETSc recently and I need >> your help for a project. >> >> As a training, I would like to assemble the Finite Element Matrix for the >> Poisson problem on a part of my mesh. So, I create a DMPlex from a .msh >> file and I create a Submesh with >> >> DMPlexCreateSubmesh(dm, label, 1, PETSC_TRUE, &subdm); >> >> In order to assemble my Finite Element Matrix, I need to iterate over the >> cells of the mesh (which are triangles) and identify the vertices >> composing each cell. >> >> My question is the following : how can I get, for each cell, the vertices >> composing the cell ? 
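Putting this together with the question quoted above, a minimal C sketch of the cell-to-vertex loop on subdm could look like the following; the PetscPrintf is only for illustration, and for coordinates or field values one would typically use DMPlexVecGetClosure instead:

  PetscInt cStart, cEnd, vStart, vEnd, c;

  ierr = DMPlexGetHeightStratum(subdm, 0, &cStart, &cEnd);CHKERRQ(ierr); /* cells */
  ierr = DMPlexGetDepthStratum(subdm, 0, &vStart, &vEnd);CHKERRQ(ierr);  /* vertices */
  for (c = cStart; c < cEnd; ++c) {
    PetscInt *closure = NULL, npoints, p;

    ierr = DMPlexGetTransitiveClosure(subdm, c, PETSC_TRUE, &npoints, &closure);CHKERRQ(ierr);
    for (p = 0; p < npoints; ++p) {
      const PetscInt point = closure[2*p];   /* closure[2*p+1] is the orientation */
      if (point >= vStart && point < vEnd) {
        ierr = PetscPrintf(PETSC_COMM_SELF, "cell %D has vertex %D\n", c, point);CHKERRQ(ierr);
      }
    }
    ierr = DMPlexRestoreTransitiveClosure(subdm, c, PETSC_TRUE, &npoints, &closure);CHKERRQ(ierr);
  }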
>> >> I have tried to uninterpolate the subdm with DMPlexUninterpolate (with >> the objective to get the vertices from DMPlexGetCone) but it does not >> seem to work for a Submesh since I get the following error: >> >> ---------- >> >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Invalid argument >> [0]PETSC ERROR: Not for partially interpolated meshes >> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for >> trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.14.4, Feb 03, 2021 >> [0]PETSC ERROR: Configure options >> --prefix=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk >> --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 >> CFLAGS= FFLAGS= CXXFLAGS= >> --with-cc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpicc >> --with-cxx=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpic++ >> --with-fc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpif90 >> --with-precision=double --with-scalar-type=real --with-shared-libraries=1 >> --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= >> CXXOPTFLAGS= >> --with-blaslapack-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openblas-0.3.13-6b3u6zc5j4hvauqh3ldcwnf7lm2o4vyl/lib/libopenblas.so >> --with-x=0 --with-clanguage=C --with-scalapack=0 --with-cuda=0 >> --with-metis=1 >> --with-metis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/metis-5.1.0-lm3k7dh2vslghqtqc6dvcpnc54bfpqq2 >> --with-hypre=1 >> --with-hypre-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hypre-2.20.0-4pyxhku65wb5lmh2fpflhmjmow2pbjg7 >> --with-parmetis=1 >> --with-parmetis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/parmetis-4.0.3-pfiam4ccxkyqpapejcdjtlyr6cyz7irc >> --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0 >> --with-gmp=0 --with-libpng=0 --with-giflib=0 --with-mpfr=0 --with-netcdf=0 >> --with-pnetcdf=0 --with-moab=0 --with-random123=0 --with-exodusii=0 >> --with-cgns=0 --with-memkind=0 --with-p4est=0 --with-saws=0 --with-yaml=0 >> --with-libjpeg=0 --with-cxx-dialect=C++11 >> --with-superlu_dist-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/include >> --with-superlu_dist-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/lib/libsuperlu_dist.a >> --with-superlu_dist=1 --with-suitesparse=0 --with-ptscotch=0 >> --with-hdf5-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/include >> --with-hdf5-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/lib/libhdf5.so >> --with-hdf5=1 >> --with-zlib-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/include >> 
--with-zlib-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/lib/libz.so >> --with-zlib=1 >> [0]PETSC ERROR: #1 DMPlexUninterpolate() line 1514 in >> /tmp/vergnet/spack-stage/spack-stage-petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk/spack-src/src/dm/impls/plex/plexinterpolate.c >> [0]PETSC ERROR: #2 assemble_mass() line 59 in >> /users/home/vergnet/codes/cilia/cilia/cpp/assembling.hpp >> >> ---------- >> >> Attached are a minimal working example, a mesh file and a makefile. >> >> Any ideas or suggestions are more than welcome ! >> >> Regards, >> Fabien >> >> >> >> > > -- > Stefano > > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 22 07:36:10 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 22 Feb 2021 08:36:10 -0500 Subject: [petsc-users] Get the vertices composing the cells of a DMPlex Submesh In-Reply-To: <10AAF11E-FED2-4B5C-B417-966805DA9C90@sorbonne-universite.fr> References: <10AAF11E-FED2-4B5C-B417-966805DA9C90@sorbonne-universite.fr> Message-ID: On Mon, Feb 22, 2021 at 6:28 AM Fabien Vergnet < fabien.vergnet at sorbonne-universite.fr> wrote: > Dear PETSc community, > > Thank you for your amazing work. I discovered PETSc recently and I need > your help for a project. > > As a training, I would like to assemble the Finite Element Matrix for the > Poisson problem on a part of my mesh. So, I create a DMPlex from a .msh > file and I create a Submesh with > > DMPlexCreateSubmesh(dm, label, 1, PETSC_TRUE, &subdm); > I don't think you want this call. It is designed to pick out hypersurfaces. If you just want part of a mesh, DMPlexFilter is better. > In order to assemble my Finite Element Matrix, I need to iterate over the > cells of the mesh (which are triangles) and identify the vertices > composing each cell. > > My question is the following : how can I get, for each cell, the vertices > composing the cell ? > > I have tried to uninterpolate the subdm with DMPlexUninterpolate (with > the objective to get the vertices from DMPlexGetCone) but it does not > seem to work for a Submesh since I get the following error: > > ---------- > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: Not for partially interpolated meshes > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.14.4, Feb 03, 2021 > [0]PETSC ERROR: Configure options > --prefix=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk > --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 > CFLAGS= FFLAGS= CXXFLAGS= > --with-cc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpicc > --with-cxx=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpic++ > --with-fc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpif90 > --with-precision=double --with-scalar-type=real --with-shared-libraries=1 > --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= > CXXOPTFLAGS= > --with-blaslapack-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openblas-0.3.13-6b3u6zc5j4hvauqh3ldcwnf7lm2o4vyl/lib/libopenblas.so > --with-x=0 --with-clanguage=C --with-scalapack=0 --with-cuda=0 > --with-metis=1 > --with-metis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/metis-5.1.0-lm3k7dh2vslghqtqc6dvcpnc54bfpqq2 > --with-hypre=1 > --with-hypre-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hypre-2.20.0-4pyxhku65wb5lmh2fpflhmjmow2pbjg7 > --with-parmetis=1 > --with-parmetis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/parmetis-4.0.3-pfiam4ccxkyqpapejcdjtlyr6cyz7irc > --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0 > --with-gmp=0 --with-libpng=0 --with-giflib=0 --with-mpfr=0 --with-netcdf=0 > --with-pnetcdf=0 --with-moab=0 --with-random123=0 --with-exodusii=0 > --with-cgns=0 --with-memkind=0 --with-p4est=0 --with-saws=0 --with-yaml=0 > --with-libjpeg=0 --with-cxx-dialect=C++11 > --with-superlu_dist-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/include > --with-superlu_dist-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/lib/libsuperlu_dist.a > --with-superlu_dist=1 --with-suitesparse=0 --with-ptscotch=0 > --with-hdf5-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/include > --with-hdf5-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/lib/libhdf5.so > --with-hdf5=1 > --with-zlib-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/include > --with-zlib-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/lib/libz.so > --with-zlib=1 > [0]PETSC ERROR: #1 DMPlexUninterpolate() line 1514 in > /tmp/vergnet/spack-stage/spack-stage-petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk/spack-src/src/dm/impls/plex/plexinterpolate.c > [0]PETSC ERROR: #2 assemble_mass() line 59 in > /users/home/vergnet/codes/cilia/cilia/cpp/assembling.hpp > > ---------- > > Attached are a minimal working example, a mesh file and a makefile. > The code does not seem to be attached. 
As Stefano says, you can use DMPlexGetTransitiveClosure() to get vertices, but if you actually want values attached to the
vertices, it is easier to use DMPlexVecGetClosure().

  Thanks,

     Matt

> Any ideas or suggestions are more than welcome !
>
> Regards,
> Fabien
>
>
>

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From fabien.vergnet at sorbonne-universite.fr  Mon Feb 22 12:06:41 2021
From: fabien.vergnet at sorbonne-universite.fr (Fabien Vergnet)
Date: Mon, 22 Feb 2021 19:06:41 +0100
Subject: [petsc-users] Get the vertices composing the cells of a DMPlex Submesh
In-Reply-To: 
References: <10AAF11E-FED2-4B5C-B417-966805DA9C90@sorbonne-universite.fr>
Message-ID: 

Hi Matthew and Stefano,

@Stefano, thank you for the explanation. DMPlexGetTransitiveClosure is exactly what I needed.
@Matthew, thank you for your advice on using DMPlexFilter, this is much better !

Regards,
Fabien

> On 22 Feb 2021, at 14:36, Matthew Knepley wrote:
>
> On Mon, Feb 22, 2021 at 6:28 AM Fabien Vergnet <fabien.vergnet at sorbonne-universite.fr> wrote:
> Dear PETSc community,
>
> Thank you for your amazing work. I discovered PETSc recently and I need your help for a project.
>
> As a training, I would like to assemble the Finite Element Matrix for the Poisson problem on a part of my mesh. So, I create a DMPlex from a .msh file and I create a Submesh with
>
> DMPlexCreateSubmesh(dm, label, 1, PETSC_TRUE, &subdm);
>
> I don't think you want this call. It is designed to pick out hypersurfaces. If you just want part of a mesh, DMPlexFilter is better.
>
> In order to assemble my Finite Element Matrix, I need to iterate over the cells of the mesh (which are triangles) and identify the vertices composing each cell.
>
> My question is the following : how can I get, for each cell, the vertices composing the cell ?
>
> I have tried to uninterpolate the subdm with DMPlexUninterpolate (with the objective to get the vertices from DMPlexGetCone) but it does not seem to work for a Submesh since I get the following error:
>
> ----------
>
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: Not for partially interpolated meshes
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.14.4, Feb 03, 2021 > [0]PETSC ERROR: Configure options --prefix=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS= FFLAGS= CXXFLAGS= --with-cc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpicc --with-cxx=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpic++ --with-fc=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openmpi-4.0.5-ygn7zymoy7crl7b4xsdkc4zmfojugmdy/bin/mpif90 --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/openblas-0.3.13-6b3u6zc5j4hvauqh3ldcwnf7lm2o4vyl/lib/libopenblas.so --with-x=0 --with-clanguage=C --with-scalapack=0 --with-cuda=0 --with-metis=1 --with-metis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/metis-5.1.0-lm3k7dh2vslghqtqc6dvcpnc54bfpqq2 --with-hypre=1 --with-hypre-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hypre-2.20.0-4pyxhku65wb5lmh2fpflhmjmow2pbjg7 --with-parmetis=1 --with-parmetis-dir=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/parmetis-4.0.3-pfiam4ccxkyqpapejcdjtlyr6cyz7irc --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0 --with-gmp=0 --with-libpng=0 --with-giflib=0 --with-mpfr=0 --with-netcdf=0 --with-pnetcdf=0 --with-moab=0 --with-random123=0 --with-exodusii=0 --with-cgns=0 --with-memkind=0 --with-p4est=0 --with-saws=0 --with-yaml=0 --with-libjpeg=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/include --with-superlu_dist-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/superlu-dist-6.4.0-bagymefhq7s7gerf7jxwiv4fv7szoljh/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse=0 --with-ptscotch=0 --with-hdf5-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/include --with-hdf5-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/hdf5-1.10.7-dkfs4ir3hlahyfs5z4xcmsf4ogklimji/lib/libhdf5.so --with-hdf5=1 --with-zlib-include=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/include --with-zlib-lib=/users/home/vergnet/apps/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.3.0/zlib-1.2.11-2pwsgfxppopolmjj6tf34k5jsaqzpodo/lib/libz.so --with-zlib=1 > [0]PETSC ERROR: #1 DMPlexUninterpolate() line 1514 in /tmp/vergnet/spack-stage/spack-stage-petsc-3.14.4-xtae5tcwlzkb4oiifbayj77bqlz5nngk/spack-src/src/dm/impls/plex/plexinterpolate.c > [0]PETSC ERROR: #2 assemble_mass() line 59 in /users/home/vergnet/codes/cilia/cilia/cpp/assembling.hpp > > ---------- > > Attached are a minimal working example, a mesh file and a makefile. > > The code does not seem to be attached. > > As Stefano says, you can use DMPlexGetTransitiveClosure() to get vertices, but if you actually want values attached to the > vertices, it is easier to use DMPlexVecGetClosure(). 
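As an illustrative aside (not part of the original exchange), a minimal sketch of the cell-to-vertex loop suggested above might look like the following. It assumes an interpolated DMPlex named dm, for example the result of DMPlexFilter; the helper name PrintCellVertices is invented for the example.

#include <petscdmplex.h>

/* Sketch: list the vertices in the closure of every cell of an interpolated DMPlex */
static PetscErrorCode PrintCellVertices(DM dm)
{
  PetscInt       cStart, cEnd, vStart, vEnd, c;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr); /* cells    */
  ierr = DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd);CHKERRQ(ierr);  /* vertices */
  for (c = cStart; c < cEnd; ++c) {
    PetscInt *closure = NULL, closureSize, cl;

    ierr = DMPlexGetTransitiveClosure(dm, c, PETSC_TRUE, &closureSize, &closure);CHKERRQ(ierr);
    /* closure holds (point, orientation) pairs; keep only the points that are vertices */
    for (cl = 0; cl < 2*closureSize; cl += 2) {
      const PetscInt point = closure[cl];
      if (point >= vStart && point < vEnd) {
        ierr = PetscPrintf(PETSC_COMM_SELF, "cell %D has vertex %D\n", c, point);CHKERRQ(ierr);
      }
    }
    ierr = DMPlexRestoreTransitiveClosure(dm, c, PETSC_TRUE, &closureSize, &closure);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}

If the field values on those vertices are wanted rather than the point numbers, DMPlexVecGetClosure(dm, section, vec, c, &nvals, &vals) returns the closure values of cell c directly, as noted in the reply above.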
> > Thanks, > > Matt > > Any ideas or suggestions are more than welcome ! > > Regards, > Fabien > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lisandro.verga.bega at gmail.com Mon Feb 22 13:59:04 2021 From: lisandro.verga.bega at gmail.com (Lisandro Verga) Date: Mon, 22 Feb 2021 11:59:04 -0800 Subject: [petsc-users] Example finite volume silver in PETSc Message-ID: Dear PETSc Team, I would like to ask you if there a finite volume solver build using the PETSc data structure. I have found several manuscripts or presentations that mention that but I cannot retrieve an example it. Thank you. Regards, -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.guyer at nist.gov Mon Feb 22 14:09:13 2021 From: jonathan.guyer at nist.gov (Guyer, Jonathan E. Dr. (Fed)) Date: Mon, 22 Feb 2021 20:09:13 +0000 Subject: [petsc-users] Example finite volume silver in PETSc In-Reply-To: References: Message-ID: [FiPy](https://www.ctcms.nist.gov/fipy) is a finite volume code that can use PETSc as one possible solver suite. See: https://github.com/usnistgov/fipy/tree/master/fipy/solvers/petsc https://github.com/usnistgov/fipy/blob/master/fipy/matrices/petscMatrix.py Caveat: It?s not intended as an instructional example, if that?s what you?re looking for. On Feb 22, 2021, at 2:59 PM, Lisandro Verga wrote: Dear PETSc Team, I would like to ask you if there a finite volume solver build using the PETSc data structure. I have found several manuscripts or presentations that mention that but I cannot retrieve an example it. Thank you. Regards, -------------- next part -------------- An HTML attachment was scrubbed... URL: From elbueler at alaska.edu Mon Feb 22 14:25:35 2021 From: elbueler at alaska.edu (Ed Bueler) Date: Mon, 22 Feb 2021 11:25:35 -0900 Subject: [petsc-users] re Example finite volume silver in PETSc Message-ID: A very basic 2D FV example, a scalar advection solver, using PETSc DMDA, is at https://github.com/bueler/p4pdes/blob/master/c/ch11/advect.c and documented in Chapter 11 of my book ( https://my.siam.org/Store/Product/viewproduct/?ProductId=32850137). This example might be most useful to you if you are interested in implementing flux limiters. Ed > Dear PETSc Team, > > I would like to ask you if there a finite volume solver build using the > PETSc data structure. I have found several manuscripts or presentations > that mention that but I cannot retrieve an example it. > > Thank you. > > Regards, -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -------------- next part -------------- An HTML attachment was scrubbed... URL: From thibault.bridelbertomeu at gmail.com Tue Feb 23 01:37:05 2021 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Tue, 23 Feb 2021 08:37:05 +0100 Subject: [petsc-users] DMPlex read partitionned Gmsh Message-ID: Dear all, I was wondering if there was a plan in motion to implement yet another possibility for DMPlexCreateGmshFromFile: read a group of foo_*.msh generated from a partition done directly in Gmsh ? Have a great day, Thibault B.-B. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brardafrancesco at gmail.com Tue Feb 23 02:51:27 2021 From: brardafrancesco at gmail.com (Francesco Brarda) Date: Tue, 23 Feb 2021 09:51:27 +0100 Subject: [petsc-users] Caught signal number 11 SEGV Message-ID: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com> Hi! I am very new to the PETSc world. I am working with a GitHub repo that uses PETSc together with Stan (a statistics open source software), here you can find the discussion. It has been defined a functor to convert EigenVector to PetscVec and viceversa, both sequentially and in parallel. The file using these functions does the conversions with the sequential setting. I changed to those using MPI, that is from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because I want to evaluate the scaling. Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock optimize in the debug mode I get the error Caught signal number 11 SEGV. I therefore used the option -start_in_debugger and I get the following: [2]PETSC ERROR: ------------------------------------------------------------------------ [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [2]PETSC ERROR: likely location of problem given in stack below [2]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [2]PETSC ERROR: INSTEAD the line number of the start of the function [2]PETSC ERROR: is given. [3]PETSC ERROR: ------------------------------------------------------------------------ [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [3]PETSC ERROR: likely location of problem given in stack below [3]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [3]PETSC ERROR: INSTEAD the line number of the start of the function [3]PETSC ERROR: is given. [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null) To prevent termination, change the error handler using PetscPushErrorHandler() [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null) To prevent termination, change the error handler using PetscPushErrorHandler() =================================================================================== = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES = PID 22939 RUNNING AT srvulx13 = EXIT CODE: 134 = CLEANING UP REMAINING PROCESSES = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES =================================================================================== YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) This typically refers to a problem with your application. Please see the FAQ page for debugging suggestions I read the documentation regarding the PetscAbortErrorHandler, but I do not know where should I use it. 
How can I solve the problem? I hope I have been clear enough. Attached you can find also my configure.log and make.log files. Best, Francesco -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 2588258 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: application/octet-stream Size: 12090 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 23 04:49:53 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 23 Feb 2021 05:49:53 -0500 Subject: [petsc-users] DMPlex read partitionned Gmsh In-Reply-To: References: Message-ID: On Tue, Feb 23, 2021 at 2:37 AM Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> wrote: > Dear all, > > I was wondering if there was a plan in motion to implement yet another > possibility for DMPlexCreateGmshFromFile: read a group of foo_*.msh > generated from a partition done directly in Gmsh ? > What we have implemented now is a system that reads a mesh in parallel from disk into a naive partition, then repartitions and redistributes. We have a paper about this strategy: https://arxiv.org/abs/2004.08729 . Right now it is only implemented in HDF5. This is mainly because: 1) Parallel block reads are easy in HDF5. 2) We use it for checkpointing as well as load, and it is flexible enough for this 3) Label information can be stored in a scalable way It is easy to convert from GMsh to HDF5 (it's a few lines of PETSc). The GMsh format is not ideal for parallelism, and in fact the GMsh reader was also using MED, which is an HDF5 format. We originally wrote an MED reader, but the documentation and support for the library were not up to snuff, so we went with a custom HDF5 format. Is this helpful? Matt > Have a great day, > > Thibault B.-B. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 23 04:54:31 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 23 Feb 2021 05:54:31 -0500 Subject: [petsc-users] Caught signal number 11 SEGV In-Reply-To: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com> References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com> Message-ID: On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda wrote: > Hi! > > I am very new to the PETSc world. I am working with a GitHub repo that > uses PETSc together with Stan (a statistics open source software), here > you > can find the discussion. > It has been defined a functor > to > convert EigenVector to PetscVec and viceversa, both sequentially and in > parallel. > The file > using > these functions does the conversions with the sequential setting. I changed > to those using MPI, that is from EigenVectorToPetscVecSeq > to EigenVectorToPetscVecMPI and so on because I want to evaluate the > scaling. > Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock > optimize in the debug mode I get the error Caught signal number 11 SEGV. 
> I therefore used the option -start_in_debugger and I get the following: > For some reason, the -start_in_debuggger option is not being seen. Are you showing all the output? Once the debugger is attached, you run the program (conr) and then when you hit the SEGV you get a stack trace (where). THanks, Matt > [2]PETSC ERROR: > ------------------------------------------------------------------------ > [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation > Violation, probably memory access out of range > [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [2]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS X to find memory corruption errors > [2]PETSC ERROR: likely location of problem given in stack below > [2]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [2]PETSC ERROR: Note: The EXACT line numbers in the stack are > not available, > [2]PETSC ERROR: INSTEAD the line number of the start of the function > [2]PETSC ERROR: is given. > [3]PETSC ERROR: > ------------------------------------------------------------------------ > [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation > Violation, probably memory access out of range > [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [3]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS X to find memory corruption errors > [3]PETSC ERROR: likely location of problem given in stack below > [3]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [3]PETSC ERROR: Note: The EXACT line numbers in the stack are > not available, > [3]PETSC ERROR: INSTEAD the line number of the start of the function > [3]PETSC ERROR: is given. > [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line > 0 in unknown file (null) > To prevent termination, change the error handler > using PetscPushErrorHandler() > [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line > 0 in unknown file (null) > To prevent termination, change the error handler > using PetscPushErrorHandler() > > > =================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = PID 22939 RUNNING AT srvulx13 > = EXIT CODE: 134 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > =================================================================================== > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) > This typically refers to a problem with your application. > Please see the FAQ page for debugging suggestions > > I read the documentation regarding the PetscAbortErrorHandler, but I do > not know where should I use it. How can I solve the problem? > I hope I have been clear enough. > Attached you can find also my configure.log and make.log files. > > Best, > Francesco > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
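A brief aside on the workflow described in the reply above (not from the original message): once gdb has attached to the chosen rank, the usual sequence is

  (gdb) continue
  (gdb) backtrace

continue (abbreviated "cont") resumes the program until the SEGV is raised, and backtrace (or "where") prints the call stack at the faulting frame; "frame N" can then be used to inspect a particular level.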
URL: From anton.glazkov at chch.ox.ac.uk Tue Feb 23 05:32:27 2021 From: anton.glazkov at chch.ox.ac.uk (Anton Glazkov) Date: Tue, 23 Feb 2021 11:32:27 +0000 Subject: [petsc-users] 64bit indices and revolve compiler warnings Message-ID: Good morning, I have been trying to compile PETSc with 64bit indices and revolve. It compiles ok but gives out warnings of the kind: {PETSCDIR PATH REMOVED}/src/ts/trajectory/impls/memory/trajmemory.c:1479:130: warning: incompatible pointer types passing 'PetscInt *' (aka 'long long *') to parameter of type 'int *' [-Wincompatible-pointer-types] whattodo = revolve_action(&tjsch->rctx->check,&tjsch->rctx->capo,&tjsch->rctx->fine,tjsch->rctx->snaps_in,&tjsch->rctx->info,&tjsch->rctx->where); /* must return 1 or 3 or 4*/ ^~~~~~~~~~~~~~~~~~~ {PETSCDIR PATH REMOVED}/lib/include/revolve_c.h:14:49: note: passing argument to parameter here int revolve_action(int*,int*,int*,int,int*,int*); Is revolve incompatible with 64bit indices by design? Best wishes, Anton PS the compile line is this: ./configure ?prefix={PREFIX REMOVED} --with-cc=cc --with-cxx=CC --with-fc=ftn --with-debugging=0 --with-clib-autodetect=0 --with-cxxlib-autodetect=0 --with-fortranlib-autodetect=0 --COPTFLAGS=-g -O3 --CXXOPTFLAGS=-g -O3 --FOPTFLAGS=-g -O3 --with-64-bit-indices --with-scalar-type=complex --download-hypre-shared --download-moab-shared --download-superlu_dist-shared --download-revolve=1 --with-hdf5-dir=/opt/cray/pe/hdf5-parallel/1.12.0.2/CRAYCLANG/9.1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From elena.travaglia at edu.unito.it Tue Feb 23 10:49:01 2021 From: elena.travaglia at edu.unito.it (Elena Travaglia) Date: Tue, 23 Feb 2021 17:49:01 +0100 Subject: [petsc-users] Preconditioner for LSC Message-ID: Dear PETSc users, we would like to compare our preconditioner for the Schur complement of a Stokes system, with the LSC preconditioner already implemented in PETSc. Following the example in the PETSc manual, we've tried -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type ml but this is not working (properly) on our problem. On the other hand we think we have a good preconditioner for A10*A01, so we'd like to try -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type shell but we cannot figure out how to attach our apply() routine to the pc object of fieldsplit_1_lsc. Can this be done in the current interface? Or perhaps, should we call KSPGetOperators on the fieldsplit_1 solver and attach to its Sp operator a "LSC_Lp" of type MATSHELL with our routine attached to the MATOP_SOLVE of the shell matrix? Thanks in advance, Elena and Matteo -- ------------------------ Indirizzo istituzionale di posta elettronica degli studenti e dei laureati dell'Universit? degli Studi di TorinoOfficial? University of Turin?email address?for students and graduates? -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Feb 23 11:19:25 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 23 Feb 2021 11:19:25 -0600 Subject: [petsc-users] headsup: switch git default branch from 'master' to 'main' Message-ID: <55996c7c-a274-5ebb-bba7-e06ac4c3b83a@mcs.anl.gov> All, This is a heads-up, we are to switch the default branch in petsc git repo from 'master' to 'main' [Will plan to do the switch on friday the 26th] We've previously switched 'maint' branch to 'release' before 3.14 release - and this change (to 'main') is the next step in this direction. 
Satish From dalcinl at gmail.com Tue Feb 23 13:52:21 2021 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 23 Feb 2021 22:52:21 +0300 Subject: [petsc-users] headsup: switch git default branch from 'master' to 'main' In-Reply-To: <55996c7c-a274-5ebb-bba7-e06ac4c3b83a@mcs.anl.gov> References: <55996c7c-a274-5ebb-bba7-e06ac4c3b83a@mcs.anl.gov> Message-ID: May the force be with you, Satish. On Tue, 23 Feb 2021 at 20:19, Satish Balay via petsc-users < petsc-users at mcs.anl.gov> wrote: > All, > > This is a heads-up, we are to switch the default branch in petsc git > repo from 'master' to 'main' > > [Will plan to do the switch on friday the 26th] > > We've previously switched 'maint' branch to 'release' before 3.14 > release - and this change (to 'main') is the next step in this direction. > > Satish > > -- Lisandro Dalcin ============ Senior Research Scientist Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From brardafrancesco at gmail.com Tue Feb 23 14:55:19 2021 From: brardafrancesco at gmail.com (Francesco Brarda) Date: Tue, 23 Feb 2021 21:55:19 +0100 Subject: [petsc-users] Caught signal number 11 SEGV In-Reply-To: References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com> Message-ID: <9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com> Thank you for the quick response. Sorry, you are right. Here is the complete output: fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 on display :0.0 on machine srvulx13 PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 on display :0.0 on machine srvulx13 xterm: Xt error: Can't open display: :0.0 xterm: DISPLAY is not set xterm: Xt error: Can't open display: :0.0 xterm: DISPLAY is not set method = optimize optimize algorithm = lbfgs (Default) lbfgs method = optimize optimize algorithm = lbfgs (Default) lbfgs init_alpha = 0.001 (Default) tol_obj = 9.9999999999999998e-13 (Default) tol_rel_obj = 10000 (Default) tol_grad = 1e-08 (Default) init_alpha = 0.001 (Default) tol_obj = 9.9999999999999998e-13 (Default) tol_rel_obj = 10000 (Default) tol_grad = 1e-08 (Default) tol_rel_grad = 10000000 (Default) tol_param = 1e-08 (Default) history_size = 5 (Default) tol_rel_grad = 10000000 (Default) tol_param = 1e-08 (Default) history_size = 5 (Default) iter = 2000 (Default) iter = 2000 (Default) save_iterations = 0 (Default) id = 0 (Default) data save_iterations = 0 (Default) id = 0 (Default) data file = (Default) file = (Default) init = 2 (Default) random seed = 3585768430 (Default) init = 2 (Default) random seed = 3585768430 (Default) output file = output.csv (Default) output file = output.csv (Default) diagnostic_file = (Default) refresh = 100 (Default) diagnostic_file = (Default) refresh = 100 (Default) Initial log joint probability = -731.444 Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in src/cmdstan/main.cpp To prevent termination, change the error handler using PetscPushErrorHandler() =================================================================================== = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES = PID 47804 RUNNING AT srvulx13 = EXIT CODE: 134 = CLEANING UP REMAINING PROCESSES = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES 
=================================================================================== YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) This typically refers to a problem with your application. Please see the FAQ page for debugging suggestions The code inside main.cpp is the following: #include #include #include int main(int argc, char* argv[]) { PetscErrorCode ierr; ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr); try { ierr = cmdstan::command(argc, argv);CHKERRQ(ierr); } catch (const std::exception& e) { std::cout << e.what() << std::endl; ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr); } ierr = PetscFinalize();CHKERRQ(ierr); return ierr; } I highlighted the line 12. Although I read the page where the command PetscPushErrorHandler is explained and the example provided (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should effectively use the command. Should I change the entire try/catch with PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); ? Best, Francesco > Il giorno 23 feb 2021, alle ore 11:54, Matthew Knepley ha scritto: > > On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda wrote: > Hi! > > I am very new to the PETSc world. I am working with a GitHub repo that uses PETSc together with Stan (a statistics open source software), here you can find the discussion. > It has been defined a functor to convert EigenVector to PetscVec and viceversa, both sequentially and in parallel. > The file using these functions does the conversions with the sequential setting. I changed to those using MPI, that is from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because I want to evaluate the scaling. > Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock optimize in the debug mode I get the error Caught signal number 11 SEGV. I therefore used the option -start_in_debugger and I get the following: > > For some reason, the -start_in_debuggger option is not being seen. Are you showing all the output? Once the debugger is attached, > you run the program (conr) and then when you hit the SEGV you get a stack trace (where). > > THanks, > > Matt > > [2]PETSC ERROR: ------------------------------------------------------------------------ > [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > [2]PETSC ERROR: likely location of problem given in stack below > [2]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [2]PETSC ERROR: INSTEAD the line number of the start of the function > [2]PETSC ERROR: is given. 
> [3]PETSC ERROR: ------------------------------------------------------------------------ > [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > [3]PETSC ERROR: likely location of problem given in stack below > [3]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [3]PETSC ERROR: INSTEAD the line number of the start of the function > [3]PETSC ERROR: is given. > [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null) > To prevent termination, change the error handler using PetscPushErrorHandler() > [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null) > To prevent termination, change the error handler using PetscPushErrorHandler() > > =================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = PID 22939 RUNNING AT srvulx13 > = EXIT CODE: 134 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > =================================================================================== > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) > This typically refers to a problem with your application. > Please see the FAQ page for debugging suggestions > > I read the documentation regarding the PetscAbortErrorHandler, but I do not know where should I use it. How can I solve the problem? > I hope I have been clear enough. > Attached you can find also my configure.log and make.log files. > > Best, > Francesco > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 23 14:59:59 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 23 Feb 2021 15:59:59 -0500 Subject: [petsc-users] Caught signal number 11 SEGV In-Reply-To: <9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com> References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com> <9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com> Message-ID: On Tue, Feb 23, 2021 at 3:55 PM Francesco Brarda wrote: > Thank you for the quick response. > Sorry, you are right. Here is the complete output: > > fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n > 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 > on display :0.0 on machine srvulx13 > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 > on display :0.0 on machine srvulx13 > xterm: Xt error: Can't open display: :0.0 > xterm: DISPLAY is not set > xterm: Xt error: Can't open display: :0.0 > xterm: DISPLAY is not set > Do you have an Xserver running? If not, you can use -start_in_debugger noxterm -debugger_nodes 3 and try to get a stack trace from one node. 
Thanks, Matt > method = optimize > optimize > algorithm = lbfgs (Default) > lbfgs > method = optimize > optimize > algorithm = lbfgs (Default) > lbfgs > init_alpha = 0.001 (Default) > tol_obj = 9.9999999999999998e-13 (Default) > tol_rel_obj = 10000 (Default) > tol_grad = 1e-08 (Default) > init_alpha = 0.001 (Default) > tol_obj = 9.9999999999999998e-13 (Default) > tol_rel_obj = 10000 (Default) > tol_grad = 1e-08 (Default) > tol_rel_grad = 10000000 (Default) > tol_param = 1e-08 (Default) > history_size = 5 (Default) > tol_rel_grad = 10000000 (Default) > tol_param = 1e-08 (Default) > history_size = 5 (Default) > iter = 2000 (Default) > iter = 2000 (Default) > save_iterations = 0 (Default) > id = 0 (Default) > data save_iterations = 0 (Default) > id = 0 (Default) > data > file = (Default) > > file = (Default) > init = 2 (Default) > random > seed = 3585768430 (Default) > init = 2 (Default) > random > seed = 3585768430 (Default) > output > file = output.csv (Default) > output > file = output.csv (Default) > diagnostic_file = (Default) > refresh = 100 (Default) > diagnostic_file = (Default) > refresh = 100 (Default) > > > Initial log joint probability = -731.444 > Iter log prob ||dx|| ||grad|| alpha > alpha0 # evals Notes > [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 > in src/cmdstan/main.cpp > To prevent termination, change the error handler > using PetscPushErrorHandler() > > > =================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = PID 47804 RUNNING AT srvulx13 > = EXIT CODE: 134 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > =================================================================================== > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) > This typically refers to a problem with your application. > Please see the FAQ page for debugging suggestions > > > > > > The code inside main.cpp is the following: > > #include > #include > > #include > > int main(int argc, char* argv[]) { > > PetscErrorCode ierr; > ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr); > > try { > ierr = cmdstan::command(argc, argv);CHKERRQ(ierr); > } catch (const std::exception& e) { > std::cout << e.what() << std::endl; > ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr); > } > > ierr = PetscFinalize();CHKERRQ(ierr); > return ierr; > } > > I highlighted the line 12. Although I read the page where the command > PetscPushErrorHandler is explained and the example provided > (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should > effectively use the command. > Should I change the entire try/catch with PetscPushErrorHandler( > PetscIgnoreErrorHandler,NULL); ? > > Best, > Francesco > > > Il giorno 23 feb 2021, alle ore 11:54, Matthew Knepley > ha scritto: > > On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda < > brardafrancesco at gmail.com> wrote: > Hi! > > I am very new to the PETSc world. I am working with a GitHub repo that > uses PETSc together with Stan (a statistics open source software), here you > can find the discussion. > It has been defined a functor to convert EigenVector to PetscVec and > viceversa, both sequentially and in parallel. > The file using these functions does the conversions with the sequential > setting. I changed to those using MPI, that is > from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because > I want to evaluate the scaling. 
> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock > optimize in the debug mode I get the error Caught signal number 11 SEGV. I > therefore used the option -start_in_debugger and I get the following: > > For some reason, the -start_in_debuggger option is not being seen. Are you > showing all the output? Once the debugger is attached, > you run the program (conr) and then when you hit the SEGV you get a stack > trace (where). > > THanks, > > Matt > > [2]PETSC ERROR: > ------------------------------------------------------------------------ > [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [2]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [2]PETSC ERROR: likely location of problem given in stack below > [2]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [2]PETSC ERROR: INSTEAD the line number of the start of the function > [2]PETSC ERROR: is given. > [3]PETSC ERROR: > ------------------------------------------------------------------------ > [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [3]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [3]PETSC ERROR: likely location of problem given in stack below > [3]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [3]PETSC ERROR: INSTEAD the line number of the start of the function > [3]PETSC ERROR: is given. > [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in > unknown file (null) > To prevent termination, change the error handler using > PetscPushErrorHandler() > [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in > unknown file (null) > To prevent termination, change the error handler using > PetscPushErrorHandler() > > > =================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = PID 22939 RUNNING AT srvulx13 > = EXIT CODE: 134 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > =================================================================================== > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) > This typically refers to a problem with your application. > Please see the FAQ page for debugging suggestions > > I read the documentation regarding the PetscAbortErrorHandler, but I do > not know where should I use it. How can I solve the problem? > I hope I have been clear enough. > Attached you can find also my configure.log and make.log files. > > Best, > Francesco > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From brardafrancesco at gmail.com Tue Feb 23 15:46:49 2021 From: brardafrancesco at gmail.com (Francesco Brarda) Date: Tue, 23 Feb 2021 22:46:49 +0100 Subject: [petsc-users] Caught signal number 11 SEGV In-Reply-To: References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com> <9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com> Message-ID: <5F08B032-8D8C-482D-82D2-12BC293A097D@gmail.com> Using the command you suggested I got fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm -debugger_nodes 3 ** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated as of version 3.14 and will be removed in a future release. Please use the option -debugger_ranks instead. (Silence this warning with -options_suppress_deprecated_warnings) method = optimize optimize algorithm = lbfgs (Default) lbfgs method = optimize optimize algorithm = lbfgs (Default) lbfgs init_alpha = 0.001 (Default) tol_obj = 9.9999999999999998e-13 (Default) init_alpha = 0.001 (Default) tol_obj = 9.9999999999999998e-13 (Default) tol_rel_obj = 10000 (Default) tol_grad = 1e-08 (Default) tol_rel_obj = 10000 (Default) tol_grad = 1e-08 (Default) tol_rel_grad = 10000000 (Default) tol_rel_grad = 10000000 (Default) tol_param = 1e-08 (Default) tol_param = 1e-08 (Default) history_size = 5 (Default) iter = 2000 (Default) history_size = 5 (Default) iter = 2000 (Default) save_iterations = 0 (Default) id = 0 (Default) data save_iterations = 0 (Default) id = 0 (Default) data file = (Default) file = (Default) init = 2 (Default) random seed = 3623621468 (Default) output file = output.csv (Default)init = 2 (Default) random seed = 3623621468 (Default) output file = output.csv (Default) diagnostic_file = (Default) refresh = 100 (Default) diagnostic_file = (Default) refresh = 100 (Default) Initial log joint probability = -195.984 Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes 10 -0.97101 0.00292919 1.65855 0.001 0.001 46 LS failed, Hessian reset 12 -0.483952 0.001316 1.18542 0.001 0.001 77 LS failed, Hessian reset 13 -0.477916 0.0118542 0.163518 0.01 0.001 106 LS failed, Hessian reset [1]PETSC ERROR: #1 main() line 12 in src/cmdstan/main.cpp [1]PETSC ERROR: PETSc Option Table entries: [1]PETSC ERROR: -debugger_nodes 3 [1]PETSC ERROR: -start_in_debugger noxterm [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov????? And then it does not go further. With the -debugger_ranks suggested, the output is the same. What do you think, please? I am using a cluster (one node, dual-socket system with twelve-core-CPUs), but when I do the ssh I do not use the -X flag, if that's what you mean. Thank you, Francesco > Il giorno 23 feb 2021, alle ore 21:59, Matthew Knepley ha scritto: > > On Tue, Feb 23, 2021 at 3:55 PM Francesco Brarda wrote: > Thank you for the quick response. > Sorry, you are right. 
Here is the complete output: > > fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 on display :0.0 on machine srvulx13 > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 on display :0.0 on machine srvulx13 > xterm: Xt error: Can't open display: :0.0 > xterm: DISPLAY is not set > xterm: Xt error: Can't open display: :0.0 > xterm: DISPLAY is not set > > Do you have an Xserver running? If not, you can use > > -start_in_debugger noxterm -debugger_nodes 3 > > and try to get a stack trace from one node. > > Thanks, > > Matt > > method = optimize > optimize > algorithm = lbfgs (Default) > lbfgs > method = optimize > optimize > algorithm = lbfgs (Default) > lbfgs > init_alpha = 0.001 (Default) > tol_obj = 9.9999999999999998e-13 (Default) > tol_rel_obj = 10000 (Default) > tol_grad = 1e-08 (Default) > init_alpha = 0.001 (Default) > tol_obj = 9.9999999999999998e-13 (Default) > tol_rel_obj = 10000 (Default) > tol_grad = 1e-08 (Default) > tol_rel_grad = 10000000 (Default) > tol_param = 1e-08 (Default) > history_size = 5 (Default) > tol_rel_grad = 10000000 (Default) > tol_param = 1e-08 (Default) > history_size = 5 (Default) > iter = 2000 (Default) > iter = 2000 (Default) > save_iterations = 0 (Default) > id = 0 (Default) > data save_iterations = 0 (Default) > id = 0 (Default) > data > file = (Default) > > file = (Default) > init = 2 (Default) > random > seed = 3585768430 (Default) > init = 2 (Default) > random > seed = 3585768430 (Default) > output > file = output.csv (Default) > output > file = output.csv (Default) > diagnostic_file = (Default) > refresh = 100 (Default) > diagnostic_file = (Default) > refresh = 100 (Default) > > > Initial log joint probability = -731.444 > Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes > [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in src/cmdstan/main.cpp > To prevent termination, change the error handler using PetscPushErrorHandler() > > =================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = PID 47804 RUNNING AT srvulx13 > = EXIT CODE: 134 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > =================================================================================== > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) > This typically refers to a problem with your application. > Please see the FAQ page for debugging suggestions > > > > > > The code inside main.cpp is the following: > > #include > #include > > #include > > int main(int argc, char* argv[]) { > > PetscErrorCode ierr; > ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr); > > try { > ierr = cmdstan::command(argc, argv);CHKERRQ(ierr); > } catch (const std::exception& e) { > std::cout << e.what() << std::endl; > ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr); > } > > ierr = PetscFinalize();CHKERRQ(ierr); > return ierr; > } > > I highlighted the line 12. Although I read the page where the command PetscPushErrorHandler is explained and the example provided (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should effectively use the command. > Should I change the entire try/catch with PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); ? 
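An editorial aside on the main.cpp question quoted above (a sketch, not advice given in the thread): CHKERRQ treats any nonzero value as a PETSc error and invokes the current error handler, so passing the return code of cmdstan::command (or Stan's error_codes::SOFTWARE) through CHKERRQ at line 12 would produce exactly the PetscAbortErrorHandler message pointing at main() line 12. One possible restructuring that keeps Stan's status codes out of the PETSc error path is sketched below; the header names are guesses, since the archiver stripped the original #include targets.

#include <iostream>
#include <cmdstan/command.hpp>
#include <stan/services/error_codes.hpp>
#include <petscsys.h>

int main(int argc, char* argv[]) {
  PetscErrorCode ierr;
  int            ret = 0;

  ierr = PetscInitialize(&argc, &argv, 0, 0); if (ierr) return (int)ierr;
  try {
    ret = cmdstan::command(argc, argv);        /* keep Stan's return code as a plain status */
  } catch (const std::exception& e) {
    std::cout << e.what() << std::endl;
    ret = stan::services::error_codes::SOFTWARE;
  }
  ierr = PetscFinalize();
  return ret ? ret : (int)ierr;                /* no CHKERRQ on non-PETSc codes */
}

PetscPushErrorHandler(PetscIgnoreErrorHandler, NULL) before the call would also stop the abort, but it affects every PETSc error, so separating the Stan status code from the PETSc error path is probably the smaller change.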
> > Best, > Francesco > > >> Il giorno 23 feb 2021, alle ore 11:54, Matthew Knepley ha scritto: >> >> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda wrote: >> Hi! >> >> I am very new to the PETSc world. I am working with a GitHub repo that uses PETSc together with Stan (a statistics open source software), here you can find the discussion. >> It has been defined a functor to convert EigenVector to PetscVec and viceversa, both sequentially and in parallel. >> The file using these functions does the conversions with the sequential setting. I changed to those using MPI, that is from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because I want to evaluate the scaling. >> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock optimize in the debug mode I get the error Caught signal number 11 SEGV. I therefore used the option -start_in_debugger and I get the following: >> >> For some reason, the -start_in_debuggger option is not being seen. Are you showing all the output? Once the debugger is attached, >> you run the program (conr) and then when you hit the SEGV you get a stack trace (where). >> >> THanks, >> >> Matt >> >> [2]PETSC ERROR: ------------------------------------------------------------------------ >> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >> [2]PETSC ERROR: likely location of problem given in stack below >> [2]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >> [2]PETSC ERROR: INSTEAD the line number of the start of the function >> [2]PETSC ERROR: is given. >> [3]PETSC ERROR: ------------------------------------------------------------------------ >> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >> [3]PETSC ERROR: likely location of problem given in stack below >> [3]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >> [3]PETSC ERROR: INSTEAD the line number of the start of the function >> [3]PETSC ERROR: is given. 
>> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null) >> To prevent termination, change the error handler using PetscPushErrorHandler() >> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null) >> To prevent termination, change the error handler using PetscPushErrorHandler() >> >> =================================================================================== >> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >> = PID 22939 RUNNING AT srvulx13 >> = EXIT CODE: 134 >> = CLEANING UP REMAINING PROCESSES >> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >> =================================================================================== >> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) >> This typically refers to a problem with your application. >> Please see the FAQ page for debugging suggestions >> >> I read the documentation regarding the PetscAbortErrorHandler, but I do not know where should I use it. How can I solve the problem? >> I hope I have been clear enough. >> Attached you can find also my configure.log and make.log files. >> >> Best, >> Francesco >> >> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Feb 23 15:49:23 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 23 Feb 2021 15:49:23 -0600 Subject: [petsc-users] Caught signal number 11 SEGV In-Reply-To: <5F08B032-8D8C-482D-82D2-12BC293A097D@gmail.com> References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com> <9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com> <5F08B032-8D8C-482D-82D2-12BC293A097D@gmail.com> Message-ID: This run is with '-n 2' - so -debugger_nodes value should be either 0 or 1 Satish On Tue, 23 Feb 2021, Francesco Brarda wrote: > Using the command you suggested I got > > fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm -debugger_nodes 3 > ** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated as of version 3.14 and will be removed in a future release. Please use the option -debugger_ranks instead. 
(Silence this warning with -options_suppress_deprecated_warnings) > method = optimize > optimize > algorithm = lbfgs (Default) > lbfgs > method = optimize > optimize > algorithm = lbfgs (Default) > lbfgs > init_alpha = 0.001 (Default) > tol_obj = 9.9999999999999998e-13 (Default) > init_alpha = 0.001 (Default) > tol_obj = 9.9999999999999998e-13 (Default) > tol_rel_obj = 10000 (Default) > tol_grad = 1e-08 (Default) tol_rel_obj = 10000 (Default) > tol_grad = 1e-08 (Default) > tol_rel_grad = 10000000 (Default) > > tol_rel_grad = 10000000 (Default) > tol_param = 1e-08 (Default) tol_param = 1e-08 (Default) > history_size = 5 (Default) > iter = 2000 (Default) > > history_size = 5 (Default) > iter = 2000 (Default) > save_iterations = 0 (Default) > id = 0 (Default) > data > save_iterations = 0 (Default) > id = 0 (Default) > data > file = (Default) > file = (Default) > init = 2 (Default) > random > seed = 3623621468 (Default) > output > file = output.csv (Default)init = 2 (Default) > random > seed = 3623621468 (Default) > output > file = output.csv (Default) > > diagnostic_file = (Default) > refresh = 100 (Default) > > diagnostic_file = (Default) > refresh = 100 (Default) > > Initial log joint probability = -195.984 > Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes > 10 -0.97101 0.00292919 1.65855 0.001 0.001 46 LS failed, Hessian reset > 12 -0.483952 0.001316 1.18542 0.001 0.001 77 LS failed, Hessian reset > 13 -0.477916 0.0118542 0.163518 0.01 0.001 106 LS failed, Hessian reset > [1]PETSC ERROR: #1 main() line 12 in src/cmdstan/main.cpp > [1]PETSC ERROR: PETSc Option Table entries: > [1]PETSC ERROR: -debugger_nodes 3 > [1]PETSC ERROR: -start_in_debugger noxterm > [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov????? > > And then it does not go further. With the -debugger_ranks suggested, the output is the same. What do you think, please? > I am using a cluster (one node, dual-socket system with twelve-core-CPUs), but when I do the ssh I do not use the -X flag, if that's what you mean. > > Thank you, > Francesco > > > > Il giorno 23 feb 2021, alle ore 21:59, Matthew Knepley ha scritto: > > > > On Tue, Feb 23, 2021 at 3:55 PM Francesco Brarda wrote: > > Thank you for the quick response. > > Sorry, you are right. Here is the complete output: > > > > fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger > > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 on display :0.0 on machine srvulx13 > > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 on display :0.0 on machine srvulx13 > > xterm: Xt error: Can't open display: :0.0 > > xterm: DISPLAY is not set > > xterm: Xt error: Can't open display: :0.0 > > xterm: DISPLAY is not set > > > > Do you have an Xserver running? If not, you can use > > > > -start_in_debugger noxterm -debugger_nodes 3 > > > > and try to get a stack trace from one node. 
> > > > Thanks, > > > > Matt > > > > method = optimize > > optimize > > algorithm = lbfgs (Default) > > lbfgs > > method = optimize > > optimize > > algorithm = lbfgs (Default) > > lbfgs > > init_alpha = 0.001 (Default) > > tol_obj = 9.9999999999999998e-13 (Default) > > tol_rel_obj = 10000 (Default) > > tol_grad = 1e-08 (Default) > > init_alpha = 0.001 (Default) > > tol_obj = 9.9999999999999998e-13 (Default) > > tol_rel_obj = 10000 (Default) > > tol_grad = 1e-08 (Default) > > tol_rel_grad = 10000000 (Default) > > tol_param = 1e-08 (Default) > > history_size = 5 (Default) > > tol_rel_grad = 10000000 (Default) > > tol_param = 1e-08 (Default) > > history_size = 5 (Default) > > iter = 2000 (Default) > > iter = 2000 (Default) > > save_iterations = 0 (Default) > > id = 0 (Default) > > data save_iterations = 0 (Default) > > id = 0 (Default) > > data > > file = (Default) > > > > file = (Default) > > init = 2 (Default) > > random > > seed = 3585768430 (Default) > > init = 2 (Default) > > random > > seed = 3585768430 (Default) > > output > > file = output.csv (Default) > > output > > file = output.csv (Default) > > diagnostic_file = (Default) > > refresh = 100 (Default) > > diagnostic_file = (Default) > > refresh = 100 (Default) > > > > > > Initial log joint probability = -731.444 > > Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes > > [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in src/cmdstan/main.cpp > > To prevent termination, change the error handler using PetscPushErrorHandler() > > > > =================================================================================== > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > > = PID 47804 RUNNING AT srvulx13 > > = EXIT CODE: 134 > > = CLEANING UP REMAINING PROCESSES > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > =================================================================================== > > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) > > This typically refers to a problem with your application. > > Please see the FAQ page for debugging suggestions > > > > > > > > > > > > The code inside main.cpp is the following: > > > > #include > > #include > > > > #include > > > > int main(int argc, char* argv[]) { > > > > PetscErrorCode ierr; > > ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr); > > > > try { > > ierr = cmdstan::command(argc, argv);CHKERRQ(ierr); > > } catch (const std::exception& e) { > > std::cout << e.what() << std::endl; > > ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr); > > } > > > > ierr = PetscFinalize();CHKERRQ(ierr); > > return ierr; > > } > > > > I highlighted the line 12. Although I read the page where the command PetscPushErrorHandler is explained and the example provided (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should effectively use the command. > > Should I change the entire try/catch with PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); ? > > > > Best, > > Francesco > > > > > >> Il giorno 23 feb 2021, alle ore 11:54, Matthew Knepley ha scritto: > >> > >> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda wrote: > >> Hi! > >> > >> I am very new to the PETSc world. I am working with a GitHub repo that uses PETSc together with Stan (a statistics open source software), here you can find the discussion. > >> It has been defined a functor to convert EigenVector to PetscVec and viceversa, both sequentially and in parallel. > >> The file using these functions does the conversions with the sequential setting. 
I changed to those using MPI, that is from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because I want to evaluate the scaling. > >> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock optimize in the debug mode I get the error Caught signal number 11 SEGV. I therefore used the option -start_in_debugger and I get the following: > >> > >> For some reason, the -start_in_debuggger option is not being seen. Are you showing all the output? Once the debugger is attached, > >> you run the program (conr) and then when you hit the SEGV you get a stack trace (where). > >> > >> THanks, > >> > >> Matt > >> > >> [2]PETSC ERROR: ------------------------------------------------------------------------ > >> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > >> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > >> [2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > >> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > >> [2]PETSC ERROR: likely location of problem given in stack below > >> [2]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > >> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > >> [2]PETSC ERROR: INSTEAD the line number of the start of the function > >> [2]PETSC ERROR: is given. > >> [3]PETSC ERROR: ------------------------------------------------------------------------ > >> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > >> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > >> [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > >> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > >> [3]PETSC ERROR: likely location of problem given in stack below > >> [3]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > >> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > >> [3]PETSC ERROR: INSTEAD the line number of the start of the function > >> [3]PETSC ERROR: is given. > >> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null) > >> To prevent termination, change the error handler using PetscPushErrorHandler() > >> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null) > >> To prevent termination, change the error handler using PetscPushErrorHandler() > >> > >> =================================================================================== > >> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > >> = PID 22939 RUNNING AT srvulx13 > >> = EXIT CODE: 134 > >> = CLEANING UP REMAINING PROCESSES > >> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > >> =================================================================================== > >> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) > >> This typically refers to a problem with your application. > >> Please see the FAQ page for debugging suggestions > >> > >> I read the documentation regarding the PetscAbortErrorHandler, but I do not know where should I use it. How can I solve the problem? > >> I hope I have been clear enough. > >> Attached you can find also my configure.log and make.log files. 
> >> > >> Best, > >> Francesco > >> > >> > >> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > >> -- Norbert Wiener > >> > >> https://www.cse.buffalo.edu/~knepley/ > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > From junchao.zhang at gmail.com Tue Feb 23 20:31:46 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 23 Feb 2021 20:31:46 -0600 Subject: [petsc-users] Caught signal number 11 SEGV In-Reply-To: <9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com> References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com> <9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com> Message-ID: Francesco, If you want to debug the code, why don't you build petsc and your code in debug mode so that you can see the full error stack? My favorite debugger is ddt. If not available, then I will try https://github.com/Azrael3000/tmpi. No need to set DISPLAY. But the constraint is you have to use OpenMPI. --Junchao Zhang On Tue, Feb 23, 2021 at 2:55 PM Francesco Brarda wrote: > Thank you for the quick response. > Sorry, you are right. Here is the complete output: > > fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n > 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 > on display :0.0 on machine srvulx13 > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 > on display :0.0 on machine srvulx13 > xterm: Xt error: Can't open display: :0.0 > xterm: DISPLAY is not set > xterm: Xt error: Can't open display: :0.0 > xterm: DISPLAY is not set > method = optimize > optimize > algorithm = lbfgs (Default) > lbfgs > method = optimize > optimize > algorithm = lbfgs (Default) > lbfgs > init_alpha = 0.001 (Default) > tol_obj = 9.9999999999999998e-13 (Default) > tol_rel_obj = 10000 (Default) > tol_grad = 1e-08 (Default) > init_alpha = 0.001 (Default) > tol_obj = 9.9999999999999998e-13 (Default) > tol_rel_obj = 10000 (Default) > tol_grad = 1e-08 (Default) > tol_rel_grad = 10000000 (Default) > tol_param = 1e-08 (Default) > history_size = 5 (Default) > tol_rel_grad = 10000000 (Default) > tol_param = 1e-08 (Default) > history_size = 5 (Default) > iter = 2000 (Default) > iter = 2000 (Default) > save_iterations = 0 (Default) > id = 0 (Default) > data save_iterations = 0 (Default) > id = 0 (Default) > data > file = (Default) > > file = (Default) > init = 2 (Default) > random > seed = 3585768430 (Default) > init = 2 (Default) > random > seed = 3585768430 (Default) > output > file = output.csv (Default) > output > file = output.csv (Default) > diagnostic_file = (Default) > refresh = 100 (Default) > diagnostic_file = (Default) > refresh = 100 (Default) > > > Initial log joint probability = -731.444 > Iter log prob ||dx|| ||grad|| alpha > alpha0 # evals Notes > [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 > in src/cmdstan/main.cpp > To prevent termination, change the error handler > using PetscPushErrorHandler() > > > =================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = PID 47804 RUNNING AT srvulx13 > = EXIT CODE: 134 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW 
CLEANUP MESSAGES > > =================================================================================== > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) > This typically refers to a problem with your application. > Please see the FAQ page for debugging suggestions > > > > > > The code inside main.cpp is the following: > > #include > #include > > #include > > int main(int argc, char* argv[]) { > > PetscErrorCode ierr; > ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr); > > try { > ierr = cmdstan::command(argc, argv);CHKERRQ(ierr); > } catch (const std::exception& e) { > std::cout << e.what() << std::endl; > ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr); > } > > ierr = PetscFinalize();CHKERRQ(ierr); > return ierr; > } > > I highlighted the line 12. Although I read the page where the command > PetscPushErrorHandler is explained and the example provided > (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should > effectively use the command. > Should I change the entire try/catch with PetscPushErrorHandler( > PetscIgnoreErrorHandler,NULL); ? > > Best, > Francesco > > > Il giorno 23 feb 2021, alle ore 11:54, Matthew Knepley > ha scritto: > > On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda < > brardafrancesco at gmail.com> wrote: > Hi! > > I am very new to the PETSc world. I am working with a GitHub repo that > uses PETSc together with Stan (a statistics open source software), here you > can find the discussion. > It has been defined a functor to convert EigenVector to PetscVec and > viceversa, both sequentially and in parallel. > The file using these functions does the conversions with the sequential > setting. I changed to those using MPI, that is > from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because > I want to evaluate the scaling. > Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock > optimize in the debug mode I get the error Caught signal number 11 SEGV. I > therefore used the option -start_in_debugger and I get the following: > > For some reason, the -start_in_debuggger option is not being seen. Are you > showing all the output? Once the debugger is attached, > you run the program (conr) and then when you hit the SEGV you get a stack > trace (where). > > THanks, > > Matt > > [2]PETSC ERROR: > ------------------------------------------------------------------------ > [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [2]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [2]PETSC ERROR: likely location of problem given in stack below > [2]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [2]PETSC ERROR: INSTEAD the line number of the start of the function > [2]PETSC ERROR: is given. 
> [3]PETSC ERROR: > ------------------------------------------------------------------------ > [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [3]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [3]PETSC ERROR: likely location of problem given in stack below > [3]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [3]PETSC ERROR: INSTEAD the line number of the start of the function > [3]PETSC ERROR: is given. > [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in > unknown file (null) > To prevent termination, change the error handler using > PetscPushErrorHandler() > [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in > unknown file (null) > To prevent termination, change the error handler using > PetscPushErrorHandler() > > > =================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = PID 22939 RUNNING AT srvulx13 > = EXIT CODE: 134 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > =================================================================================== > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) > This typically refers to a problem with your application. > Please see the FAQ page for debugging suggestions > > I read the documentation regarding the PetscAbortErrorHandler, but I do > not know where should I use it. How can I solve the problem? > I hope I have been clear enough. > Attached you can find also my configure.log and make.log files. > > Best, > Francesco > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Tue Feb 23 21:45:29 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 23 Feb 2021 21:45:29 -0600 Subject: [petsc-users] 64bit indices and revolve compiler warnings In-Reply-To: References: Message-ID: SInce the parameter is an integer pointer, we must convert 64-bit indices to int. Cc Mr. Hong, who wrote that line. --Junchao Zhang On Tue, Feb 23, 2021 at 5:36 AM Anton Glazkov wrote: > Good morning, > > > > I have been trying to compile PETSc with 64bit indices and revolve. 
It > compiles ok but gives out warnings of the kind: > > > > {PETSCDIR PATH REMOVED}/src/ts/trajectory/impls/memory/trajmemory.c:1479:130: > warning: incompatible pointer types passing 'PetscInt *' (aka 'long long > *') to parameter of type 'int *' [-Wincompatible-pointer-types] > > whattodo = > revolve_action(&tjsch->rctx->check,&tjsch->rctx->capo,&tjsch->rctx->fine,tjsch->rctx->snaps_in,&tjsch->rctx->info,&tjsch->rctx->where); > /* must return 1 or 3 or 4*/ > > > ^~~~~~~~~~~~~~~~~~~ > > {PETSCDIR PATH REMOVED}/lib/include/revolve_c.h:14:49: note: passing > argument to parameter here > > int revolve_action(int*,int*,int*,int,int*,int*); > > > > Is revolve incompatible with 64bit indices by design? > > > > Best wishes, > > Anton > > > > PS the compile line is this: > > ./configure ?prefix={PREFIX REMOVED} --with-cc=cc --with-cxx=CC > --with-fc=ftn --with-debugging=0 --with-clib-autodetect=0 > --with-cxxlib-autodetect=0 --with-fortranlib-autodetect=0 --COPTFLAGS=-g > -O3 --CXXOPTFLAGS=-g -O3 --FOPTFLAGS=-g -O3 --with-64-bit-indices > --with-scalar-type=complex --download-hypre-shared --download-moab-shared > --download-superlu_dist-shared --download-revolve=1 > --with-hdf5-dir=/opt/cray/pe/hdf5-parallel/1.12.0.2/CRAYCLANG/9.1 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Feb 23 23:03:54 2021 From: jed at jedbrown.org (Jed Brown) Date: Tue, 23 Feb 2021 22:03:54 -0700 Subject: [petsc-users] Preconditioner for LSC In-Reply-To: References: Message-ID: <87sg5m9hvp.fsf@jedbrown.org> If you've already attached a MatShell, you could presumably use -fieldsplit_1_lsc_pc_type mat to just call its MatMult. The sense I've gotten when I wrote PCLSC and was experimenting with these methods is that the main selling point of LSC (for most discretizations) is that it's more algebraic than the cheaper PCD methods. Elena Travaglia writes: > Dear PETSc users, > > we would like to compare our preconditioner for the Schur complement > of a Stokes system, with the LSC preconditioner already implemented in > PETSc. Following the example in the PETSc manual, we've tried > -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type ml > but this is not working (properly) on our problem. > > On the other hand we think we have a good preconditioner for A10*A01, > so we'd like to try > -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type shell > but we cannot figure out how to attach our apply() routine to > the pc object of fieldsplit_1_lsc. > > Can this be done in the current interface? > Or perhaps, should we call KSPGetOperators on the fieldsplit_1 solver > and attach to its Sp operator a "LSC_Lp" of type MATSHELL with our routine > attached to the MATOP_SOLVE of the shell matrix? > > Thanks in advance, > > Elena and Matteo > > -- > ------------------------ > > > > Indirizzo istituzionale di posta elettronica > degli studenti e dei laureati dell'Universit? degli Studi di TorinoOfficial? > University of Turin?email address?for students and graduates? From hongzhang at anl.gov Tue Feb 23 23:12:27 2021 From: hongzhang at anl.gov (Zhang, Hong) Date: Wed, 24 Feb 2021 05:12:27 +0000 Subject: [petsc-users] 64bit indices and revolve compiler warnings In-Reply-To: References: Message-ID: <6A7A2FA3-755D-463B-8538-F74EAC828C1A@anl.gov> On Feb 23, 2021, at 5:32 AM, Anton Glazkov > wrote: Good morning, I have been trying to compile PETSc with 64bit indices and revolve. 
It compiles ok but gives out warnings of the kind: {PETSCDIR PATH REMOVED}/src/ts/trajectory/impls/memory/trajmemory.c:1479:130: warning: incompatible pointer types passing 'PetscInt *' (aka 'long long *') to parameter of type 'int *' [-Wincompatible-pointer-types] whattodo = revolve_action(&tjsch->rctx->check,&tjsch->rctx->capo,&tjsch->rctx->fine,tjsch->rctx->snaps_in,&tjsch->rctx->info,&tjsch->rctx->where); /* must return 1 or 3 or 4*/ ^~~~~~~~~~~~~~~~~~~ {PETSCDIR PATH REMOVED}/lib/include/revolve_c.h:14:49: note: passing argument to parameter here int revolve_action(int*,int*,int*,int,int*,int*); Is revolve incompatible with 64bit indices by design? Yes, Revolve uses int32 only. But we can fix the warnings by downcasting. You can check out this MR: https://gitlab.com/petsc/petsc/-/merge_requests/3654 Thanks, Hong (Mr.) Best wishes, Anton PS the compile line is this: ./configure ?prefix={PREFIX REMOVED} --with-cc=cc --with-cxx=CC --with-fc=ftn --with-debugging=0 --with-clib-autodetect=0 --with-cxxlib-autodetect=0 --with-fortranlib-autodetect=0 --COPTFLAGS=-g -O3 --CXXOPTFLAGS=-g -O3 --FOPTFLAGS=-g -O3 --with-64-bit-indices --with-scalar-type=complex --download-hypre-shared --download-moab-shared --download-superlu_dist-shared --download-revolve=1 --with-hdf5-dir=/opt/cray/pe/hdf5-parallel/1.12.0.2/CRAYCLANG/9.1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Feb 24 00:14:04 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 24 Feb 2021 00:14:04 -0600 Subject: [petsc-users] Caught signal number 11 SEGV In-Reply-To: <5F08B032-8D8C-482D-82D2-12BC293A097D@gmail.com> References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com> <9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com> <5F08B032-8D8C-482D-82D2-12BC293A097D@gmail.com> Message-ID: <00FCB539-D690-4311-8821-DF81DA408FE7@petsc.dev> start_in_debugger noxterm -debugger_nodes 3 Use -start_in_debugger noxterm -debugger_nodes 0 when not opening windows for each debugger it is best to have the first rank associated with the tty as the debugger node > On Feb 23, 2021, at 3:46 PM, Francesco Brarda wrote: > > Using the command you suggested I got > > fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm -debugger_nodes 3 > ** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated as of version 3.14 and will be removed in a future release. Please use the option -debugger_ranks instead. 
(Silence this warning with -options_suppress_deprecated_warnings) > method = optimize > optimize > algorithm = lbfgs (Default) > lbfgs > method = optimize > optimize > algorithm = lbfgs (Default) > lbfgs > init_alpha = 0.001 (Default) > tol_obj = 9.9999999999999998e-13 (Default) > init_alpha = 0.001 (Default) > tol_obj = 9.9999999999999998e-13 (Default) > tol_rel_obj = 10000 (Default) > tol_grad = 1e-08 (Default) tol_rel_obj = 10000 (Default) > tol_grad = 1e-08 (Default) > tol_rel_grad = 10000000 (Default) > > tol_rel_grad = 10000000 (Default) > tol_param = 1e-08 (Default) tol_param = 1e-08 (Default) > history_size = 5 (Default) > iter = 2000 (Default) > > history_size = 5 (Default) > iter = 2000 (Default) > save_iterations = 0 (Default) > id = 0 (Default) > data > save_iterations = 0 (Default) > id = 0 (Default) > data > file = (Default) > file = (Default) > init = 2 (Default) > random > seed = 3623621468 (Default) > output > file = output.csv (Default)init = 2 (Default) > random > seed = 3623621468 (Default) > output > file = output.csv (Default) > > diagnostic_file = (Default) > refresh = 100 (Default) > > diagnostic_file = (Default) > refresh = 100 (Default) > > Initial log joint probability = -195.984 > Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes > 10 -0.97101 0.00292919 1.65855 0.001 0.001 46 LS failed, Hessian reset > 12 -0.483952 0.001316 1.18542 0.001 0.001 77 LS failed, Hessian reset > 13 -0.477916 0.0118542 0.163518 0.01 0.001 106 LS failed, Hessian reset > [1]PETSC ERROR: #1 main() line 12 in src/cmdstan/main.cpp > [1]PETSC ERROR: PETSc Option Table entries: > [1]PETSC ERROR: -debugger_nodes 3 > [1]PETSC ERROR: -start_in_debugger noxterm > [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov ????? > > And then it does not go further. With the -debugger_ranks suggested, the output is the same. What do you think, please? > I am using a cluster (one node, dual-socket system with twelve-core-CPUs), but when I do the ssh I do not use the -X flag, if that's what you mean. > > Thank you, > Francesco > > >> Il giorno 23 feb 2021, alle ore 21:59, Matthew Knepley > ha scritto: >> >> On Tue, Feb 23, 2021 at 3:55 PM Francesco Brarda > wrote: >> Thank you for the quick response. >> Sorry, you are right. Here is the complete output: >> >> fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger >> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 on display :0.0 on machine srvulx13 >> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 on display :0.0 on machine srvulx13 >> xterm: Xt error: Can't open display: :0.0 >> xterm: DISPLAY is not set >> xterm: Xt error: Can't open display: :0.0 >> xterm: DISPLAY is not set >> >> Do you have an Xserver running? If not, you can use >> >> -start_in_debugger noxterm -debugger_nodes 3 >> >> and try to get a stack trace from one node. 
>> >> Thanks, >> >> Matt >> >> method = optimize >> optimize >> algorithm = lbfgs (Default) >> lbfgs >> method = optimize >> optimize >> algorithm = lbfgs (Default) >> lbfgs >> init_alpha = 0.001 (Default) >> tol_obj = 9.9999999999999998e-13 (Default) >> tol_rel_obj = 10000 (Default) >> tol_grad = 1e-08 (Default) >> init_alpha = 0.001 (Default) >> tol_obj = 9.9999999999999998e-13 (Default) >> tol_rel_obj = 10000 (Default) >> tol_grad = 1e-08 (Default) >> tol_rel_grad = 10000000 (Default) >> tol_param = 1e-08 (Default) >> history_size = 5 (Default) >> tol_rel_grad = 10000000 (Default) >> tol_param = 1e-08 (Default) >> history_size = 5 (Default) >> iter = 2000 (Default) >> iter = 2000 (Default) >> save_iterations = 0 (Default) >> id = 0 (Default) >> data save_iterations = 0 (Default) >> id = 0 (Default) >> data >> file = (Default) >> >> file = (Default) >> init = 2 (Default) >> random >> seed = 3585768430 (Default) >> init = 2 (Default) >> random >> seed = 3585768430 (Default) >> output >> file = output.csv (Default) >> output >> file = output.csv (Default) >> diagnostic_file = (Default) >> refresh = 100 (Default) >> diagnostic_file = (Default) >> refresh = 100 (Default) >> >> >> Initial log joint probability = -731.444 >> Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes >> [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in src/cmdstan/main.cpp >> To prevent termination, change the error handler using PetscPushErrorHandler() >> >> =================================================================================== >> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >> = PID 47804 RUNNING AT srvulx13 >> = EXIT CODE: 134 >> = CLEANING UP REMAINING PROCESSES >> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >> =================================================================================== >> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) >> This typically refers to a problem with your application. >> Please see the FAQ page for debugging suggestions >> >> >> >> >> >> The code inside main.cpp is the following: >> >> #include >> #include >> >> #include >> >> int main(int argc, char* argv[]) { >> >> PetscErrorCode ierr; >> ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr); >> >> try { >> ierr = cmdstan::command(argc, argv);CHKERRQ(ierr); >> } catch (const std::exception& e) { >> std::cout << e.what() << std::endl; >> ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr); >> } >> >> ierr = PetscFinalize();CHKERRQ(ierr); >> return ierr; >> } >> >> I highlighted the line 12. Although I read the page where the command PetscPushErrorHandler is explained and the example provided (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should effectively use the command. >> Should I change the entire try/catch with PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); ? >> >> Best, >> Francesco >> >> >>> Il giorno 23 feb 2021, alle ore 11:54, Matthew Knepley ha scritto: >>> >>> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda wrote: >>> Hi! >>> >>> I am very new to the PETSc world. I am working with a GitHub repo that uses PETSc together with Stan (a statistics open source software), here you can find the discussion. >>> It has been defined a functor to convert EigenVector to PetscVec and viceversa, both sequentially and in parallel. >>> The file using these functions does the conversions with the sequential setting. 
I changed to those using MPI, that is from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because I want to evaluate the scaling. >>> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock optimize in the debug mode I get the error Caught signal number 11 SEGV. I therefore used the option -start_in_debugger and I get the following: >>> >>> For some reason, the -start_in_debuggger option is not being seen. Are you showing all the output? Once the debugger is attached, >>> you run the program (conr) and then when you hit the SEGV you get a stack trace (where). >>> >>> THanks, >>> >>> Matt >>> >>> [2]PETSC ERROR: ------------------------------------------------------------------------ >>> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>> [2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >>> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >>> [2]PETSC ERROR: likely location of problem given in stack below >>> [2]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >>> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >>> [2]PETSC ERROR: INSTEAD the line number of the start of the function >>> [2]PETSC ERROR: is given. >>> [3]PETSC ERROR: ------------------------------------------------------------------------ >>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>> [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >>> [3]PETSC ERROR: likely location of problem given in stack below >>> [3]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >>> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >>> [3]PETSC ERROR: INSTEAD the line number of the start of the function >>> [3]PETSC ERROR: is given. >>> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null) >>> To prevent termination, change the error handler using PetscPushErrorHandler() >>> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null) >>> To prevent termination, change the error handler using PetscPushErrorHandler() >>> >>> =================================================================================== >>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >>> = PID 22939 RUNNING AT srvulx13 >>> = EXIT CODE: 134 >>> = CLEANING UP REMAINING PROCESSES >>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >>> =================================================================================== >>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) >>> This typically refers to a problem with your application. >>> Please see the FAQ page for debugging suggestions >>> >>> I read the documentation regarding the PetscAbortErrorHandler, but I do not know where should I use it. How can I solve the problem? >>> I hope I have been clear enough. >>> Attached you can find also my configure.log and make.log files. 
>>> >>> Best, >>> Francesco >>> >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brardafrancesco at gmail.com Wed Feb 24 03:39:43 2021 From: brardafrancesco at gmail.com (Francesco Brarda) Date: Wed, 24 Feb 2021 10:39:43 +0100 Subject: [petsc-users] Caught signal number 11 SEGV In-Reply-To: <00FCB539-D690-4311-8821-DF81DA408FE7@petsc.dev> References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com> <9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com> <5F08B032-8D8C-482D-82D2-12BC293A097D@gmail.com> <00FCB539-D690-4311-8821-DF81DA408FE7@petsc.dev> Message-ID: I have never used gdb. Using 0 as you suggested I got this output: $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm -debugger_nodes 0 ** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated as of version 3.14 and will be removed in a future release. Please use the option -debugger_ranks instead. (Silence this warning with -options_suppress_deprecated_warnings) PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 3903 on srvulx13 GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1 Copyright (C) 2014 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: . Find the GDB manual and other documentation resources online at: . For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from examples/rosenbrock/rosenbrock...done. Attaching to program: /home/fbrarda/cmdstan-petsc/examples/rosenbrock/rosenbrock, process 3903 Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf ptrace: Operation not permitted. /home/fbrarda/cmdstan-petsc/3903: No such file or directory. 
(gdb) method = optimize optimize algorithm = lbfgs (Default) lbfgs method = optimize optimize algorithm = lbfgs (Default) lbfgs init_alpha = 0.001 (Default) init_alpha = 0.001 (Default) tol_obj = 9.9999999999999998e-13 (Default) tol_rel_obj = 10000 (Default) tol_grad = 1e-08 (Default) tol_rel_grad = 10000000 (Default) tol_obj = 9.9999999999999998e-13 (Default) tol_rel_obj = 10000 (Default) tol_param = 1e-08 (Default) history_size = 5 (Default) tol_grad = 1e-08 (Default) iter = 2000 (Default) save_iterations = 0 (Default) tol_rel_grad = 10000000 (Default) tol_param = 1e-08 (Default) history_size = 5 (Default) id = 0 (Default) data file = (Default) init = 2 (Default) iter = 2000 (Default) save_iterations = 0 (Default) random seed = 3666155654 (Default) output file = output.csv (Default) diagnostic_file = (Default)id = 0 (Default) data file = (Default) init = 2 (Default) refresh = 100 (Default) random seed = 3666155654 (Default) output file = output.csv (Default) diagnostic_file = (Default) refresh = 100 (Default) Initial log joint probability = -158.559 Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes 12 -0.253535 0.000284499 0.0383658 0.001 0.001 46 LS failed, Hessian reset 13 -0.253535 0.000284499 0.0383658 6.528 0.001 111 LS failed, Hessian reset Optimization terminated with error: Line search failed to achieve a sufficient decrease, no more progress can be made [0]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in src/cmdstan/main.cpp To prevent termination, change the error handler using PetscPushErrorHandler() Using only 1 process the code works. Francesco > Il giorno 24 feb 2021, alle ore 07:14, Barry Smith ha scritto: > > > start_in_debugger noxterm -debugger_nodes 3 > > Use -start_in_debugger noxterm -debugger_nodes 0 > > when not opening windows for each debugger it is best to have the first rank associated with the tty as the debugger node > > > > > > >> On Feb 23, 2021, at 3:46 PM, Francesco Brarda wrote: >> >> Using the command you suggested I got >> >> fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm -debugger_nodes 3 >> ** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated as of version 3.14 and will be removed in a future release. Please use the option -debugger_ranks instead. 
(Silence this warning with -options_suppress_deprecated_warnings) >> method = optimize >> optimize >> algorithm = lbfgs (Default) >> lbfgs >> method = optimize >> optimize >> algorithm = lbfgs (Default) >> lbfgs >> init_alpha = 0.001 (Default) >> tol_obj = 9.9999999999999998e-13 (Default) >> init_alpha = 0.001 (Default) >> tol_obj = 9.9999999999999998e-13 (Default) >> tol_rel_obj = 10000 (Default) >> tol_grad = 1e-08 (Default) tol_rel_obj = 10000 (Default) >> tol_grad = 1e-08 (Default) >> tol_rel_grad = 10000000 (Default) >> >> tol_rel_grad = 10000000 (Default) >> tol_param = 1e-08 (Default) tol_param = 1e-08 (Default) >> history_size = 5 (Default) >> iter = 2000 (Default) >> >> history_size = 5 (Default) >> iter = 2000 (Default) >> save_iterations = 0 (Default) >> id = 0 (Default) >> data >> save_iterations = 0 (Default) >> id = 0 (Default) >> data >> file = (Default) >> file = (Default) >> init = 2 (Default) >> random >> seed = 3623621468 (Default) >> output >> file = output.csv (Default)init = 2 (Default) >> random >> seed = 3623621468 (Default) >> output >> file = output.csv (Default) >> >> diagnostic_file = (Default) >> refresh = 100 (Default) >> >> diagnostic_file = (Default) >> refresh = 100 (Default) >> >> Initial log joint probability = -195.984 >> Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes >> 10 -0.97101 0.00292919 1.65855 0.001 0.001 46 LS failed, Hessian reset >> 12 -0.483952 0.001316 1.18542 0.001 0.001 77 LS failed, Hessian reset >> 13 -0.477916 0.0118542 0.163518 0.01 0.001 106 LS failed, Hessian reset >> [1]PETSC ERROR: #1 main() line 12 in src/cmdstan/main.cpp >> [1]PETSC ERROR: PETSc Option Table entries: >> [1]PETSC ERROR: -debugger_nodes 3 >> [1]PETSC ERROR: -start_in_debugger noxterm >> [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov????? >> >> And then it does not go further. With the -debugger_ranks suggested, the output is the same. What do you think, please? >> I am using a cluster (one node, dual-socket system with twelve-core-CPUs), but when I do the ssh I do not use the -X flag, if that's what you mean. >> >> Thank you, >> Francesco >> >> >>> Il giorno 23 feb 2021, alle ore 21:59, Matthew Knepley ha scritto: >>> >>> On Tue, Feb 23, 2021 at 3:55 PM Francesco Brarda wrote: >>> Thank you for the quick response. >>> Sorry, you are right. Here is the complete output: >>> >>> fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger >>> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 on display :0.0 on machine srvulx13 >>> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 on display :0.0 on machine srvulx13 >>> xterm: Xt error: Can't open display: :0.0 >>> xterm: DISPLAY is not set >>> xterm: Xt error: Can't open display: :0.0 >>> xterm: DISPLAY is not set >>> >>> Do you have an Xserver running? If not, you can use >>> >>> -start_in_debugger noxterm -debugger_nodes 3 >>> >>> and try to get a stack trace from one node. 
>>> >>> Thanks, >>> >>> Matt >>> >>> method = optimize >>> optimize >>> algorithm = lbfgs (Default) >>> lbfgs >>> method = optimize >>> optimize >>> algorithm = lbfgs (Default) >>> lbfgs >>> init_alpha = 0.001 (Default) >>> tol_obj = 9.9999999999999998e-13 (Default) >>> tol_rel_obj = 10000 (Default) >>> tol_grad = 1e-08 (Default) >>> init_alpha = 0.001 (Default) >>> tol_obj = 9.9999999999999998e-13 (Default) >>> tol_rel_obj = 10000 (Default) >>> tol_grad = 1e-08 (Default) >>> tol_rel_grad = 10000000 (Default) >>> tol_param = 1e-08 (Default) >>> history_size = 5 (Default) >>> tol_rel_grad = 10000000 (Default) >>> tol_param = 1e-08 (Default) >>> history_size = 5 (Default) >>> iter = 2000 (Default) >>> iter = 2000 (Default) >>> save_iterations = 0 (Default) >>> id = 0 (Default) >>> data save_iterations = 0 (Default) >>> id = 0 (Default) >>> data >>> file = (Default) >>> >>> file = (Default) >>> init = 2 (Default) >>> random >>> seed = 3585768430 (Default) >>> init = 2 (Default) >>> random >>> seed = 3585768430 (Default) >>> output >>> file = output.csv (Default) >>> output >>> file = output.csv (Default) >>> diagnostic_file = (Default) >>> refresh = 100 (Default) >>> diagnostic_file = (Default) >>> refresh = 100 (Default) >>> >>> >>> Initial log joint probability = -731.444 >>> Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes >>> [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in src/cmdstan/main.cpp >>> To prevent termination, change the error handler using PetscPushErrorHandler() >>> >>> =================================================================================== >>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >>> = PID 47804 RUNNING AT srvulx13 >>> = EXIT CODE: 134 >>> = CLEANING UP REMAINING PROCESSES >>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >>> =================================================================================== >>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) >>> This typically refers to a problem with your application. >>> Please see the FAQ page for debugging suggestions >>> >>> >>> >>> >>> >>> The code inside main.cpp is the following: >>> >>> #include >>> #include >>> >>> #include >>> >>> int main(int argc, char* argv[]) { >>> >>> PetscErrorCode ierr; >>> ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr); >>> >>> try { >>> ierr = cmdstan::command(argc, argv);CHKERRQ(ierr); >>> } catch (const std::exception& e) { >>> std::cout << e.what() << std::endl; >>> ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr); >>> } >>> >>> ierr = PetscFinalize();CHKERRQ(ierr); >>> return ierr; >>> } >>> >>> I highlighted the line 12. Although I read the page where the command PetscPushErrorHandler is explained and the example provided (src/ksp/ksp/tutorials/ex27.c), I do not understand how I should effectively use the command. >>> Should I change the entire try/catch with PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); ? >>> >>> Best, >>> Francesco >>> >>> >>>> Il giorno 23 feb 2021, alle ore 11:54, Matthew Knepley ha scritto: >>>> >>>> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda wrote: >>>> Hi! >>>> >>>> I am very new to the PETSc world. I am working with a GitHub repo that uses PETSc together with Stan (a statistics open source software), here you can find the discussion. >>>> It has been defined a functor to convert EigenVector to PetscVec and viceversa, both sequentially and in parallel. >>>> The file using these functions does the conversions with the sequential setting. 
I changed to those using MPI, that is from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on because I want to evaluate the scaling. >>>> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock optimize in the debug mode I get the error Caught signal number 11 SEGV. I therefore used the option -start_in_debugger and I get the following: >>>> >>>> For some reason, the -start_in_debuggger option is not being seen. Are you showing all the output? Once the debugger is attached, >>>> you run the program (conr) and then when you hit the SEGV you get a stack trace (where). >>>> >>>> THanks, >>>> >>>> Matt >>>> >>>> [2]PETSC ERROR: ------------------------------------------------------------------------ >>>> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>> [2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >>>> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >>>> [2]PETSC ERROR: likely location of problem given in stack below >>>> [2]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >>>> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >>>> [2]PETSC ERROR: INSTEAD the line number of the start of the function >>>> [2]PETSC ERROR: is given. >>>> [3]PETSC ERROR: ------------------------------------------------------------------------ >>>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>> [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >>>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >>>> [3]PETSC ERROR: likely location of problem given in stack below >>>> [3]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >>>> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >>>> [3]PETSC ERROR: INSTEAD the line number of the start of the function >>>> [3]PETSC ERROR: is given. >>>> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null) >>>> To prevent termination, change the error handler using PetscPushErrorHandler() >>>> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null) >>>> To prevent termination, change the error handler using PetscPushErrorHandler() >>>> >>>> =================================================================================== >>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES >>>> = PID 22939 RUNNING AT srvulx13 >>>> = EXIT CODE: 134 >>>> = CLEANING UP REMAINING PROCESSES >>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES >>>> =================================================================================== >>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) >>>> This typically refers to a problem with your application. >>>> Please see the FAQ page for debugging suggestions >>>> >>>> I read the documentation regarding the PetscAbortErrorHandler, but I do not know where should I use it. How can I solve the problem? >>>> I hope I have been clear enough. >>>> Attached you can find also my configure.log and make.log files. 
>>>>
>>>> Best,
>>>> Francesco
>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From elena.travaglia at edu.unito.it  Wed Feb 24 04:51:48 2021
From: elena.travaglia at edu.unito.it (Elena Travaglia)
Date: Wed, 24 Feb 2021 11:51:48 +0100
Subject: [petsc-users] Preconditioner for LSC
In-Reply-To: <87sg5m9hvp.fsf@jedbrown.org>
References: <87sg5m9hvp.fsf@jedbrown.org>
Message-ID:

Thank you for the reply.
Now I have set the preconditioner on the command line with "-fieldsplit_1_lsc_pc_type", but is it also possible to set it from within the code instead?
What is the equivalent code to obtain the effect of "-fieldsplit_1_lsc_pc_type mat"?

Elena

On Wed, 24 Feb 2021 at 06:04, Jed Brown wrote:

> If you've already attached a MatShell, you could presumably use
> -fieldsplit_1_lsc_pc_type mat to just call its MatMult.
>
> The sense I've gotten when I wrote PCLSC and was experimenting with these
> methods is that the main selling point of LSC (for most discretizations) is
> that it's more algebraic than the cheaper PCD methods.
>
> Elena Travaglia writes:
>
> > Dear PETSc users,
> >
> > we would like to compare our preconditioner for the Schur complement
> > of a Stokes system, with the LSC preconditioner already implemented in
> > PETSc. Following the example in the PETSc manual, we've tried
> > -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type ml
> > but this is not working (properly) on our problem.
> >
> > On the other hand we think we have a good preconditioner for A10*A01,
> > so we'd like to try
> > -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type shell
> > but we cannot figure out how to attach our apply() routine to
> > the pc object of fieldsplit_1_lsc.
> >
> > Can this be done in the current interface?
> > Or perhaps, should we call KSPGetOperators on the fieldsplit_1 solver
> > and attach to its Sp operator a "LSC_Lp" of type MATSHELL with our routine
> > attached to the MATOP_SOLVE of the shell matrix?
> >
> > Thanks in advance,
> >
> > Elena and Matteo
> >
> > --
> > ------------------------
> >
> > Indirizzo istituzionale di posta elettronica
> > degli studenti e dei laureati dell'Università degli Studi di Torino
> > Official University of Turin email address for students and graduates

--
------------------------

Indirizzo istituzionale di posta elettronica degli studenti e dei laureati dell'Università degli Studi di Torino
Official University of Turin email address for students and graduates
-------------- next part --------------
An HTML attachment was scrubbed...
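To address the "from within the code" part of Elena's question above: one way to get the same effect as the command-line option is to push it into the PETSc options database before the solver is configured. The sketch below is only illustrative, not the definitive answer from the list: it assumes the outer fieldsplit KSP is called ksp and that KSPSetFromOptions() has not been called yet. PetscOptionsSetValue() and KSPSetFromOptions() are standard PETSc calls; the wrapper function name is hypothetical.

    #include <petscksp.h>

    /* Hypothetical helper: same effect as passing
       -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type mat
       on the command line. */
    PetscErrorCode SetLSCInnerPCType(KSP ksp)
    {
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = PetscOptionsSetValue(NULL,"-fieldsplit_1_pc_type","lsc");CHKERRQ(ierr);
      ierr = PetscOptionsSetValue(NULL,"-fieldsplit_1_lsc_pc_type","mat");CHKERRQ(ierr);
      /* The options database is read when the solver is configured,
         so set the values before this call. */
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

With the inner PC type set to "mat", PCLSC applies the attached matrix's MatMult as the inner preconditioner, which matches what Jed describes above for a user-supplied MatShell.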
URL: From knepley at gmail.com Wed Feb 24 06:20:31 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 24 Feb 2021 07:20:31 -0500 Subject: [petsc-users] Caught signal number 11 SEGV In-Reply-To: References: <760FB246-04F8-49B7-8FC8-69A7AB880E3C@gmail.com> <9F5B99CE-56BF-4A9A-A866-46878AA9122F@gmail.com> <5F08B032-8D8C-482D-82D2-12BC293A097D@gmail.com> <00FCB539-D690-4311-8821-DF81DA408FE7@petsc.dev> Message-ID: You have shut off the ability to use the debugger on your machine: Attaching to program: /home/fbrarda/cmdstan-petsc/examples/rosenbrock/rosenbrock, process 3903 Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf ptrace: Operation not permitted. You need to edit /proc/sys/kernel/yama/ptrace_scope and allow debugging on your box. Thanks, Matt On Wed, Feb 24, 2021 at 4:39 AM Francesco Brarda wrote: > I have never used gdb. > Using 0 as you suggested I got this output: > > $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 > examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm > -debugger_nodes 0 > ** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated > as of version 3.14 and will be removed in a future release. Please use the > option -debugger_ranks instead. (Silence this warning > with -options_suppress_deprecated_warnings) > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 3903 > on srvulx13 > GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1 > Copyright (C) 2014 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later < > http://gnu.org/licenses/gpl.html> > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. Type "show copying" > and "show warranty" for details. > This GDB was configured as "x86_64-linux-gnu". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > . > Find the GDB manual and other documentation resources online at: > . > For help, type "help". > Type "apropos word" to search for commands related to "word"... > Reading symbols from examples/rosenbrock/rosenbrock...done. > Attaching to program: > /home/fbrarda/cmdstan-petsc/examples/rosenbrock/rosenbrock, process 3903 > Could not attach to process. If your uid matches the uid of the target > process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try > again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf > ptrace: Operation not permitted. > /home/fbrarda/cmdstan-petsc/3903: No such file or directory. 
> (gdb) method = optimize > optimize > algorithm = lbfgs (Default) > lbfgs > method = optimize > optimize > algorithm = lbfgs (Default) > lbfgs > init_alpha = 0.001 (Default) init_alpha = 0.001 (Default) > tol_obj = 9.9999999999999998e-13 (Default) > tol_rel_obj = 10000 (Default) > tol_grad = 1e-08 (Default) > tol_rel_grad = 10000000 (Default) > tol_obj = 9.9999999999999998e-13 (Default) > tol_rel_obj = 10000 (Default) > > tol_param = 1e-08 (Default) > history_size = 5 (Default) > tol_grad = 1e-08 (Default) > iter = 2000 (Default) > save_iterations = 0 (Default) tol_rel_grad = 10000000 (Default) > tol_param = 1e-08 (Default) > history_size = 5 (Default) > id = 0 (Default) > data > file = (Default) > init = 2 (Default) > > iter = 2000 (Default) > save_iterations = 0 (Default) > random > seed = 3666155654 (Default) > output > file = output.csv (Default) > diagnostic_file = (Default)id = 0 (Default) > data > file = (Default) > init = 2 (Default) > > refresh = 100 (Default) > > random > seed = 3666155654 (Default) > output > file = output.csv (Default) > diagnostic_file = (Default) > refresh = 100 (Default) > > Initial log joint probability = -158.559 > Iter log prob ||dx|| ||grad|| alpha > alpha0 # evals Notes > 12 -0.253535 0.000284499 0.0383658 0.001 > 0.001 46 LS failed, Hessian reset > 13 -0.253535 0.000284499 0.0383658 6.528 > 0.001 111 LS failed, Hessian reset > Optimization terminated with error: > Line search failed to achieve a sufficient decrease, no more > progress can be made > [0]PETSC ERROR: PetscAbortErrorHandler: main() line 12 > in src/cmdstan/main.cpp > To prevent termination, change the error handler > using PetscPushErrorHandler() > > Using only 1 process the code works. > > Francesco > > Il giorno 24 feb 2021, alle ore 07:14, Barry Smith ha > scritto: > > > start_in_debugger noxterm -debugger_nodes 3 > > Use -start_in_debugger noxterm -debugger_nodes 0 > > when not opening windows for each debugger it is best to have the first > rank associated with the tty as the debugger node > > > > > > > On Feb 23, 2021, at 3:46 PM, Francesco Brarda > wrote: > > Using the command you suggested I got > > fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 > examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm > -debugger_nodes 3 > ** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated > as of version 3.14 and will be removed in a future release. Please use the > option -debugger_ranks instead. 
(Silence this warning with > -options_suppress_deprecated_warnings) > method = optimize > optimize > algorithm = lbfgs (Default) > lbfgs > method = optimize > optimize > algorithm = lbfgs (Default) > lbfgs > init_alpha = 0.001 (Default) > tol_obj = 9.9999999999999998e-13 (Default) > init_alpha = 0.001 (Default) > tol_obj = 9.9999999999999998e-13 (Default) > tol_rel_obj = 10000 (Default) > tol_grad = 1e-08 (Default) tol_rel_obj = 10000 (Default) > tol_grad = 1e-08 (Default) > tol_rel_grad = 10000000 (Default) > > tol_rel_grad = 10000000 (Default) > tol_param = 1e-08 (Default) tol_param = 1e-08 (Default) > history_size = 5 (Default) > iter = 2000 (Default) > > history_size = 5 (Default) > iter = 2000 (Default) > save_iterations = 0 (Default) > id = 0 (Default) > data > save_iterations = 0 (Default) > id = 0 (Default) > data > file = (Default) > file = (Default) > init = 2 (Default) > random > seed = 3623621468 (Default) > output > file = output.csv (Default)init = 2 (Default) > random > seed = 3623621468 (Default) > output > file = output.csv (Default) > > diagnostic_file = (Default) > refresh = 100 (Default) > > diagnostic_file = (Default) > refresh = 100 (Default) > > Initial log joint probability = -195.984 > Iter log prob ||dx|| ||grad|| alpha alpha0 > # evals Notes > 10 -0.97101 0.00292919 1.65855 0.001 0.001 > 46 LS failed, Hessian reset > 12 -0.483952 0.001316 1.18542 0.001 0.001 > 77 LS failed, Hessian reset > 13 -0.477916 0.0118542 0.163518 0.01 0.001 > 106 LS failed, Hessian reset > [1]PETSC ERROR: #1 main() line 12 in src/cmdstan/main.cpp > [1]PETSC ERROR: PETSc Option Table entries: > [1]PETSC ERROR: -debugger_nodes 3 > [1]PETSC ERROR: -start_in_debugger noxterm > [1]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov????? > > And then it does not go further. With the -debugger_ranks suggested, the > output is the same. What do you think, please? > I am using a cluster (one node, dual-socket system with twelve-core-CPUs), > but when I do the ssh I do not use the -X flag, if that's what you mean. > > Thank you, > Francesco > > > Il giorno 23 feb 2021, alle ore 21:59, Matthew Knepley > ha scritto: > > On Tue, Feb 23, 2021 at 3:55 PM Francesco Brarda < > brardafrancesco at gmail.com> wrote: > Thank you for the quick response. > Sorry, you are right. Here is the complete output: > > fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 > examples/rosenbrock/rosenbrock optimize -start_in_debugger > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 on > display :0.0 on machine srvulx13 > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 on > display :0.0 on machine srvulx13 > xterm: Xt error: Can't open display: :0.0 > xterm: DISPLAY is not set > xterm: Xt error: Can't open display: :0.0 > xterm: DISPLAY is not set > > Do you have an Xserver running? If not, you can use > > -start_in_debugger noxterm -debugger_nodes 3 > > and try to get a stack trace from one node. 
>
> Thanks,
>
> Matt
>
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)
>         tol_rel_grad = 10000000 (Default)
>         tol_param = 1e-08 (Default)
>         history_size = 5 (Default)
>     iter = 2000 (Default)
>     save_iterations = 0 (Default)
> id = 0 (Default)
> data
>   file = (Default)
> init = 2 (Default)
> random
>   seed = 3585768430 (Default)
> output
>   file = output.csv (Default)
>   diagnostic_file = (Default)
>   refresh = 100 (Default)
>
> Initial log joint probability = -731.444
>     Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
> [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in src/cmdstan/main.cpp
> To prevent termination, change the error handler using PetscPushErrorHandler()
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 47804 RUNNING AT srvulx13
> =   EXIT CODE: 134
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
>
> The code inside main.cpp is the following:
>
> #include
> #include
>
> #include
>
> int main(int argc, char* argv[]) {
>
>   PetscErrorCode ierr;
>   ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr);
>
>   try {
>     ierr = cmdstan::command(argc, argv);CHKERRQ(ierr);
>   } catch (const std::exception& e) {
>     std::cout << e.what() << std::endl;
>     ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr);
>   }
>
>   ierr = PetscFinalize();CHKERRQ(ierr);
>   return ierr;
> }
>
> I highlighted line 12. Although I read the page where PetscPushErrorHandler
> is explained and the example provided (src/ksp/ksp/tutorials/ex27.c), I do
> not understand how I should effectively use the command.
> Should I replace the entire try/catch with
> PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); ?
>
> Best,
> Francesco
>
> On 23 Feb 2021, at 11:54, Matthew Knepley wrote:
>
> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda <brardafrancesco at gmail.com> wrote:
> Hi!
>
> I am very new to the PETSc world. I am working with a GitHub repo that
> uses PETSc together with Stan (an open source statistics package); here
> you can find the discussion.
> A functor has been defined to convert EigenVector to PetscVec and vice
> versa, both sequentially and in parallel.
> The file using these functions does the conversions with the sequential
> setting. I changed to the MPI versions, that is, from
> EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on, because
> I want to evaluate the scaling.
> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock > optimize in the debug mode I get the error Caught signal number 11 SEGV. I > therefore used the option -start_in_debugger and I get the following: > > For some reason, the -start_in_debuggger option is not being seen. Are you > showing all the output? Once the debugger is attached, > you run the program (conr) and then when you hit the SEGV you get a stack > trace (where). > > THanks, > > Matt > > [2]PETSC ERROR: > ------------------------------------------------------------------------ > [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [2]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [2]PETSC ERROR: likely location of problem given in stack below > [2]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [2]PETSC ERROR: INSTEAD the line number of the start of the function > [2]PETSC ERROR: is given. > [3]PETSC ERROR: > ------------------------------------------------------------------------ > [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [3]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [3]PETSC ERROR: likely location of problem given in stack below > [3]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [3]PETSC ERROR: INSTEAD the line number of the start of the function > [3]PETSC ERROR: is given. > [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in > unknown file (null) > To prevent termination, change the error handler using > PetscPushErrorHandler() > [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in > unknown file (null) > To prevent termination, change the error handler using > PetscPushErrorHandler() > > > =================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = PID 22939 RUNNING AT srvulx13 > = EXIT CODE: 134 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > > =================================================================================== > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6) > This typically refers to a problem with your application. > Please see the FAQ page for debugging suggestions > > I read the documentation regarding the PetscAbortErrorHandler, but I do > not know where should I use it. How can I solve the problem? > I hope I have been clear enough. > Attached you can find also my configure.log and make.log files. > > Best, > Francesco > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jed at jedbrown.org  Wed Feb 24 12:31:33 2021
From: jed at jedbrown.org (Jed Brown)
Date: Wed, 24 Feb 2021 11:31:33 -0700
Subject: [petsc-users] Preconditioner for LSC
In-Reply-To: 
References: <87sg5m9hvp.fsf@jedbrown.org>
Message-ID: <87k0qx9v22.fsf@jedbrown.org>

We should add a PCLSCGetKSP() interface to pull out that solver. It'll be
somewhat hard to use because the PC needs to be set up before that part
would exist. (This is a recurring interface challenge for nested solvers
and we don't have a good solution. Options are convenient.)

Elena Travaglia writes:

> Thank you for the reply.
>
> Now I have set the preconditioner on the command line with
> "-fieldsplit_1_lsc_pc_type", but is it also possible to set it from within
> the code instead?
> What is the equivalent code to obtain the effect of
> "-fieldsplit_1_lsc_pc_type mat" ?
>
> Elena
>
> On Wed, 24 Feb 2021 at 06:04, Jed Brown wrote:
>
>> If you've already attached a MatShell, you could presumably use
>> -fieldsplit_1_lsc_pc_type mat to just call its MatMult.
>>
>> The sense I've gotten when I wrote PCLSC and was experimenting with these
>> methods is that the main selling point of LSC (for most discretizations) is
>> that it's more algebraic than the cheaper PCD methods.
>>
>> Elena Travaglia writes:
>>
>> > Dear PETSc users,
>> >
>> > we would like to compare our preconditioner for the Schur complement
>> > of a Stokes system, with the LSC preconditioner already implemented in
>> > PETSc. Following the example in the PETSc manual, we've tried
>> >   -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type ml
>> > but this is not working (properly) on our problem.
>> >
>> > On the other hand we think we have a good preconditioner for A10*A01,
>> > so we'd like to try
>> >   -fieldsplit_1_pc_type lsc -fieldsplit_1_lsc_pc_type shell
>> > but we cannot figure out how to attach our apply() routine to
>> > the pc object of fieldsplit_1_lsc.
>> >
>> > Can this be done in the current interface?
>> > Or perhaps, should we call KSPGetOperators on the fieldsplit_1 solver
>> > and attach to its Sp operator a "LSC_Lp" of type MATSHELL with our
>> > routine attached to the MATOP_SOLVE of the shell matrix?
>> >
>> > Thanks in advance,
>> >
>> > Elena and Matteo
>> >
>> > --
>> > ------------------------
>> >
>> > Indirizzo istituzionale di posta elettronica
>> > degli studenti e dei laureati dell'Università degli Studi di Torino
>> > Official University of Turin email address for students and graduates
>
> --
> ------------------------
>
> Indirizzo istituzionale di posta elettronica degli studenti e dei
> laureati dell'Università degli Studi di Torino
> Official University of Turin email address for students and graduates
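[Note on the exchange above: since PCLSCGetKSP() does not exist yet, the
practical way to get the effect of "-fieldsplit_1_lsc_pc_type mat" from
within the code, as Elena asks, is to insert the option into the options
database before the solver is configured. The sketch below only illustrates
that route; the helper name SetLSCTypeFromCode and the variable ksp are
placeholders, not part of the PETSc API.]

    #include <petscksp.h>

    /* Programmatic equivalent of passing -fieldsplit_1_lsc_pc_type mat
       on the command line: put the option into the global options
       database, then let KSPSetFromOptions() pick it up. */
    static PetscErrorCode SetLSCTypeFromCode(KSP ksp)
    {
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      /* NULL selects the global options database; the option name must
         match the actual fieldsplit/LSC prefix of the solver in use. */
      ierr = PetscOptionsSetValue(NULL, "-fieldsplit_1_lsc_pc_type", "mat");CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

[The user operator itself would still be attached the way the thread
describes, e.g. by composing a shell matrix on the Schur-complement
preconditioning matrix under the names "LSC_L" / "LSC_Lp" with
PetscObjectCompose(), which, as far as I know, is where PCLSC looks for a
user-supplied operator.]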
From smithc11 at rpi.edu Thu Feb 25 15:57:47 2021 From: smithc11 at rpi.edu (Cameron Smith) Date: Thu, 25 Feb 2021 16:57:47 -0500 Subject: [petsc-users] creation of parallel dmplex from a partitioned mesh In-Reply-To: References: <1953567c-6c7f-30fb-13e6-ad7017263a92@rpi.edu> <62654977-bdbc-9cd7-5a70-e9fb4951310a@rpi.edu> <3fcf90b7-3abd-1345-bd90-d7d7272816d9@rpi.edu> <87mu2jg57a.fsf@jedbrown.org> Message-ID: <5e245665-61c6-3a48-9b3e-97b38f69829e@rpi.edu> Hello, Bringing this thread back from the dead... We made progress with creation of a distributed dmplex that matches our source mesh partition and are in need of help writing values into a vector created from the dmplex object. As discussed previously, we have created a DMPlex instance using: DMPlexCreateFromCellListPetsc(...) DMGetPointSF(...) PetscSFSetGraph(...) which gives us a distribution of mesh vertices and elements in the DM object that matches the element-based partition of our unstructured mesh. We then mark mesh vertices on the geometric model boundary using DMSetLabelValue(...) and a map from our mesh vertices to dmplex points (created during dmplex definition of vtx coordinates). Following this, we create a section for vertices: > DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd); > PetscSectionCreate(PetscObjectComm((PetscObject) dm), &s); > PetscSectionSetNumFields(s, 1); > PetscSectionSetFieldComponents(s, 0, 1); > PetscSectionSetChart(s, vStart, vEnd); > for(PetscInt v = vStart; v < vEnd; ++v) { > PetscSectionSetDof(s, v, 1); > PetscSectionSetFieldDof(s, v, 0, 1); > } > PetscSectionSetUp(s); > DMSetLocalSection(dm, s); > PetscSectionDestroy(&s); > DMGetGlobalSection(dm,&s); //update the global section We then try to write values into a local Vec for the on-process vertices (roots and leaves in sf terms) and hit an ordering problem. Specifically, we make the following sequence of calls: DMGetLocalVector(dm,&bloc); VecGetArrayWrite(bloc, &bwrite); //for loop to write values to bwrite VecRestoreArrayWrite(bloc, &bwrite); DMLocalToGlobal(dm,bloc,INSERT_VALUES,b); DMRestoreLocalVector(dm,&bloc); Visualizing Vec 'b' in paraview, and the original mesh, tells us that the dmplex topology and geometry (the vertex coordinates) are correct but that the order we write values is wrong (not total garbage... but clearly shifted). Is there anything obviously wrong in our described approach? I suspect the section creation is wrong and/or we don't understand the order of entries in the array returned by VecGetArrayWrite. Please let us know if other info is needed. We are happy to share the relevant source code. Thank-you, Cameron On 8/25/20 8:34 AM, Cameron Smith wrote: > On 8/24/20 4:57 PM, Matthew Knepley wrote: >> On Mon, Aug 24, 2020 at 4:27 PM Jed Brown > > wrote: >> >> ??? Cameron Smith > writes: >> >> ???? > We made some progress with star forest creation but still have >> ??? work to do. >> ???? > >> ???? > We revisited DMPlexCreateFromCellListParallelPetsc(...) and got it >> ???? > working by sequentially partitioning the vertex coordinates across >> ???? > processes to satisfy the 'vertexCoords' argument. Specifically, >> ??? rank 0 >> ???? > has the coordinates for vertices with global id 0:N/P-1, rank 1 >> has >> ???? > N/P:2*(N/P)-1, and so on (N is the total number of global >> ??? vertices and P >> ???? > is the number of processes). >> ???? > >> ???? > The consequences of the sequential partition of vertex >> ??? coordinates in >> ???? > subsequent solver operations is not clear.? Does it make process i >> ???? 
> responsible for computations and communications associated with >> ??? global >> ???? > vertices i*(N/P):(i+1)*(N/P)-1 ?? We assumed it does and wanted >> ??? to confirm. >> >> ??? Yeah, in the sense that the corners would be owned by the rank you >> ??? place them on. >> >> ??? But many methods, especially high-order, perform assembly via >> ??? non-overlapping partition of elements, in which case the >> ??? "computations" happen where the elements are (with any required >> ??? vertex data for the closure of those elements being sent to the rank >> ??? handling the element). >> >> ??? Note that a typical pattern would be to create a parallel DMPlex >> ??? with a naive distribution, then repartition/distribute it. >> >> >> As Jed says, CreateParallel() just makes the most naive partition of >> vertices because we have no other information. Once >> the mesh is made, you call DMPlexDistribute() again to reduce the edge >> cut. >> >> ?? Thanks, >> >> ?? ? ?Matt >> > > > Thank you. > > This is being used for PIC code with low order 2d elements whose mesh is > partitioned to minimize communications during particle operations.? This > partition will not be ideal for the field solve using petsc so we're > exploring alternatives that will require minimal data movement between > the two partitions.? Towards that, we'll keep pursuing the SF creation. > > -Cameron > From lisandro.verga.bega at gmail.com Fri Feb 26 02:56:55 2021 From: lisandro.verga.bega at gmail.com (Lisandro Verga) Date: Fri, 26 Feb 2021 00:56:55 -0800 Subject: [petsc-users] re Example finite volume silver in PETSc In-Reply-To: References: Message-ID: Dear All, Thank you. Best Regards, On Monday, February 22, 2021, Ed Bueler wrote: > A very basic 2D FV example, a scalar advection solver, using PETSc DMDA, > is at > https://github.com/bueler/p4pdes/blob/master/c/ch11/advect.c > and documented in Chapter 11 of my book (https://my.siam.org/Store/ > Product/viewproduct/?ProductId=32850137). This example might be most > useful to you if you are interested in implementing flux limiters. > > Ed > > > Dear PETSc Team, > > > > I would like to ask you if there a finite volume solver build using the > > PETSc data structure. I have found several manuscripts or presentations > > that mention that but I cannot retrieve an example it. > > > > Thank you. > > > > Regards, > > -- > Ed Bueler > Dept of Mathematics and Statistics > University of Alaska Fairbanks > Fairbanks, AK 99775-6660 > 306C Chapman > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 26 07:32:37 2021 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 26 Feb 2021 08:32:37 -0500 Subject: [petsc-users] creation of parallel dmplex from a partitioned mesh In-Reply-To: <5e245665-61c6-3a48-9b3e-97b38f69829e@rpi.edu> References: <1953567c-6c7f-30fb-13e6-ad7017263a92@rpi.edu> <62654977-bdbc-9cd7-5a70-e9fb4951310a@rpi.edu> <3fcf90b7-3abd-1345-bd90-d7d7272816d9@rpi.edu> <87mu2jg57a.fsf@jedbrown.org> <5e245665-61c6-3a48-9b3e-97b38f69829e@rpi.edu> Message-ID: On Thu, Feb 25, 2021 at 4:57 PM Cameron Smith wrote: > Hello, > > Bringing this thread back from the dead... > > We made progress with creation of a distributed dmplex that matches our > source mesh partition and are in need of help writing values into a > vector created from the dmplex object. > > As discussed previously, we have created a DMPlex instance using: > > DMPlexCreateFromCellListPetsc(...) > DMGetPointSF(...) > PetscSFSetGraph(...) 
> > which gives us a distribution of mesh vertices and elements in the DM > object that matches the element-based partition of our unstructured mesh. > > We then mark mesh vertices on the geometric model boundary using > DMSetLabelValue(...) and a map from our mesh vertices to dmplex points > (created during dmplex definition of vtx coordinates). > > Following this, we create a section for vertices: > > > DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd); > > > > > PetscSectionCreate(PetscObjectComm((PetscObject) dm), &s); > > PetscSectionSetNumFields(s, 1); > > PetscSectionSetFieldComponents(s, 0, 1); > > PetscSectionSetChart(s, vStart, vEnd); > > for(PetscInt v = vStart; v < vEnd; ++v) { > > PetscSectionSetDof(s, v, 1); > > PetscSectionSetFieldDof(s, v, 0, 1); > > } > > PetscSectionSetUp(s); > > DMSetLocalSection(dm, s); > > PetscSectionDestroy(&s); > > DMGetGlobalSection(dm,&s); //update the global section > > We then try to write values into a local Vec for the on-process vertices > (roots and leaves in sf terms) and hit an ordering problem. > Specifically, we make the following sequence of calls: > > DMGetLocalVector(dm,&bloc); > VecGetArrayWrite(bloc, &bwrite); > //for loop to write values to bwrite > VecRestoreArrayWrite(bloc, &bwrite); > DMLocalToGlobal(dm,bloc,INSERT_VALUES,b); > DMRestoreLocalVector(dm,&bloc); > There is an easy way to get diagnostics here. For the local vector DMGetLocalSection(dm, &s); PetscSectionGetOffset(s, v, &off); will give you the offset into the array you got from VecGetArrayWrite() for that vertex. You can get this wrapped up using DMPlexPointLocalWrite() which is what I tend to use for this type of stuff. For the global vector DMGetGlobalSection(dm, &gs); PetscSectionGetOffset(gs, v, &off); will give you the offset into the portion of the global array that is stored in this process. If you do not own the values for this vertex, the number is negative, and it is actually -(i+1) if the index i is the valid one on the owning process. > Visualizing Vec 'b' in paraview, and the > original mesh, tells us that the dmplex topology and geometry (the > vertex coordinates) are correct but that the order we write values is > wrong (not total garbage... but clearly shifted). > We do not make any guarantees that global orders match local orders. However, by default we number up global unknowns in rank order, leaving out the dofs that we not owned. Does this make sense? Thanks, Matt > Is there anything obviously wrong in our described approach? I suspect > the section creation is wrong and/or we don't understand the order of > entries in the array returned by VecGetArrayWrite. > > Please let us know if other info is needed. We are happy to share the > relevant source code. > > Thank-you, > Cameron > > > On 8/25/20 8:34 AM, Cameron Smith wrote: > > On 8/24/20 4:57 PM, Matthew Knepley wrote: > >> On Mon, Aug 24, 2020 at 4:27 PM Jed Brown >> > wrote: > >> > >> Cameron Smith > writes: > >> > >> > We made some progress with star forest creation but still have > >> work to do. > >> > > >> > We revisited DMPlexCreateFromCellListParallelPetsc(...) and got > it > >> > working by sequentially partitioning the vertex coordinates > across > >> > processes to satisfy the 'vertexCoords' argument. Specifically, > >> rank 0 > >> > has the coordinates for vertices with global id 0:N/P-1, rank 1 > >> has > >> > N/P:2*(N/P)-1, and so on (N is the total number of global > >> vertices and P > >> > is the number of processes). 
> >> > > >> > The consequences of the sequential partition of vertex > >> coordinates in > >> > subsequent solver operations is not clear. Does it make process > i > >> > responsible for computations and communications associated with > >> global > >> > vertices i*(N/P):(i+1)*(N/P)-1 ? We assumed it does and wanted > >> to confirm. > >> > >> Yeah, in the sense that the corners would be owned by the rank you > >> place them on. > >> > >> But many methods, especially high-order, perform assembly via > >> non-overlapping partition of elements, in which case the > >> "computations" happen where the elements are (with any required > >> vertex data for the closure of those elements being sent to the rank > >> handling the element). > >> > >> Note that a typical pattern would be to create a parallel DMPlex > >> with a naive distribution, then repartition/distribute it. > >> > >> > >> As Jed says, CreateParallel() just makes the most naive partition of > >> vertices because we have no other information. Once > >> the mesh is made, you call DMPlexDistribute() again to reduce the edge > >> cut. > >> > >> Thanks, > >> > >> Matt > >> > > > > > > Thank you. > > > > This is being used for PIC code with low order 2d elements whose mesh is > > partitioned to minimize communications during particle operations. This > > partition will not be ideal for the field solve using petsc so we're > > exploring alternatives that will require minimal data movement between > > the two partitions. Towards that, we'll keep pursuing the SF creation. > > > > -Cameron > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrick.sanan at gmail.com Fri Feb 26 11:01:56 2021 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Fri, 26 Feb 2021 18:01:56 +0100 Subject: [petsc-users] [petsc-dev] headsup: switch git default branch from 'master' to 'main' In-Reply-To: <55996c7c-a274-5ebb-bba7-e06ac4c3b83a@mcs.anl.gov> References: <55996c7c-a274-5ebb-bba7-e06ac4c3b83a@mcs.anl.gov> Message-ID: The answers to these were probably stated already, but the reminder might be useful to others, as well. What will happen to "master" after today? Will it be deleted immediately or at some planned time? If not immediately deleted, will it be updated to match main? > Am 23.02.2021 um 18:19 schrieb Satish Balay via petsc-dev : > > All, > > This is a heads-up, we are to switch the default branch in petsc git > repo from 'master' to 'main' > > [Will plan to do the switch on friday the 26th] > > We've previously switched 'maint' branch to 'release' before 3.14 > release - and this change (to 'main') is the next step in this direction. > > Satish > From balay at mcs.anl.gov Fri Feb 26 11:06:39 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 26 Feb 2021 11:06:39 -0600 Subject: [petsc-users] [petsc-dev] headsup: switch git default branch from 'master' to 'main' In-Reply-To: References: <55996c7c-a274-5ebb-bba7-e06ac4c3b83a@mcs.anl.gov> Message-ID: <5e90c038-76e8-bd34-ad93-1a2bf8b2a4a@mcs.anl.gov> I plan to delete 'master' immediately - so that folk don't assume it still exits and work with it [assuming its the latest, creating MRs against it etc..] 
Satish On Fri, 26 Feb 2021, Patrick Sanan wrote: > The answers to these were probably stated already, but the reminder might be useful to others, as well. > > What will happen to "master" after today? Will it be deleted immediately or at some planned time? If not immediately deleted, will it be updated to match main? > > > Am 23.02.2021 um 18:19 schrieb Satish Balay via petsc-dev : > > > > All, > > > > This is a heads-up, we are to switch the default branch in petsc git > > repo from 'master' to 'main' > > > > [Will plan to do the switch on friday the 26th] > > > > We've previously switched 'maint' branch to 'release' before 3.14 > > release - and this change (to 'main') is the next step in this direction. > > > > Satish > > > From balay at mcs.anl.gov Fri Feb 26 15:50:40 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 26 Feb 2021 15:50:40 -0600 Subject: [petsc-users] [petsc-dev] headsup: switch git default branch from 'master' to 'main' In-Reply-To: <55996c7c-a274-5ebb-bba7-e06ac4c3b83a@mcs.anl.gov> References: <55996c7c-a274-5ebb-bba7-e06ac4c3b83a@mcs.anl.gov> Message-ID: <4834e30-b876-d6b5-9b8a-1a2396efef7@mcs.anl.gov> Update: the switch (at gitlab.com/petsc/petsc) is done. Please delete your local copy of 'master' branch and start using 'main' branch. Satish On Tue, 23 Feb 2021, Satish Balay via petsc-dev wrote: > All, > > This is a heads-up, we are to switch the default branch in petsc git > repo from 'master' to 'main' > > [Will plan to do the switch on friday the 26th] > > We've previously switched 'maint' branch to 'release' before 3.14 > release - and this change (to 'main') is the next step in this direction. > > Satish >
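[Note for readers doing the clean-up described above: a minimal sketch of
the corresponding local git commands, assuming the PETSc remote is named
"origin"; adjust to your own clone and forks.]

    # Fetch the new default branch and drop stale remote-tracking refs
    git fetch origin --prune

    # Switch the working copy to 'main' (creates a local tracking branch if needed)
    git checkout main

    # Delete the now-removed local 'master' branch
    git branch -D master

    # Point origin/HEAD at the remote's new default branch
    git remote set-head origin -a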