From numbersixvs at gmail.com Wed Sep 1 03:42:37 2021
From: numbersixvs at gmail.com (Viktor Nazdrachev)
Date: Wed, 1 Sep 2021 11:42:37 +0300
Subject: [petsc-users] Slow convergence while parallel computations.
Message-ID:

Dear all,

I have a 3D elasticity problem with heterogeneous properties. There is an unstructured grid with aspect ratios varying from 4 to 25. Zero Dirichlet BCs are imposed on the bottom face of the mesh. Also, Neumann (traction) BCs are imposed on the side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M DOFs).

The best performance and memory usage for a single MPI process was obtained with the HPDDM (BFBCG) solver and bjacobi + ICC(1) in the subdomains as the preconditioner; it took 1 m 45 s and 5.0 GB of RAM. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This is because the number of iterations required to achieve the same tolerance increases significantly.

I've also tried the PCGAMG (agg) preconditioner with an ICC(1) sub-preconditioner. For a single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the near-nullspace was attached using the MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This reduced the calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is a peak memory usage of 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.

Are there ways to avoid the degradation of the convergence rate for the bjacobi preconditioner in parallel mode? Does it make sense to use hierarchical or nested Krylov methods with a local GMRES solver (sub_ksp_type gmres) and some sub-preconditioner (for example, sub_pc_type bjacobi)?

Is this peak memory usage expected for the GAMG preconditioner? Is there any way to reduce it?

What advice would you give to improve the convergence rate with multiple MPI processes, but keep memory consumption reasonable?

Kind regards,

Viktor Nazdrachev
R&D senior researcher
Geosteering Technologies LLC
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
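The rigid-body near-nullspace mentioned above is the key piece of information PCGAMG needs for elasticity. As a point of reference only (this is not Viktor's actual code), a minimal sketch of how it is typically attached is shown below; the function name AttachRigidBodyModes and the vector coords are placeholders, and coords is assumed to hold the nodal coordinates with the same parallel layout as the solution vector.

    #include <petscmat.h>

    /* Attach the rigid-body modes (near-nullspace) of a 3D elasticity operator
       to the system matrix so that -pc_type gamg can use them when building
       its coarse spaces. */
    static PetscErrorCode AttachRigidBodyModes(Mat A, Vec coords)
    {
      MatNullSpace   nearnull;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = VecSetBlockSize(coords, 3);CHKERRQ(ierr);                     /* 3 displacement dofs per node; no-op if already set to 3 */
      ierr = MatNullSpaceCreateRigidBody(coords, &nearnull);CHKERRQ(ierr); /* 6 rigid-body modes in 3D */
      ierr = MatSetNearNullSpace(A, nearnull);CHKERRQ(ierr);
      ierr = MatNullSpaceDestroy(&nearnull);CHKERRQ(ierr);                 /* the matrix keeps its own reference */
      PetscFunctionReturn(0);
    }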
From pierre at joliv.et Wed Sep 1 04:01:26 2021
From: pierre at joliv.et (Pierre Jolivet)
Date: Wed, 1 Sep 2021 11:01:26 +0200
Subject: [petsc-users] Slow convergence while parallel computations.
In-Reply-To:
References:
Message-ID: <7EFBB20A-CB8A-47BA-BDD8-4E0BD43BBC31@joliv.et>

Dear Viktor,

> On 1 Sep 2021, at 10:42 AM, Viktor Nazdrachev wrote:
>
> Dear all,
>
> I have a 3D elasticity problem with heterogeneous properties. There is an unstructured grid with aspect ratios varying from 4 to 25. Zero Dirichlet BCs are imposed on the bottom face of the mesh. Also, Neumann (traction) BCs are imposed on the side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M DOFs).
>
> The best performance and memory usage for a single MPI process was obtained with the HPDDM (BFBCG) solver

Block Krylov solvers are (most often) only useful if you have multiple right-hand sides, e.g., in the context of elasticity, multiple loadings. Is that really the case? If not, you may as well stick to "standard" CG instead of the breakdown-free block (BFB) variant.

> and bjacobi + ICC(1) in the subdomains as the preconditioner; it took 1 m 45 s and 5.0 GB of RAM. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This is because the number of iterations required to achieve the same tolerance increases significantly.
>
> I've also tried the PCGAMG (agg) preconditioner with an ICC(1) sub-preconditioner. For a single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the near-nullspace was attached using the MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This reduced the calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is a peak memory usage of 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.

I'm surprised that GAMG is converging so slowly. What do you mean by "ICC(1) sub-preconditioner"? Do you use that as a smoother or as a coarse level solver? How many iterations are required to reach convergence? Could you please maybe run the solver with -ksp_view -log_view and send us the output?
Most of the default parameters of GAMG should be good enough for 3D elasticity, provided that your MatNullSpace is correct. One parameter that may need some adjustments though is the aggregation threshold -pc_gamg_threshold (you could try values in the [0.01; 0.1] range, that's what I always use for elasticity problems).

Thanks,
Pierre

> Are there ways to avoid the degradation of the convergence rate for the bjacobi preconditioner in parallel mode? Does it make sense to use hierarchical or nested Krylov methods with a local GMRES solver (sub_ksp_type gmres) and some sub-preconditioner (for example, sub_pc_type bjacobi)?
>
> Is this peak memory usage expected for the GAMG preconditioner? Is there any way to reduce it?
>
> What advice would you give to improve the convergence rate with multiple MPI processes, but keep memory consumption reasonable?
>
> Kind regards,
>
> Viktor Nazdrachev
> R&D senior researcher
> Geosteering Technologies LLC
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
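To make the diagnostics and the tuning knob mentioned above concrete, a command line along the following lines could be used (the executable name is a placeholder, and 0.01 is just a starting value inside the suggested range):

    # inspect the solver configuration and where the time goes
    mpiexec -n 4 ./elasticity_app -ksp_type cg -pc_type gamg \
        -ksp_converged_reason -ksp_view -log_view

    # experiment with the aggregation threshold
    mpiexec -n 4 ./elasticity_app -ksp_type cg -pc_type gamg \
        -pc_gamg_threshold 0.01 -ksp_monitor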
From wence at gmx.li Wed Sep 1 04:02:42 2021
From: wence at gmx.li (Lawrence Mitchell)
Date: Wed, 1 Sep 2021 10:02:42 +0100
Subject: [petsc-users] Slow convergence while parallel computations.
In-Reply-To:
References:
Message-ID:

> On 1 Sep 2021, at 09:42, Viktor Nazdrachev wrote:
>
> I have a 3D elasticity problem with heterogeneous properties.

What does your coefficient variation look like? How large is the contrast?

> There is an unstructured grid with aspect ratios varying from 4 to 25. Zero Dirichlet BCs are imposed on the bottom face of the mesh. Also, Neumann (traction) BCs are imposed on the side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M DOFs).
>
> The best performance and memory usage for a single MPI process was obtained with the HPDDM (BFBCG) solver and bjacobi + ICC(1) in the subdomains as the preconditioner; it took 1 m 45 s and 5.0 GB of RAM. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This is because the number of iterations required to achieve the same tolerance increases significantly.

How many iterations do you have in serial (and then in parallel)?

> I've also tried the PCGAMG (agg) preconditioner with an ICC(1) sub-preconditioner. For a single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the near-nullspace was attached using the MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This reduced the calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is a peak memory usage of 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.

Does the number of iterates increase in parallel? Again, how many iterations do you have?

> Are there ways to avoid the degradation of the convergence rate for the bjacobi preconditioner in parallel mode? Does it make sense to use hierarchical or nested Krylov methods with a local GMRES solver (sub_ksp_type gmres) and some sub-preconditioner (for example, sub_pc_type bjacobi)?

bjacobi is only a one-level method, so you would not expect a process-independent convergence rate for this kind of problem. If the coefficient variation is not too extreme, then I would expect GAMG (or some other smoothed aggregation package, perhaps -pc_type ml (you need --download-ml)) would work well with some tuning.

If you have extremely high contrast coefficients you might need something with stronger coarse grids. If you can assemble so-called Neumann matrices (https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you could try the GenEO scheme offered by PCHPDDM.

> Is this peak memory usage expected for the GAMG preconditioner? Is there any way to reduce it?

I think that peak memory usage comes from building the coarse grids. Can you run with `-info` and grep for GAMG? This will provide some output that more expert GAMG users can interpret.

Lawrence
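As a concrete illustration of those last two suggestions (the executable name is again a placeholder, and -pc_type ml assumes PETSc was configured with --download-ml):

    # try the ML smoothed-aggregation preconditioner instead of GAMG
    mpiexec -n 4 ./elasticity_app -ksp_type cg -pc_type ml

    # collect the GAMG-related -info output for the list to interpret
    mpiexec -n 4 ./elasticity_app -ksp_type cg -pc_type gamg -info | grep GAMG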
From mfadams at lbl.gov Wed Sep 1 06:49:40 2021
From: mfadams at lbl.gov (Mark Adams)
Date: Wed, 1 Sep 2021 07:49:40 -0400
Subject: [petsc-users] Slow convergence while parallel computations.
In-Reply-To:
References:
Message-ID:

As far as GAMG:

* Pierre is right, start with the defaults. AMG does take tuning. 2D and 3D are very different, among other things. You can run with '-info :pc', which is very noisy, and grep on "GAMG" and send me the result. (Oh, Lawrence recommended this, just send it.)
- ICC is not good because it has to scale the diagonal to avoid negative pivots (even for SPD matrices that are not M-matrices, at least). This is probably a problem.
- As Lawrence indicates, jumps in coefficients can be hard for generic AMG.
- And yes, -pc_gamg_threshold is an important parameter for homogeneous problems and can be additionally important for inhomogeneous problems to get the AMG method to "see" your jumps.
* The memory problems are from squaring the graph, among other things, which you usually need to do for elasticity unless you have high order elements, maybe.
* You can try PCBDDC; DD methods are nice for elasticity.
* You can try hypre. Good solver, but 3D elasticity is not its strength.
* As far as poor scaling: you have large subdomains, I assume the load balancing is decent, and the network is not crazy. This might be a lot of setup cost. Run with -log_view and look at KSPSolve and MatPtAP...
- The solver will call the setup (MatPtAP), if it has not been done yet, so that it gets folded in. You can call KSPSetUp() before KSPSolve() to get the timings separated. If you are reusing the solver (e.g., not full Newton) then the setup gets amortized.

Mark

On Wed, Sep 1, 2021 at 5:02 AM Lawrence Mitchell wrote:

> > On 1 Sep 2021, at 09:42, Viktor Nazdrachev wrote:
> >
> > I have a 3D elasticity problem with heterogeneous properties.
>
> What does your coefficient variation look like? How large is the contrast?
>
> > There is an unstructured grid with aspect ratios varying from 4 to 25. Zero Dirichlet BCs are imposed on the bottom face of the mesh. Also, Neumann (traction) BCs are imposed on the side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M DOFs).
> >
> > The best performance and memory usage for a single MPI process was obtained with the HPDDM (BFBCG) solver and bjacobi + ICC(1) in the subdomains as the preconditioner; it took 1 m 45 s and 5.0 GB of RAM. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This is because the number of iterations required to achieve the same tolerance increases significantly.
>
> How many iterations do you have in serial (and then in parallel)?
>
> > I've also tried the PCGAMG (agg) preconditioner with an ICC(1) sub-preconditioner. For a single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the near-nullspace was attached using the MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This reduced the calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is a peak memory usage of 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.
>
> Does the number of iterates increase in parallel? Again, how many iterations do you have?
>
> > Are there ways to avoid the degradation of the convergence rate for the bjacobi preconditioner in parallel mode? Does it make sense to use hierarchical or nested Krylov methods with a local GMRES solver (sub_ksp_type gmres) and some sub-preconditioner (for example, sub_pc_type bjacobi)?
>
> bjacobi is only a one-level method, so you would not expect a process-independent convergence rate for this kind of problem. If the coefficient variation is not too extreme, then I would expect GAMG (or some other smoothed aggregation package, perhaps -pc_type ml (you need --download-ml)) would work well with some tuning.
>
> If you have extremely high contrast coefficients you might need something with stronger coarse grids. If you can assemble so-called Neumann matrices (https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you could try the GenEO scheme offered by PCHPDDM.
>
> > Is this peak memory usage expected for the GAMG preconditioner? Is there any way to reduce it?
>
> I think that peak memory usage comes from building the coarse grids. Can you run with `-info` and grep for GAMG? This will provide some output that more expert GAMG users can interpret.
>
> Lawrence
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sam.guo at cd-adapco.com Wed Sep 1 13:49:27 2021
From: sam.guo at cd-adapco.com (Sam Guo)
Date: Wed, 1 Sep 2021 11:49:27 -0700
Subject: [petsc-users] PETSc 3.15.3 compiling error
In-Reply-To: <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov>
References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov>
Message-ID:

fc should not be required, since I link PETSc with a pre-compiled MUMPS. In fact, --with-mumps-include, --with-mumps-lib and --with-mumps-serial should not be required either, since my own CMake defines -DPETSC_HAVE_MUMPS and links my pre-compiled MUMPS.

I am able to make it work using PETSc 3.11.3. Attached please find the PETSc 3.11.3 configure.log.
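(For reference, the configure-based route being suggested in this thread, using those same options against a pre-built MUMPS, would look roughly like the sketch below; the compilers, the paths and the exact MUMPS library list are placeholders that depend on the local installation, and Fortran is left enabled because MUMPS itself is a Fortran package.)

    ./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran \
      --with-mumps-include=/path/to/mumps/include \
      --with-mumps-lib="-L/path/to/mumps/lib -ldmumps -lmumps_common -lpord" \
      --with-mumps-serial=1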
On Tue, Aug 31, 2021 at 4:47 PM Satish Balay wrote: > > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > details): > > ------------------------------------------------------------------------------- > Package mumps requested requires Fortran but compiler turned off. > > ******************************************************************************* > > i.e remove '--with-fc=0' and rerun configure. > > Satish > > On Tue, 31 Aug 2021, Sam Guo wrote: > > > Attached please find the latest configure.log. > > > > grep MUMPS_VERSION > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > > MUMPS_VERSION > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > > MUMPS_VERSION "5.2.1" > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > > MUMPS_VERSION_MAX_LEN > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > > MUMPS_VERSION_MAX_LEN 30 > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > > MUMPS_VERSION > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > > MUMPS_VERSION "5.2.1" > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > > MUMPS_VERSION_MAX_LEN > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > > MUMPS_VERSION_MAX_LEN 30 > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > > MUMPS_VERSION > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > > MUMPS_VERSION "5.2.1" > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > > MUMPS_VERSION_MAX_LEN > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > > MUMPS_VERSION_MAX_LEN 30 > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > > MUMPS_VERSION > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > > MUMPS_VERSION "5.2.1" > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > > MUMPS_VERSION_MAX_LEN > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > > MUMPS_VERSION_MAX_LEN 30 > > > 
/u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay wrote: > > > > > Also - what do you have for: > > > > > > grep MUMPS_VERSION > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > > > > > > Satish > > > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: > > > > > > > please resend the logs > > > > > > > > Satish > > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > > > Same compiling error with --with-mumps-serial=1. > > > > > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay > > > wrote: > > > > > > > > > > > Use the additional option: -with-mumps-serial > > > > > > > > > > > > Satish > > > > > > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > > > > > > > Attached please find the configure.log. I use my own CMake. I > have > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. > > > > > > > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo > > > > wrote: > > > > > > > > > > > > > > > I use pre-installed > > > > > > > > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < > balay at mcs.anl.gov> > > > > > > wrote: > > > > > > > > > > > > > > > >> > > > > > > > >> Are you using --download-mumps or pre-installed mumps? If > using > > > > > > > >> pre-installed - try --download-mumps. > > > > > > > >> > > > > > > > >> If you still have issues - send us configure.log and > make.log > > > from the > > > > > > > >> failed build. > > > > > > > >> > > > > > > > >> Satish > > > > > > > >> > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > >> > > > > > > > >> > Dear PETSc dev team, > > > > > > > >> > I am compiling petsc 3.15.3 and got following compiling > > > error > > > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: > > > missing > > > > > > binary > > > > > > > >> > operator before token "(" > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > > > > > > >> > Any idea what I did wrong? > > > > > > > >> > > > > > > > > >> > Thanks, > > > > > > > >> > Sam > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 1074595 bytes Desc: not available URL: From balay at mcs.anl.gov Wed Sep 1 14:00:24 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 1 Sep 2021 14:00:24 -0500 (CDT) Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> Message-ID: mumps is a fortran package - so best to specify fc. Any specific reason for needing to force '--with-fc=0'? The attached configure.log is not using mumps. Satish On Wed, 1 Sep 2021, Sam Guo wrote: > fc should not be required since I link PETSc with pre-compiled MUMPS. In > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial should not > be required since my own CMake defines -DPETSC_HAVE_MUMPS and links my > pre-compiled MUMPS. > > I am able to make it work using PETSc 3.11.3. Attached please find the > cPETSc 3.11.3 onfigure.log PETSc. 
> > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay wrote: > > > > > ******************************************************************************* > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for > > details): > > > > ------------------------------------------------------------------------------- > > Package mumps requested requires Fortran but compiler turned off. > > > > ******************************************************************************* > > > > i.e remove '--with-fc=0' and rerun configure. > > > > Satish > > > > On Tue, 31 Aug 2021, Sam Guo wrote: > > > > > Attached please find the latest configure.log. > > > > > > grep MUMPS_VERSION > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > > > MUMPS_VERSION > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > > > MUMPS_VERSION "5.2.1" > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > > > MUMPS_VERSION_MAX_LEN > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > > > MUMPS_VERSION_MAX_LEN 30 > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > > > MUMPS_VERSION > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > > > MUMPS_VERSION "5.2.1" > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > > > MUMPS_VERSION_MAX_LEN > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > > > MUMPS_VERSION_MAX_LEN 30 > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > > > MUMPS_VERSION > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > > > MUMPS_VERSION "5.2.1" > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > > > MUMPS_VERSION_MAX_LEN > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > > > MUMPS_VERSION_MAX_LEN 30 > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > > > MUMPS_VERSION > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > > > MUMPS_VERSION "5.2.1" > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > > > MUMPS_VERSION_MAX_LEN > > > > > 
/u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > > > MUMPS_VERSION_MAX_LEN 30 > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > > > > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay wrote: > > > > > > > Also - what do you have for: > > > > > > > > grep MUMPS_VERSION > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > > > > > > > > Satish > > > > > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: > > > > > > > > > please resend the logs > > > > > > > > > > Satish > > > > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > > > > > Same compiling error with --with-mumps-serial=1. > > > > > > > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay > > > > wrote: > > > > > > > > > > > > > Use the additional option: -with-mumps-serial > > > > > > > > > > > > > > Satish > > > > > > > > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > > > > > > > > > Attached please find the configure.log. I use my own CMake. I > > have > > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. > > > > > > > > > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo > > > > > > wrote: > > > > > > > > > > > > > > > > > I use pre-installed > > > > > > > > > > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < > > balay at mcs.anl.gov> > > > > > > > wrote: > > > > > > > > > > > > > > > > > >> > > > > > > > > >> Are you using --download-mumps or pre-installed mumps? If > > using > > > > > > > > >> pre-installed - try --download-mumps. > > > > > > > > >> > > > > > > > > >> If you still have issues - send us configure.log and > > make.log > > > > from the > > > > > > > > >> failed build. > > > > > > > > >> > > > > > > > > >> Satish > > > > > > > > >> > > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > > >> > > > > > > > > >> > Dear PETSc dev team, > > > > > > > > >> > I am compiling petsc 3.15.3 and got following compiling > > > > error > > > > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: > > > > missing > > > > > > > binary > > > > > > > > >> > operator before token "(" > > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > > > > > > > >> > Any idea what I did wrong? 
> > > > > > > > >> > > > > > > > > > >> > Thanks, > > > > > > > > >> > Sam > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From sam.guo at cd-adapco.com Wed Sep 1 14:12:57 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Wed, 1 Sep 2021 12:12:57 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> Message-ID: I believe I am using MUMPS since I have done following (1) defined -DPETSC_HAVE_MUMPS, (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c (3) link my pre-compiled MUMPS, and (4) specifies following PETSc options checkError(EPSGetST(eps, &st)); checkError(STSetType(st, STSINVERT)); //if(useShellMatrix) checkError(STSetMatMode(st, ST_MATMODE_SHELL)); checkError(STGetKSP(st, &ksp)); checkError(KSPSetOperators(ksp, A, A)); checkError(KSPSetType(ksp, KSPPREONLY)); checkError(KSPGetPC(ksp, &pc)); checkError(MatSetOption(A, MAT_SPD, PETSC_TRUE)); checkError(PCSetType(pc, PCCHOLESKY)); checkError(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS)); checkError(PCFactorSetUpMatSolverType(pc)); checkError(PetscOptionsSetValue(NULL, "-mat_mumps_icntl_13","1")); Another evidence I am using MUMPS is that If I skip (1)-(3) above, I got the PETSc error saying that MUMPS is required. On Wed, Sep 1, 2021 at 12:00 PM Satish Balay wrote: > mumps is a fortran package - so best to specify fc. Any specific reason > for needing to force '--with-fc=0'? > > The attached configure.log is not using mumps. > > Satish > > On Wed, 1 Sep 2021, Sam Guo wrote: > > > fc should not be required since I link PETSc with pre-compiled MUMPS. In > > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial should > not > > be required since my own CMake defines -DPETSC_HAVE_MUMPS and links my > > pre-compiled MUMPS. > > > > I am able to make it work using PETSc 3.11.3. Attached please find the > > cPETSc 3.11.3 onfigure.log PETSc. > > > > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay wrote: > > > > > > > > > ******************************************************************************* > > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log > for > > > details): > > > > > > > ------------------------------------------------------------------------------- > > > Package mumps requested requires Fortran but compiler turned off. > > > > > > > ******************************************************************************* > > > > > > i.e remove '--with-fc=0' and rerun configure. > > > > > > Satish > > > > > > On Tue, 31 Aug 2021, Sam Guo wrote: > > > > > > > Attached please find the latest configure.log. 
> > > > > > > > grep MUMPS_VERSION > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > > > > MUMPS_VERSION > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > > > > MUMPS_VERSION "5.2.1" > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > > > > MUMPS_VERSION_MAX_LEN > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > > > > MUMPS_VERSION_MAX_LEN 30 > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: > > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > > > > MUMPS_VERSION > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > > > > MUMPS_VERSION "5.2.1" > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > > > > MUMPS_VERSION_MAX_LEN > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > > > > MUMPS_VERSION_MAX_LEN 30 > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: > > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > > > > MUMPS_VERSION > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > > > > MUMPS_VERSION "5.2.1" > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > > > > MUMPS_VERSION_MAX_LEN > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > > > > MUMPS_VERSION_MAX_LEN 30 > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: > > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > > > > MUMPS_VERSION > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > > > > MUMPS_VERSION "5.2.1" > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > > > > MUMPS_VERSION_MAX_LEN > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > > > > MUMPS_VERSION_MAX_LEN 30 > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: > > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > > > > > > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay > wrote: > > > > > > > > > Also - what do you have for: > > > > > > > > > > grep MUMPS_VERSION > > > > > > > > > 
/u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > > > > > > > > > > Satish > > > > > > > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: > > > > > > > > > > > please resend the logs > > > > > > > > > > > > Satish > > > > > > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > > > > > > > Same compiling error with --with-mumps-serial=1. > > > > > > > > > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay < > balay at mcs.anl.gov> > > > > > wrote: > > > > > > > > > > > > > > > Use the additional option: -with-mumps-serial > > > > > > > > > > > > > > > > Satish > > > > > > > > > > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > > > > > > > > > > > Attached please find the configure.log. I use my own > CMake. I > > > have > > > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. > > > > > > > > > > > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo < > sam.guo at cd-adapco.com > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > I use pre-installed > > > > > > > > > > > > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < > > > balay at mcs.anl.gov> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > >> Are you using --download-mumps or pre-installed mumps? > If > > > using > > > > > > > > > >> pre-installed - try --download-mumps. > > > > > > > > > >> > > > > > > > > > >> If you still have issues - send us configure.log and > > > make.log > > > > > from the > > > > > > > > > >> failed build. > > > > > > > > > >> > > > > > > > > > >> Satish > > > > > > > > > >> > > > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > > > >> > > > > > > > > > >> > Dear PETSc dev team, > > > > > > > > > >> > I am compiling petsc 3.15.3 and got following > compiling > > > > > error > > > > > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: > error: > > > > > missing > > > > > > > > binary > > > > > > > > > >> > operator before token "(" > > > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > > > > > > > > >> > Any idea what I did wrong? > > > > > > > > > >> > > > > > > > > > > >> > Thanks, > > > > > > > > > >> > Sam > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Sep 1 14:19:48 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 1 Sep 2021 14:19:48 -0500 (CDT) Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> Message-ID: I'm not sure why you would want to do this - instead of following the recommended installation instructions. If your process works - thats great! you can use it! But why start this e-mail thread? 
Satish On Wed, 1 Sep 2021, Sam Guo wrote: > I believe I am using MUMPS since I have done following > (1) defined -DPETSC_HAVE_MUMPS, > (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c > (3) link my pre-compiled MUMPS, and > (4) specifies following PETSc options > checkError(EPSGetST(eps, &st)); > checkError(STSetType(st, STSINVERT)); > //if(useShellMatrix) checkError(STSetMatMode(st, ST_MATMODE_SHELL)); > checkError(STGetKSP(st, &ksp)); > checkError(KSPSetOperators(ksp, A, A)); > checkError(KSPSetType(ksp, KSPPREONLY)); > checkError(KSPGetPC(ksp, &pc)); > checkError(MatSetOption(A, MAT_SPD, PETSC_TRUE)); > checkError(PCSetType(pc, PCCHOLESKY)); > checkError(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS)); > checkError(PCFactorSetUpMatSolverType(pc)); > checkError(PetscOptionsSetValue(NULL, "-mat_mumps_icntl_13","1")); > > Another evidence I am using MUMPS is that If I skip (1)-(3) above, I got > the PETSc error saying that MUMPS is required. > > On Wed, Sep 1, 2021 at 12:00 PM Satish Balay wrote: > > > mumps is a fortran package - so best to specify fc. Any specific reason > > for needing to force '--with-fc=0'? > > > > The attached configure.log is not using mumps. > > > > Satish > > > > On Wed, 1 Sep 2021, Sam Guo wrote: > > > > > fc should not be required since I link PETSc with pre-compiled MUMPS. In > > > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial should > > not > > > be required since my own CMake defines -DPETSC_HAVE_MUMPS and links my > > > pre-compiled MUMPS. > > > > > > I am able to make it work using PETSc 3.11.3. Attached please find the > > > cPETSc 3.11.3 onfigure.log PETSc. > > > > > > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay wrote: > > > > > > > > > > > > > ******************************************************************************* > > > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log > > for > > > > details): > > > > > > > > > > ------------------------------------------------------------------------------- > > > > Package mumps requested requires Fortran but compiler turned off. > > > > > > > > > > ******************************************************************************* > > > > > > > > i.e remove '--with-fc=0' and rerun configure. > > > > > > > > Satish > > > > > > > > On Tue, 31 Aug 2021, Sam Guo wrote: > > > > > > > > > Attached please find the latest configure.log. 
> > > > > > > > > > grep MUMPS_VERSION > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > > > > > MUMPS_VERSION > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > > > > > MUMPS_VERSION "5.2.1" > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > > > > > MUMPS_VERSION_MAX_LEN > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > > > > > MUMPS_VERSION_MAX_LEN 30 > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: > > > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > > > > > MUMPS_VERSION > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > > > > > MUMPS_VERSION "5.2.1" > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > > > > > MUMPS_VERSION_MAX_LEN > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > > > > > MUMPS_VERSION_MAX_LEN 30 > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: > > > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > > > > > MUMPS_VERSION > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > > > > > MUMPS_VERSION "5.2.1" > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > > > > > MUMPS_VERSION_MAX_LEN > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > > > > > MUMPS_VERSION_MAX_LEN 30 > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: > > > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > > > > > MUMPS_VERSION > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > > > > > MUMPS_VERSION "5.2.1" > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > > > > > MUMPS_VERSION_MAX_LEN > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > > > > > MUMPS_VERSION_MAX_LEN 30 > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: > > > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > > > > > > > > > On Mon, Aug 30, 2021 at 
9:47 PM Satish Balay > > wrote: > > > > > > > > > > > Also - what do you have for: > > > > > > > > > > > > grep MUMPS_VERSION > > > > > > > > > > > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > > > > > > > > > > > > Satish > > > > > > > > > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: > > > > > > > > > > > > > please resend the logs > > > > > > > > > > > > > > Satish > > > > > > > > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > > > > > > > > > Same compiling error with --with-mumps-serial=1. > > > > > > > > > > > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay < > > balay at mcs.anl.gov> > > > > > > wrote: > > > > > > > > > > > > > > > > > Use the additional option: -with-mumps-serial > > > > > > > > > > > > > > > > > > Satish > > > > > > > > > > > > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > > > > > > > > > > > > > Attached please find the configure.log. I use my own > > CMake. I > > > > have > > > > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. > > > > > > > > > > > > > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo < > > sam.guo at cd-adapco.com > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > I use pre-installed > > > > > > > > > > > > > > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < > > > > balay at mcs.anl.gov> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > >> Are you using --download-mumps or pre-installed mumps? > > If > > > > using > > > > > > > > > > >> pre-installed - try --download-mumps. > > > > > > > > > > >> > > > > > > > > > > >> If you still have issues - send us configure.log and > > > > make.log > > > > > > from the > > > > > > > > > > >> failed build. > > > > > > > > > > >> > > > > > > > > > > >> Satish > > > > > > > > > > >> > > > > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > > > > >> > > > > > > > > > > >> > Dear PETSc dev team, > > > > > > > > > > >> > I am compiling petsc 3.15.3 and got following > > compiling > > > > > > error > > > > > > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: > > error: > > > > > > missing > > > > > > > > > binary > > > > > > > > > > >> > operator before token "(" > > > > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > > > > > > > > > >> > Any idea what I did wrong? > > > > > > > > > > >> > > > > > > > > > > > >> > Thanks, > > > > > > > > > > >> > Sam > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From sam.guo at cd-adapco.com Wed Sep 1 14:19:53 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Wed, 1 Sep 2021 12:19:53 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> Message-ID: If we go back to the original compiling error, "petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary operator before token "(" 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0)" I don't understand what PETSC_PKG_MUMPS_VERSION_GE(5,3,0) is doing. 
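That error is the standard preprocessor complaint when a function-like macro used inside #if is not defined: the unknown identifier is replaced by 0 during evaluation, and the "(5,3,0)" that follows it no longer parses. The PETSC_PKG_MUMPS_VERSION_GE macro is generated by PETSc's configure (in include/petscpkg_version.h, if I remember correctly) only when configure is told about MUMPS, which is why compiling mumps.c against a configuration that does not know about MUMPS fails this way. A stand-alone illustration (none of this is taken from the PETSc sources):

    /* When the generated version macro is missing, the #if below reproduces
       "error: missing binary operator before token '('".  Uncommenting the
       stand-in definition, as configure would normally provide it, makes the
       file preprocess cleanly. */
    /* #define PETSC_PKG_MUMPS_VERSION_GE(MAJOR, MINOR, SUBMINOR) 1 */

    #if PETSC_PKG_MUMPS_VERSION_GE(5, 3, 0)
    /* code path for MUMPS >= 5.3.0 */
    #endif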
On Wed, Sep 1, 2021 at 12:12 PM Sam Guo wrote: > I believe I am using MUMPS since I have done following > (1) defined -DPETSC_HAVE_MUMPS, > (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c > (3) link my pre-compiled MUMPS, and > (4) specifies following PETSc options > checkError(EPSGetST(eps, &st)); > checkError(STSetType(st, STSINVERT)); > //if(useShellMatrix) checkError(STSetMatMode(st, > ST_MATMODE_SHELL)); > checkError(STGetKSP(st, &ksp)); > checkError(KSPSetOperators(ksp, A, A)); > checkError(KSPSetType(ksp, KSPPREONLY)); > checkError(KSPGetPC(ksp, &pc)); > checkError(MatSetOption(A, MAT_SPD, PETSC_TRUE)); > checkError(PCSetType(pc, PCCHOLESKY)); > checkError(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS)); > checkError(PCFactorSetUpMatSolverType(pc)); > checkError(PetscOptionsSetValue(NULL, "-mat_mumps_icntl_13","1")); > > Another evidence I am using MUMPS is that If I skip (1)-(3) above, I got > the PETSc error saying that MUMPS is required. > > On Wed, Sep 1, 2021 at 12:00 PM Satish Balay wrote: > >> mumps is a fortran package - so best to specify fc. Any specific reason >> for needing to force '--with-fc=0'? >> >> The attached configure.log is not using mumps. >> >> Satish >> >> On Wed, 1 Sep 2021, Sam Guo wrote: >> >> > fc should not be required since I link PETSc with pre-compiled MUMPS. In >> > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial should >> not >> > be required since my own CMake defines -DPETSC_HAVE_MUMPS and links my >> > pre-compiled MUMPS. >> > >> > I am able to make it work using PETSc 3.11.3. Attached please find the >> > cPETSc 3.11.3 onfigure.log PETSc. >> > >> > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay wrote: >> > >> > > >> > > >> ******************************************************************************* >> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log >> for >> > > details): >> > > >> > > >> ------------------------------------------------------------------------------- >> > > Package mumps requested requires Fortran but compiler turned off. >> > > >> > > >> ******************************************************************************* >> > > >> > > i.e remove '--with-fc=0' and rerun configure. >> > > >> > > Satish >> > > >> > > On Tue, 31 Aug 2021, Sam Guo wrote: >> > > >> > > > Attached please find the latest configure.log. 
>> > > > >> > > > grep MUMPS_VERSION >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef >> > > > MUMPS_VERSION >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define >> > > > MUMPS_VERSION "5.2.1" >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef >> > > > MUMPS_VERSION_MAX_LEN >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define >> > > > MUMPS_VERSION_MAX_LEN 30 >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: >> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef >> > > > MUMPS_VERSION >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define >> > > > MUMPS_VERSION "5.2.1" >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef >> > > > MUMPS_VERSION_MAX_LEN >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define >> > > > MUMPS_VERSION_MAX_LEN 30 >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: >> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef >> > > > MUMPS_VERSION >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define >> > > > MUMPS_VERSION "5.2.1" >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef >> > > > MUMPS_VERSION_MAX_LEN >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define >> > > > MUMPS_VERSION_MAX_LEN 30 >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: >> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef >> > > > MUMPS_VERSION >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define >> > > > MUMPS_VERSION "5.2.1" >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef >> > > > MUMPS_VERSION_MAX_LEN >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define >> > > > MUMPS_VERSION_MAX_LEN 30 >> > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: >> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >> > > > >> > > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay >> wrote: >> > > > >> > > > > Also - what do you have for: >> > > > 
> >> > > > > grep MUMPS_VERSION >> > > > > >> > > >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h >> > > > > >> > > > > Satish >> > > > > >> > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: >> > > > > >> > > > > > please resend the logs >> > > > > > >> > > > > > Satish >> > > > > > >> > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: >> > > > > > >> > > > > > > Same compiling error with --with-mumps-serial=1. >> > > > > > > >> > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay < >> balay at mcs.anl.gov> >> > > > > wrote: >> > > > > > > >> > > > > > > > Use the additional option: -with-mumps-serial >> > > > > > > > >> > > > > > > > Satish >> > > > > > > > >> > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: >> > > > > > > > >> > > > > > > > > Attached please find the configure.log. I use my own >> CMake. I >> > > have >> > > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. >> > > > > > > > > >> > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo < >> sam.guo at cd-adapco.com >> > > > >> > > > > wrote: >> > > > > > > > > >> > > > > > > > > > I use pre-installed >> > > > > > > > > > >> > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < >> > > balay at mcs.anl.gov> >> > > > > > > > wrote: >> > > > > > > > > > >> > > > > > > > > >> >> > > > > > > > > >> Are you using --download-mumps or pre-installed mumps? >> If >> > > using >> > > > > > > > > >> pre-installed - try --download-mumps. >> > > > > > > > > >> >> > > > > > > > > >> If you still have issues - send us configure.log and >> > > make.log >> > > > > from the >> > > > > > > > > >> failed build. >> > > > > > > > > >> >> > > > > > > > > >> Satish >> > > > > > > > > >> >> > > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: >> > > > > > > > > >> >> > > > > > > > > >> > Dear PETSc dev team, >> > > > > > > > > >> > I am compiling petsc 3.15.3 and got following >> compiling >> > > > > error >> > > > > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: >> error: >> > > > > missing >> > > > > > > > binary >> > > > > > > > > >> > operator before token "(" >> > > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) >> > > > > > > > > >> > Any idea what I did wrong? >> > > > > > > > > >> > >> > > > > > > > > >> > Thanks, >> > > > > > > > > >> > Sam >> > > > > > > > > >> > >> > > > > > > > > >> >> > > > > > > > > >> >> > > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > > >> > > > >> > > >> > > >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Wed Sep 1 14:22:29 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Wed, 1 Sep 2021 12:22:29 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> Message-ID: My process only works for PTESc 3.11.3, not 3.15.3 and that's why I started this email thread. On Wed, Sep 1, 2021 at 12:19 PM Sam Guo wrote: > If we go back to the original compiling error, > "petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > operator before token "(" > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0)" > I don't understand what PETSC_PKG_MUMPS_VERSION_GE(5,3,0) is doing. 
> > On Wed, Sep 1, 2021 at 12:12 PM Sam Guo wrote: > >> I believe I am using MUMPS since I have done following >> (1) defined -DPETSC_HAVE_MUMPS, >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c >> (3) link my pre-compiled MUMPS, and >> (4) specifies following PETSc options >> checkError(EPSGetST(eps, &st)); >> checkError(STSetType(st, STSINVERT)); >> //if(useShellMatrix) checkError(STSetMatMode(st, >> ST_MATMODE_SHELL)); >> checkError(STGetKSP(st, &ksp)); >> checkError(KSPSetOperators(ksp, A, A)); >> checkError(KSPSetType(ksp, KSPPREONLY)); >> checkError(KSPGetPC(ksp, &pc)); >> checkError(MatSetOption(A, MAT_SPD, PETSC_TRUE)); >> checkError(PCSetType(pc, PCCHOLESKY)); >> checkError(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS)); >> checkError(PCFactorSetUpMatSolverType(pc)); >> checkError(PetscOptionsSetValue(NULL, "-mat_mumps_icntl_13","1")); >> >> Another evidence I am using MUMPS is that If I skip (1)-(3) above, I got >> the PETSc error saying that MUMPS is required. >> >> On Wed, Sep 1, 2021 at 12:00 PM Satish Balay wrote: >> >>> mumps is a fortran package - so best to specify fc. Any specific reason >>> for needing to force '--with-fc=0'? >>> >>> The attached configure.log is not using mumps. >>> >>> Satish >>> >>> On Wed, 1 Sep 2021, Sam Guo wrote: >>> >>> > fc should not be required since I link PETSc with pre-compiled MUMPS. >>> In >>> > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial should >>> not >>> > be required since my own CMake defines -DPETSC_HAVE_MUMPS and links my >>> > pre-compiled MUMPS. >>> > >>> > I am able to make it work using PETSc 3.11.3. Attached please find the >>> > cPETSc 3.11.3 onfigure.log PETSc. >>> > >>> > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay >>> wrote: >>> > >>> > > >>> > > >>> ******************************************************************************* >>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see >>> configure.log for >>> > > details): >>> > > >>> > > >>> ------------------------------------------------------------------------------- >>> > > Package mumps requested requires Fortran but compiler turned off. >>> > > >>> > > >>> ******************************************************************************* >>> > > >>> > > i.e remove '--with-fc=0' and rerun configure. >>> > > >>> > > Satish >>> > > >>> > > On Tue, 31 Aug 2021, Sam Guo wrote: >>> > > >>> > > > Attached please find the latest configure.log. 
>>> > > > >>> > > > grep MUMPS_VERSION >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef >>> > > > MUMPS_VERSION >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define >>> > > > MUMPS_VERSION "5.2.1" >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef >>> > > > MUMPS_VERSION_MAX_LEN >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define >>> > > > MUMPS_VERSION_MAX_LEN 30 >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef >>> > > > MUMPS_VERSION >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define >>> > > > MUMPS_VERSION "5.2.1" >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef >>> > > > MUMPS_VERSION_MAX_LEN >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define >>> > > > MUMPS_VERSION_MAX_LEN 30 >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef >>> > > > MUMPS_VERSION >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define >>> > > > MUMPS_VERSION "5.2.1" >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef >>> > > > MUMPS_VERSION_MAX_LEN >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define >>> > > > MUMPS_VERSION_MAX_LEN 30 >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef >>> > > > MUMPS_VERSION >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define >>> > > > MUMPS_VERSION "5.2.1" >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef >>> > > > MUMPS_VERSION_MAX_LEN >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define >>> > > > MUMPS_VERSION_MAX_LEN 30 >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>> > > > >>> > > > On Mon, Aug 30, 2021 at 
9:47 PM Satish Balay >>> wrote: >>> > > > >>> > > > > Also - what do you have for: >>> > > > > >>> > > > > grep MUMPS_VERSION >>> > > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h >>> > > > > >>> > > > > Satish >>> > > > > >>> > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: >>> > > > > >>> > > > > > please resend the logs >>> > > > > > >>> > > > > > Satish >>> > > > > > >>> > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: >>> > > > > > >>> > > > > > > Same compiling error with --with-mumps-serial=1. >>> > > > > > > >>> > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay < >>> balay at mcs.anl.gov> >>> > > > > wrote: >>> > > > > > > >>> > > > > > > > Use the additional option: -with-mumps-serial >>> > > > > > > > >>> > > > > > > > Satish >>> > > > > > > > >>> > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: >>> > > > > > > > >>> > > > > > > > > Attached please find the configure.log. I use my own >>> CMake. I >>> > > have >>> > > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. >>> > > > > > > > > >>> > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo < >>> sam.guo at cd-adapco.com >>> > > > >>> > > > > wrote: >>> > > > > > > > > >>> > > > > > > > > > I use pre-installed >>> > > > > > > > > > >>> > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < >>> > > balay at mcs.anl.gov> >>> > > > > > > > wrote: >>> > > > > > > > > > >>> > > > > > > > > >> >>> > > > > > > > > >> Are you using --download-mumps or pre-installed >>> mumps? If >>> > > using >>> > > > > > > > > >> pre-installed - try --download-mumps. >>> > > > > > > > > >> >>> > > > > > > > > >> If you still have issues - send us configure.log and >>> > > make.log >>> > > > > from the >>> > > > > > > > > >> failed build. >>> > > > > > > > > >> >>> > > > > > > > > >> Satish >>> > > > > > > > > >> >>> > > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: >>> > > > > > > > > >> >>> > > > > > > > > >> > Dear PETSc dev team, >>> > > > > > > > > >> > I am compiling petsc 3.15.3 and got following >>> compiling >>> > > > > error >>> > > > > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: >>> error: >>> > > > > missing >>> > > > > > > > binary >>> > > > > > > > > >> > operator before token "(" >>> > > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) >>> > > > > > > > > >> > Any idea what I did wrong? >>> > > > > > > > > >> > >>> > > > > > > > > >> > Thanks, >>> > > > > > > > > >> > Sam >>> > > > > > > > > >> > >>> > > > > > > > > >> >>> > > > > > > > > >> >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > > >>> > > > >>> > > >>> > > >>> > >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Sep 1 14:26:52 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 1 Sep 2021 14:26:52 -0500 (CDT) Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> Message-ID: <6db0cfdc-5250-6cf4-350-84bd08b4ec@mcs.anl.gov> Well - then you refuse to follow our installation instructions. If you have your own hakey way of installing things - you can spend time debugging your process - and fixing things. [can't expect us to fix problems that your process creates. 
Just because it worked before for you doesn't mean its a petsc issue that we should put effort into debugging and fixing] Satish On Wed, 1 Sep 2021, Sam Guo wrote: > My process only works for PTESc 3.11.3, not 3.15.3 and that's why I started > this email thread. > > On Wed, Sep 1, 2021 at 12:19 PM Sam Guo wrote: > > > If we go back to the original compiling error, > > "petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > > operator before token "(" > > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0)" > > I don't understand what PETSC_PKG_MUMPS_VERSION_GE(5,3,0) is doing. > > > > On Wed, Sep 1, 2021 at 12:12 PM Sam Guo wrote: > > > >> I believe I am using MUMPS since I have done following > >> (1) defined -DPETSC_HAVE_MUMPS, > >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c > >> (3) link my pre-compiled MUMPS, and > >> (4) specifies following PETSc options > >> checkError(EPSGetST(eps, &st)); > >> checkError(STSetType(st, STSINVERT)); > >> //if(useShellMatrix) checkError(STSetMatMode(st, > >> ST_MATMODE_SHELL)); > >> checkError(STGetKSP(st, &ksp)); > >> checkError(KSPSetOperators(ksp, A, A)); > >> checkError(KSPSetType(ksp, KSPPREONLY)); > >> checkError(KSPGetPC(ksp, &pc)); > >> checkError(MatSetOption(A, MAT_SPD, PETSC_TRUE)); > >> checkError(PCSetType(pc, PCCHOLESKY)); > >> checkError(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS)); > >> checkError(PCFactorSetUpMatSolverType(pc)); > >> checkError(PetscOptionsSetValue(NULL, "-mat_mumps_icntl_13","1")); > >> > >> Another evidence I am using MUMPS is that If I skip (1)-(3) above, I got > >> the PETSc error saying that MUMPS is required. > >> > >> On Wed, Sep 1, 2021 at 12:00 PM Satish Balay wrote: > >> > >>> mumps is a fortran package - so best to specify fc. Any specific reason > >>> for needing to force '--with-fc=0'? > >>> > >>> The attached configure.log is not using mumps. > >>> > >>> Satish > >>> > >>> On Wed, 1 Sep 2021, Sam Guo wrote: > >>> > >>> > fc should not be required since I link PETSc with pre-compiled MUMPS. > >>> In > >>> > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial should > >>> not > >>> > be required since my own CMake defines -DPETSC_HAVE_MUMPS and links my > >>> > pre-compiled MUMPS. > >>> > > >>> > I am able to make it work using PETSc 3.11.3. Attached please find the > >>> > cPETSc 3.11.3 onfigure.log PETSc. > >>> > > >>> > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay > >>> wrote: > >>> > > >>> > > > >>> > > > >>> ******************************************************************************* > >>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see > >>> configure.log for > >>> > > details): > >>> > > > >>> > > > >>> ------------------------------------------------------------------------------- > >>> > > Package mumps requested requires Fortran but compiler turned off. > >>> > > > >>> > > > >>> ******************************************************************************* > >>> > > > >>> > > i.e remove '--with-fc=0' and rerun configure. > >>> > > > >>> > > Satish > >>> > > > >>> > > On Tue, 31 Aug 2021, Sam Guo wrote: > >>> > > > >>> > > > Attached please find the latest configure.log. 
> >>> > > > > >>> > > > grep MUMPS_VERSION > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > >>> > > > MUMPS_VERSION > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > >>> > > > MUMPS_VERSION "5.2.1" > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > >>> > > > MUMPS_VERSION_MAX_LEN > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > >>> > > > MUMPS_VERSION_MAX_LEN 30 > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > >>> > > > MUMPS_VERSION > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > >>> > > > MUMPS_VERSION "5.2.1" > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > >>> > > > MUMPS_VERSION_MAX_LEN > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > >>> > > > MUMPS_VERSION_MAX_LEN 30 > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > >>> > > > MUMPS_VERSION > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > >>> > > > MUMPS_VERSION "5.2.1" > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > >>> > > > MUMPS_VERSION_MAX_LEN > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > >>> > > > MUMPS_VERSION_MAX_LEN 30 > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > >>> > > > MUMPS_VERSION > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > >>> > > > MUMPS_VERSION "5.2.1" > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > >>> > > > MUMPS_VERSION_MAX_LEN > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > >>> > > > MUMPS_VERSION_MAX_LEN 30 > >>> > > > > >>> > > > >>> 
/u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > >>> > > > > >>> > > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay > >>> wrote: > >>> > > > > >>> > > > > Also - what do you have for: > >>> > > > > > >>> > > > > grep MUMPS_VERSION > >>> > > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > >>> > > > > > >>> > > > > Satish > >>> > > > > > >>> > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: > >>> > > > > > >>> > > > > > please resend the logs > >>> > > > > > > >>> > > > > > Satish > >>> > > > > > > >>> > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > >>> > > > > > > >>> > > > > > > Same compiling error with --with-mumps-serial=1. > >>> > > > > > > > >>> > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay < > >>> balay at mcs.anl.gov> > >>> > > > > wrote: > >>> > > > > > > > >>> > > > > > > > Use the additional option: -with-mumps-serial > >>> > > > > > > > > >>> > > > > > > > Satish > >>> > > > > > > > > >>> > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > >>> > > > > > > > > >>> > > > > > > > > Attached please find the configure.log. I use my own > >>> CMake. I > >>> > > have > >>> > > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. > >>> > > > > > > > > > >>> > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo < > >>> sam.guo at cd-adapco.com > >>> > > > > >>> > > > > wrote: > >>> > > > > > > > > > >>> > > > > > > > > > I use pre-installed > >>> > > > > > > > > > > >>> > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < > >>> > > balay at mcs.anl.gov> > >>> > > > > > > > wrote: > >>> > > > > > > > > > > >>> > > > > > > > > >> > >>> > > > > > > > > >> Are you using --download-mumps or pre-installed > >>> mumps? If > >>> > > using > >>> > > > > > > > > >> pre-installed - try --download-mumps. > >>> > > > > > > > > >> > >>> > > > > > > > > >> If you still have issues - send us configure.log and > >>> > > make.log > >>> > > > > from the > >>> > > > > > > > > >> failed build. > >>> > > > > > > > > >> > >>> > > > > > > > > >> Satish > >>> > > > > > > > > >> > >>> > > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > >>> > > > > > > > > >> > >>> > > > > > > > > >> > Dear PETSc dev team, > >>> > > > > > > > > >> > I am compiling petsc 3.15.3 and got following > >>> compiling > >>> > > > > error > >>> > > > > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: > >>> error: > >>> > > > > missing > >>> > > > > > > > binary > >>> > > > > > > > > >> > operator before token "(" > >>> > > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > >>> > > > > > > > > >> > Any idea what I did wrong? > >>> > > > > > > > > >> > > >>> > > > > > > > > >> > Thanks, > >>> > > > > > > > > >> > Sam > >>> > > > > > > > > >> > > >>> > > > > > > > > >> > >>> > > > > > > > > >> > >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > > >>> > > >>> > >>> > From sam.guo at cd-adapco.com Wed Sep 1 14:26:48 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Wed, 1 Sep 2021 12:26:48 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> Message-ID: For PETSc 3.15.3, if I don't include mat/impls/aij/mpi/mumps/mumps.c, I have no compiling error. 
But I need it for using MUMPS. It is a compiling error rather than linking error. On Wed, Sep 1, 2021 at 12:22 PM Sam Guo wrote: > My process only works for PTESc 3.11.3, not 3.15.3 and that's why I > started this email thread. > > On Wed, Sep 1, 2021 at 12:19 PM Sam Guo wrote: > >> If we go back to the original compiling error, >> "petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary >> operator before token "(" >> 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0)" >> I don't understand what PETSC_PKG_MUMPS_VERSION_GE(5,3,0) is doing. >> >> On Wed, Sep 1, 2021 at 12:12 PM Sam Guo wrote: >> >>> I believe I am using MUMPS since I have done following >>> (1) defined -DPETSC_HAVE_MUMPS, >>> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c >>> (3) link my pre-compiled MUMPS, and >>> (4) specifies following PETSc options >>> checkError(EPSGetST(eps, &st)); >>> checkError(STSetType(st, STSINVERT)); >>> //if(useShellMatrix) checkError(STSetMatMode(st, >>> ST_MATMODE_SHELL)); >>> checkError(STGetKSP(st, &ksp)); >>> checkError(KSPSetOperators(ksp, A, A)); >>> checkError(KSPSetType(ksp, KSPPREONLY)); >>> checkError(KSPGetPC(ksp, &pc)); >>> checkError(MatSetOption(A, MAT_SPD, PETSC_TRUE)); >>> checkError(PCSetType(pc, PCCHOLESKY)); >>> checkError(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS)); >>> checkError(PCFactorSetUpMatSolverType(pc)); >>> checkError(PetscOptionsSetValue(NULL, >>> "-mat_mumps_icntl_13","1")); >>> >>> Another evidence I am using MUMPS is that If I skip (1)-(3) above, I got >>> the PETSc error saying that MUMPS is required. >>> >>> On Wed, Sep 1, 2021 at 12:00 PM Satish Balay wrote: >>> >>>> mumps is a fortran package - so best to specify fc. Any specific reason >>>> for needing to force '--with-fc=0'? >>>> >>>> The attached configure.log is not using mumps. >>>> >>>> Satish >>>> >>>> On Wed, 1 Sep 2021, Sam Guo wrote: >>>> >>>> > fc should not be required since I link PETSc with pre-compiled MUMPS. >>>> In >>>> > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial >>>> should not >>>> > be required since my own CMake defines -DPETSC_HAVE_MUMPS and links my >>>> > pre-compiled MUMPS. >>>> > >>>> > I am able to make it work using PETSc 3.11.3. Attached please find the >>>> > cPETSc 3.11.3 onfigure.log PETSc. >>>> > >>>> > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay >>>> wrote: >>>> > >>>> > > >>>> > > >>>> ******************************************************************************* >>>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see >>>> configure.log for >>>> > > details): >>>> > > >>>> > > >>>> ------------------------------------------------------------------------------- >>>> > > Package mumps requested requires Fortran but compiler turned off. >>>> > > >>>> > > >>>> ******************************************************************************* >>>> > > >>>> > > i.e remove '--with-fc=0' and rerun configure. >>>> > > >>>> > > Satish >>>> > > >>>> > > On Tue, 31 Aug 2021, Sam Guo wrote: >>>> > > >>>> > > > Attached please find the latest configure.log. 
>>>> > > > >>>> > > > grep MUMPS_VERSION >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef >>>> > > > MUMPS_VERSION >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define >>>> > > > MUMPS_VERSION "5.2.1" >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef >>>> > > > MUMPS_VERSION_MAX_LEN >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define >>>> > > > MUMPS_VERSION_MAX_LEN 30 >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: >>>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef >>>> > > > MUMPS_VERSION >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define >>>> > > > MUMPS_VERSION "5.2.1" >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef >>>> > > > MUMPS_VERSION_MAX_LEN >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define >>>> > > > MUMPS_VERSION_MAX_LEN 30 >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: >>>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef >>>> > > > MUMPS_VERSION >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define >>>> > > > MUMPS_VERSION "5.2.1" >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef >>>> > > > MUMPS_VERSION_MAX_LEN >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define >>>> > > > MUMPS_VERSION_MAX_LEN 30 >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: >>>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef >>>> > > > MUMPS_VERSION >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define >>>> > > > MUMPS_VERSION "5.2.1" >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef >>>> > > > MUMPS_VERSION_MAX_LEN >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define >>>> > > > MUMPS_VERSION_MAX_LEN 30 >>>> > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: >>>> > > > char 
version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>>> > > > >>>> > > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay >>>> wrote: >>>> > > > >>>> > > > > Also - what do you have for: >>>> > > > > >>>> > > > > grep MUMPS_VERSION >>>> > > > > >>>> > > >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h >>>> > > > > >>>> > > > > Satish >>>> > > > > >>>> > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: >>>> > > > > >>>> > > > > > please resend the logs >>>> > > > > > >>>> > > > > > Satish >>>> > > > > > >>>> > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: >>>> > > > > > >>>> > > > > > > Same compiling error with --with-mumps-serial=1. >>>> > > > > > > >>>> > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay < >>>> balay at mcs.anl.gov> >>>> > > > > wrote: >>>> > > > > > > >>>> > > > > > > > Use the additional option: -with-mumps-serial >>>> > > > > > > > >>>> > > > > > > > Satish >>>> > > > > > > > >>>> > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: >>>> > > > > > > > >>>> > > > > > > > > Attached please find the configure.log. I use my own >>>> CMake. I >>>> > > have >>>> > > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. >>>> > > > > > > > > >>>> > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo < >>>> sam.guo at cd-adapco.com >>>> > > > >>>> > > > > wrote: >>>> > > > > > > > > >>>> > > > > > > > > > I use pre-installed >>>> > > > > > > > > > >>>> > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < >>>> > > balay at mcs.anl.gov> >>>> > > > > > > > wrote: >>>> > > > > > > > > > >>>> > > > > > > > > >> >>>> > > > > > > > > >> Are you using --download-mumps or pre-installed >>>> mumps? If >>>> > > using >>>> > > > > > > > > >> pre-installed - try --download-mumps. >>>> > > > > > > > > >> >>>> > > > > > > > > >> If you still have issues - send us configure.log and >>>> > > make.log >>>> > > > > from the >>>> > > > > > > > > >> failed build. >>>> > > > > > > > > >> >>>> > > > > > > > > >> Satish >>>> > > > > > > > > >> >>>> > > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: >>>> > > > > > > > > >> >>>> > > > > > > > > >> > Dear PETSc dev team, >>>> > > > > > > > > >> > I am compiling petsc 3.15.3 and got following >>>> compiling >>>> > > > > error >>>> > > > > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: >>>> error: >>>> > > > > missing >>>> > > > > > > > binary >>>> > > > > > > > > >> > operator before token "(" >>>> > > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) >>>> > > > > > > > > >> > Any idea what I did wrong? >>>> > > > > > > > > >> > >>>> > > > > > > > > >> > Thanks, >>>> > > > > > > > > >> > Sam >>>> > > > > > > > > >> > >>>> > > > > > > > > >> >>>> > > > > > > > > >> >>>> > > > > > > > > >>>> > > > > > > > >>>> > > > > > > > >>>> > > > > > > >>>> > > > > > >>>> > > > > >>>> > > > > >>>> > > > >>>> > > >>>> > > >>>> > >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Sep 1 14:30:46 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 1 Sep 2021 15:30:46 -0400 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> Message-ID: On Wed, Sep 1, 2021 at 3:27 PM Sam Guo wrote: > For PETSc 3.15.3, if I don't include mat/impls/aij/mpi/mumps/mumps.c, I > have no compiling error. But I need it for using MUMPS. It is a compiling > error rather than linking error. 
> I will ask a different way: Can you run configure, but point it at your MUMPS installation? --with-mumps-dir= Thanks, Matt > On Wed, Sep 1, 2021 at 12:22 PM Sam Guo wrote: > >> My process only works for PTESc 3.11.3, not 3.15.3 and that's why I >> started this email thread. >> >> On Wed, Sep 1, 2021 at 12:19 PM Sam Guo wrote: >> >>> If we go back to the original compiling error, >>> "petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary >>> operator before token "(" >>> 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0)" >>> I don't understand what PETSC_PKG_MUMPS_VERSION_GE(5,3,0) is doing. >>> >>> On Wed, Sep 1, 2021 at 12:12 PM Sam Guo wrote: >>> >>>> I believe I am using MUMPS since I have done following >>>> (1) defined -DPETSC_HAVE_MUMPS, >>>> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c >>>> (3) link my pre-compiled MUMPS, and >>>> (4) specifies following PETSc options >>>> checkError(EPSGetST(eps, &st)); >>>> checkError(STSetType(st, STSINVERT)); >>>> //if(useShellMatrix) checkError(STSetMatMode(st, >>>> ST_MATMODE_SHELL)); >>>> checkError(STGetKSP(st, &ksp)); >>>> checkError(KSPSetOperators(ksp, A, A)); >>>> checkError(KSPSetType(ksp, KSPPREONLY)); >>>> checkError(KSPGetPC(ksp, &pc)); >>>> checkError(MatSetOption(A, MAT_SPD, PETSC_TRUE)); >>>> checkError(PCSetType(pc, PCCHOLESKY)); >>>> checkError(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS)); >>>> checkError(PCFactorSetUpMatSolverType(pc)); >>>> checkError(PetscOptionsSetValue(NULL, >>>> "-mat_mumps_icntl_13","1")); >>>> >>>> Another evidence I am using MUMPS is that If I skip (1)-(3) above, I >>>> got the PETSc error saying that MUMPS is required. >>>> >>>> On Wed, Sep 1, 2021 at 12:00 PM Satish Balay wrote: >>>> >>>>> mumps is a fortran package - so best to specify fc. Any specific >>>>> reason for needing to force '--with-fc=0'? >>>>> >>>>> The attached configure.log is not using mumps. >>>>> >>>>> Satish >>>>> >>>>> On Wed, 1 Sep 2021, Sam Guo wrote: >>>>> >>>>> > fc should not be required since I link PETSc with pre-compiled >>>>> MUMPS. In >>>>> > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial >>>>> should not >>>>> > be required since my own CMake defines -DPETSC_HAVE_MUMPS and links >>>>> my >>>>> > pre-compiled MUMPS. >>>>> > >>>>> > I am able to make it work using PETSc 3.11.3. Attached please find >>>>> the >>>>> > cPETSc 3.11.3 onfigure.log PETSc. >>>>> > >>>>> > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay >>>>> wrote: >>>>> > >>>>> > > >>>>> > > >>>>> ******************************************************************************* >>>>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see >>>>> configure.log for >>>>> > > details): >>>>> > > >>>>> > > >>>>> ------------------------------------------------------------------------------- >>>>> > > Package mumps requested requires Fortran but compiler turned off. >>>>> > > >>>>> > > >>>>> ******************************************************************************* >>>>> > > >>>>> > > i.e remove '--with-fc=0' and rerun configure. >>>>> > > >>>>> > > Satish >>>>> > > >>>>> > > On Tue, 31 Aug 2021, Sam Guo wrote: >>>>> > > >>>>> > > > Attached please find the latest configure.log. 
>>>>> > > > >>>>> > > > grep MUMPS_VERSION >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef >>>>> > > > MUMPS_VERSION >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define >>>>> > > > MUMPS_VERSION "5.2.1" >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef >>>>> > > > MUMPS_VERSION_MAX_LEN >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define >>>>> > > > MUMPS_VERSION_MAX_LEN 30 >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: >>>>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef >>>>> > > > MUMPS_VERSION >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define >>>>> > > > MUMPS_VERSION "5.2.1" >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef >>>>> > > > MUMPS_VERSION_MAX_LEN >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define >>>>> > > > MUMPS_VERSION_MAX_LEN 30 >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: >>>>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef >>>>> > > > MUMPS_VERSION >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define >>>>> > > > MUMPS_VERSION "5.2.1" >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef >>>>> > > > MUMPS_VERSION_MAX_LEN >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define >>>>> > > > MUMPS_VERSION_MAX_LEN 30 >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: >>>>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef >>>>> > > > MUMPS_VERSION >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define >>>>> > > > MUMPS_VERSION "5.2.1" >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef >>>>> > > > MUMPS_VERSION_MAX_LEN >>>>> > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define >>>>> > > > MUMPS_VERSION_MAX_LEN 30 >>>>> > > > >>>>> > > >>>>> 
/u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: >>>>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>>>> > > > >>>>> > > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay >>>>> wrote: >>>>> > > > >>>>> > > > > Also - what do you have for: >>>>> > > > > >>>>> > > > > grep MUMPS_VERSION >>>>> > > > > >>>>> > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h >>>>> > > > > >>>>> > > > > Satish >>>>> > > > > >>>>> > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: >>>>> > > > > >>>>> > > > > > please resend the logs >>>>> > > > > > >>>>> > > > > > Satish >>>>> > > > > > >>>>> > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: >>>>> > > > > > >>>>> > > > > > > Same compiling error with --with-mumps-serial=1. >>>>> > > > > > > >>>>> > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay < >>>>> balay at mcs.anl.gov> >>>>> > > > > wrote: >>>>> > > > > > > >>>>> > > > > > > > Use the additional option: -with-mumps-serial >>>>> > > > > > > > >>>>> > > > > > > > Satish >>>>> > > > > > > > >>>>> > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: >>>>> > > > > > > > >>>>> > > > > > > > > Attached please find the configure.log. I use my own >>>>> CMake. I >>>>> > > have >>>>> > > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. >>>>> > > > > > > > > >>>>> > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo < >>>>> sam.guo at cd-adapco.com >>>>> > > > >>>>> > > > > wrote: >>>>> > > > > > > > > >>>>> > > > > > > > > > I use pre-installed >>>>> > > > > > > > > > >>>>> > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < >>>>> > > balay at mcs.anl.gov> >>>>> > > > > > > > wrote: >>>>> > > > > > > > > > >>>>> > > > > > > > > >> >>>>> > > > > > > > > >> Are you using --download-mumps or pre-installed >>>>> mumps? If >>>>> > > using >>>>> > > > > > > > > >> pre-installed - try --download-mumps. >>>>> > > > > > > > > >> >>>>> > > > > > > > > >> If you still have issues - send us configure.log and >>>>> > > make.log >>>>> > > > > from the >>>>> > > > > > > > > >> failed build. >>>>> > > > > > > > > >> >>>>> > > > > > > > > >> Satish >>>>> > > > > > > > > >> >>>>> > > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: >>>>> > > > > > > > > >> >>>>> > > > > > > > > >> > Dear PETSc dev team, >>>>> > > > > > > > > >> > I am compiling petsc 3.15.3 and got following >>>>> compiling >>>>> > > > > error >>>>> > > > > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: >>>>> error: >>>>> > > > > missing >>>>> > > > > > > > binary >>>>> > > > > > > > > >> > operator before token "(" >>>>> > > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) >>>>> > > > > > > > > >> > Any idea what I did wrong? >>>>> > > > > > > > > >> > >>>>> > > > > > > > > >> > Thanks, >>>>> > > > > > > > > >> > Sam >>>>> > > > > > > > > >> > >>>>> > > > > > > > > >> >>>>> > > > > > > > > >> >>>>> > > > > > > > > >>>>> > > > > > > > >>>>> > > > > > > > >>>>> > > > > > > >>>>> > > > > > >>>>> > > > > >>>>> > > > > >>>>> > > > >>>>> > > >>>>> > > >>>>> > >>>>> >>>>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay at mcs.anl.gov Wed Sep 1 14:34:17 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 1 Sep 2021 14:34:17 -0500 (CDT) Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> Message-ID: <66d7c45-413a-d122-5fbc-636340c13b5c@mcs.anl.gov> On Wed, 1 Sep 2021, Matthew Knepley wrote: > On Wed, Sep 1, 2021 at 3:27 PM Sam Guo wrote: > > > For PETSc 3.15.3, if I don't include mat/impls/aij/mpi/mumps/mumps.c, I > > have no compiling error. But I need it for using MUMPS. It is a compiling > > error rather than linking error. > > > > I will ask a different way: > > Can you run configure, but point it at your MUMPS installation? > > --with-mumps-dir= It won't overcome this issue: >>> Package mumps requested requires Fortran but compiler turned off. <<< Satish > > Thanks, > > Matt > > > > On Wed, Sep 1, 2021 at 12:22 PM Sam Guo wrote: > > > >> My process only works for PTESc 3.11.3, not 3.15.3 and that's why I > >> started this email thread. > >> > >> On Wed, Sep 1, 2021 at 12:19 PM Sam Guo wrote: > >> > >>> If we go back to the original compiling error, > >>> "petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > >>> operator before token "(" > >>> 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0)" > >>> I don't understand what PETSC_PKG_MUMPS_VERSION_GE(5,3,0) is doing. > >>> > >>> On Wed, Sep 1, 2021 at 12:12 PM Sam Guo wrote: > >>> > >>>> I believe I am using MUMPS since I have done following > >>>> (1) defined -DPETSC_HAVE_MUMPS, > >>>> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c > >>>> (3) link my pre-compiled MUMPS, and > >>>> (4) specifies following PETSc options > >>>> checkError(EPSGetST(eps, &st)); > >>>> checkError(STSetType(st, STSINVERT)); > >>>> //if(useShellMatrix) checkError(STSetMatMode(st, > >>>> ST_MATMODE_SHELL)); > >>>> checkError(STGetKSP(st, &ksp)); > >>>> checkError(KSPSetOperators(ksp, A, A)); > >>>> checkError(KSPSetType(ksp, KSPPREONLY)); > >>>> checkError(KSPGetPC(ksp, &pc)); > >>>> checkError(MatSetOption(A, MAT_SPD, PETSC_TRUE)); > >>>> checkError(PCSetType(pc, PCCHOLESKY)); > >>>> checkError(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS)); > >>>> checkError(PCFactorSetUpMatSolverType(pc)); > >>>> checkError(PetscOptionsSetValue(NULL, > >>>> "-mat_mumps_icntl_13","1")); > >>>> > >>>> Another evidence I am using MUMPS is that If I skip (1)-(3) above, I > >>>> got the PETSc error saying that MUMPS is required. > >>>> > >>>> On Wed, Sep 1, 2021 at 12:00 PM Satish Balay wrote: > >>>> > >>>>> mumps is a fortran package - so best to specify fc. Any specific > >>>>> reason for needing to force '--with-fc=0'? > >>>>> > >>>>> The attached configure.log is not using mumps. > >>>>> > >>>>> Satish > >>>>> > >>>>> On Wed, 1 Sep 2021, Sam Guo wrote: > >>>>> > >>>>> > fc should not be required since I link PETSc with pre-compiled > >>>>> MUMPS. In > >>>>> > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial > >>>>> should not > >>>>> > be required since my own CMake defines -DPETSC_HAVE_MUMPS and links > >>>>> my > >>>>> > pre-compiled MUMPS. > >>>>> > > >>>>> > I am able to make it work using PETSc 3.11.3. Attached please find > >>>>> the > >>>>> > cPETSc 3.11.3 onfigure.log PETSc. 
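Taken together, the two suggestions quoted above amount to rerunning PETSc's configure with a Fortran compiler enabled and pointing it at the existing MUMPS tree, so that configure itself generates PETSC_HAVE_MUMPS and the petscpkg_version.h entries instead of relying on the hand-maintained CMake defines. A command of roughly the following shape is the supported route; the compiler names are placeholders, and the MUMPS path is taken from the grep output earlier in the thread on the assumption that lib/ sits next to include/ under that directory:

./configure --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 \
    --with-mumps-dir=/u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4

If that layout is not what --with-mumps-dir expects, the finer-grained --with-mumps-include/--with-mumps-lib options (plus --with-mumps-serial=1 for a sequential MUMPS build) already mentioned in the thread do the same job, and --download-mumps remains the fallback that sidesteps the pre-installed copy entirely.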
> >>>>> > > >>>>> > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay > >>>>> wrote: > >>>>> > > >>>>> > > > >>>>> > > > >>>>> ******************************************************************************* > >>>>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see > >>>>> configure.log for > >>>>> > > details): > >>>>> > > > >>>>> > > > >>>>> ------------------------------------------------------------------------------- > >>>>> > > Package mumps requested requires Fortran but compiler turned off. > >>>>> > > > >>>>> > > > >>>>> ******************************************************************************* > >>>>> > > > >>>>> > > i.e remove '--with-fc=0' and rerun configure. > >>>>> > > > >>>>> > > Satish > >>>>> > > > >>>>> > > On Tue, 31 Aug 2021, Sam Guo wrote: > >>>>> > > > >>>>> > > > Attached please find the latest configure.log. > >>>>> > > > > >>>>> > > > grep MUMPS_VERSION > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > >>>>> > > > MUMPS_VERSION > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > >>>>> > > > MUMPS_VERSION "5.2.1" > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > >>>>> > > > MUMPS_VERSION_MAX_LEN > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > >>>>> > > > MUMPS_VERSION_MAX_LEN 30 > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: > >>>>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > >>>>> > > > MUMPS_VERSION > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > >>>>> > > > MUMPS_VERSION "5.2.1" > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > >>>>> > > > MUMPS_VERSION_MAX_LEN > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > >>>>> > > > MUMPS_VERSION_MAX_LEN 30 > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: > >>>>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > >>>>> > > > MUMPS_VERSION > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > >>>>> > > > MUMPS_VERSION "5.2.1" > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > >>>>> > > > MUMPS_VERSION_MAX_LEN > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > >>>>> > > > 
MUMPS_VERSION_MAX_LEN 30 > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: > >>>>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > >>>>> > > > MUMPS_VERSION > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > >>>>> > > > MUMPS_VERSION "5.2.1" > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > >>>>> > > > MUMPS_VERSION_MAX_LEN > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > >>>>> > > > MUMPS_VERSION_MAX_LEN 30 > >>>>> > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: > >>>>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > >>>>> > > > > >>>>> > > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay > >>>>> wrote: > >>>>> > > > > >>>>> > > > > Also - what do you have for: > >>>>> > > > > > >>>>> > > > > grep MUMPS_VERSION > >>>>> > > > > > >>>>> > > > >>>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > >>>>> > > > > > >>>>> > > > > Satish > >>>>> > > > > > >>>>> > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: > >>>>> > > > > > >>>>> > > > > > please resend the logs > >>>>> > > > > > > >>>>> > > > > > Satish > >>>>> > > > > > > >>>>> > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > >>>>> > > > > > > >>>>> > > > > > > Same compiling error with --with-mumps-serial=1. > >>>>> > > > > > > > >>>>> > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay < > >>>>> balay at mcs.anl.gov> > >>>>> > > > > wrote: > >>>>> > > > > > > > >>>>> > > > > > > > Use the additional option: -with-mumps-serial > >>>>> > > > > > > > > >>>>> > > > > > > > Satish > >>>>> > > > > > > > > >>>>> > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > >>>>> > > > > > > > > >>>>> > > > > > > > > Attached please find the configure.log. I use my own > >>>>> CMake. I > >>>>> > > have > >>>>> > > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. > >>>>> > > > > > > > > > >>>>> > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo < > >>>>> sam.guo at cd-adapco.com > >>>>> > > > > >>>>> > > > > wrote: > >>>>> > > > > > > > > > >>>>> > > > > > > > > > I use pre-installed > >>>>> > > > > > > > > > > >>>>> > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < > >>>>> > > balay at mcs.anl.gov> > >>>>> > > > > > > > wrote: > >>>>> > > > > > > > > > > >>>>> > > > > > > > > >> > >>>>> > > > > > > > > >> Are you using --download-mumps or pre-installed > >>>>> mumps? If > >>>>> > > using > >>>>> > > > > > > > > >> pre-installed - try --download-mumps. > >>>>> > > > > > > > > >> > >>>>> > > > > > > > > >> If you still have issues - send us configure.log and > >>>>> > > make.log > >>>>> > > > > from the > >>>>> > > > > > > > > >> failed build. 
> >>>>> > > > > > > > > >> > >>>>> > > > > > > > > >> Satish > >>>>> > > > > > > > > >> > >>>>> > > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > >>>>> > > > > > > > > >> > >>>>> > > > > > > > > >> > Dear PETSc dev team, > >>>>> > > > > > > > > >> > I am compiling petsc 3.15.3 and got following > >>>>> compiling > >>>>> > > > > error > >>>>> > > > > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: > >>>>> error: > >>>>> > > > > missing > >>>>> > > > > > > > binary > >>>>> > > > > > > > > >> > operator before token "(" > >>>>> > > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > >>>>> > > > > > > > > >> > Any idea what I did wrong? > >>>>> > > > > > > > > >> > > >>>>> > > > > > > > > >> > Thanks, > >>>>> > > > > > > > > >> > Sam > >>>>> > > > > > > > > >> > > >>>>> > > > > > > > > >> > >>>>> > > > > > > > > >> > >>>>> > > > > > > > > > >>>>> > > > > > > > > >>>>> > > > > > > > > >>>>> > > > > > > > >>>>> > > > > > > >>>>> > > > > > >>>>> > > > > > >>>>> > > > > >>>>> > > > >>>>> > > > >>>>> > > >>>>> > >>>>> > > From junchao.zhang at gmail.com Wed Sep 1 14:46:20 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 1 Sep 2021 14:46:20 -0500 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> Message-ID: On Wed, Sep 1, 2021 at 2:20 PM Sam Guo wrote: > If we go back to the original compiling error, > "petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > operator before token "(" > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0)" > I don't understand what PETSC_PKG_MUMPS_VERSION_GE(5,3,0) is doing. > When petsc is configured with mumps, you will find the macro PETSC_PKG_MUMPS_VERSION_GE(MAJOR,MINOR,SUBMINOR) in $PETSC_ARCH/include/petscpkg_version.h Sam, you can manually compile the failed file, mumps.c, with preprocessing, to see what is wrong in the expansion of the macro. > > On Wed, Sep 1, 2021 at 12:12 PM Sam Guo wrote: > >> I believe I am using MUMPS since I have done following >> (1) defined -DPETSC_HAVE_MUMPS, >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c >> (3) link my pre-compiled MUMPS, and >> (4) specifies following PETSc options >> checkError(EPSGetST(eps, &st)); >> checkError(STSetType(st, STSINVERT)); >> //if(useShellMatrix) checkError(STSetMatMode(st, >> ST_MATMODE_SHELL)); >> checkError(STGetKSP(st, &ksp)); >> checkError(KSPSetOperators(ksp, A, A)); >> checkError(KSPSetType(ksp, KSPPREONLY)); >> checkError(KSPGetPC(ksp, &pc)); >> checkError(MatSetOption(A, MAT_SPD, PETSC_TRUE)); >> checkError(PCSetType(pc, PCCHOLESKY)); >> checkError(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS)); >> checkError(PCFactorSetUpMatSolverType(pc)); >> checkError(PetscOptionsSetValue(NULL, "-mat_mumps_icntl_13","1")); >> >> Another evidence I am using MUMPS is that If I skip (1)-(3) above, I got >> the PETSc error saying that MUMPS is required. >> >> On Wed, Sep 1, 2021 at 12:00 PM Satish Balay wrote: >> >>> mumps is a fortran package - so best to specify fc. Any specific reason >>> for needing to force '--with-fc=0'? >>> >>> The attached configure.log is not using mumps. >>> >>> Satish >>> >>> On Wed, 1 Sep 2021, Sam Guo wrote: >>> >>> > fc should not be required since I link PETSc with pre-compiled MUMPS. >>> In >>> > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial should >>> not >>> > be required since my own CMake defines -DPETSC_HAVE_MUMPS and links my >>> > pre-compiled MUMPS. 
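Junchao's suggestion above can be made concrete with two quick checks. The first shows whether configure generated the MUMPS version macros at all; the second runs only the preprocessor over the failing file so the expansion (or non-expansion) of PETSC_PKG_MUMPS_VERSION_GE can be inspected. The compiler and the -I/-D flags here are illustrative and have to mirror whatever compile line the CMake setup really uses for mumps.c:

# Were the MUMPS version macros generated for this build at all?
grep PETSC_PKG_MUMPS $PETSC_ARCH/include/petscpkg_version.h

# Stop after preprocessing (-E) and keep the result for inspection;
# the include paths and defines must match the real compile line.
mpicc -E -DPETSC_HAVE_MUMPS \
    -I$PETSC_DIR/include -I$PETSC_DIR/$PETSC_ARCH/include \
    -I/u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include \
    $PETSC_DIR/src/mat/impls/aij/mpi/mumps/mumps.c > mumps.i

An empty result from the grep is the situation described in the next message: configure was skipped for the MUMPS part, so the macros were never written.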
>>> > >>> > I am able to make it work using PETSc 3.11.3. Attached please find the >>> > cPETSc 3.11.3 onfigure.log PETSc. >>> > >>> > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay >>> wrote: >>> > >>> > > >>> > > >>> ******************************************************************************* >>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see >>> configure.log for >>> > > details): >>> > > >>> > > >>> ------------------------------------------------------------------------------- >>> > > Package mumps requested requires Fortran but compiler turned off. >>> > > >>> > > >>> ******************************************************************************* >>> > > >>> > > i.e remove '--with-fc=0' and rerun configure. >>> > > >>> > > Satish >>> > > >>> > > On Tue, 31 Aug 2021, Sam Guo wrote: >>> > > >>> > > > Attached please find the latest configure.log. >>> > > > >>> > > > grep MUMPS_VERSION >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef >>> > > > MUMPS_VERSION >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define >>> > > > MUMPS_VERSION "5.2.1" >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef >>> > > > MUMPS_VERSION_MAX_LEN >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define >>> > > > MUMPS_VERSION_MAX_LEN 30 >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef >>> > > > MUMPS_VERSION >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define >>> > > > MUMPS_VERSION "5.2.1" >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef >>> > > > MUMPS_VERSION_MAX_LEN >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define >>> > > > MUMPS_VERSION_MAX_LEN 30 >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef >>> > > > MUMPS_VERSION >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define >>> > > > MUMPS_VERSION "5.2.1" >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef >>> > > > MUMPS_VERSION_MAX_LEN >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define >>> > > > MUMPS_VERSION_MAX_LEN 30 >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>> > > > >>> > > 
>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef >>> > > > MUMPS_VERSION >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define >>> > > > MUMPS_VERSION "5.2.1" >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef >>> > > > MUMPS_VERSION_MAX_LEN >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define >>> > > > MUMPS_VERSION_MAX_LEN 30 >>> > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>> > > > >>> > > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay >>> wrote: >>> > > > >>> > > > > Also - what do you have for: >>> > > > > >>> > > > > grep MUMPS_VERSION >>> > > > > >>> > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h >>> > > > > >>> > > > > Satish >>> > > > > >>> > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: >>> > > > > >>> > > > > > please resend the logs >>> > > > > > >>> > > > > > Satish >>> > > > > > >>> > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: >>> > > > > > >>> > > > > > > Same compiling error with --with-mumps-serial=1. >>> > > > > > > >>> > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay < >>> balay at mcs.anl.gov> >>> > > > > wrote: >>> > > > > > > >>> > > > > > > > Use the additional option: -with-mumps-serial >>> > > > > > > > >>> > > > > > > > Satish >>> > > > > > > > >>> > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: >>> > > > > > > > >>> > > > > > > > > Attached please find the configure.log. I use my own >>> CMake. I >>> > > have >>> > > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. >>> > > > > > > > > >>> > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo < >>> sam.guo at cd-adapco.com >>> > > > >>> > > > > wrote: >>> > > > > > > > > >>> > > > > > > > > > I use pre-installed >>> > > > > > > > > > >>> > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < >>> > > balay at mcs.anl.gov> >>> > > > > > > > wrote: >>> > > > > > > > > > >>> > > > > > > > > >> >>> > > > > > > > > >> Are you using --download-mumps or pre-installed >>> mumps? If >>> > > using >>> > > > > > > > > >> pre-installed - try --download-mumps. >>> > > > > > > > > >> >>> > > > > > > > > >> If you still have issues - send us configure.log and >>> > > make.log >>> > > > > from the >>> > > > > > > > > >> failed build. >>> > > > > > > > > >> >>> > > > > > > > > >> Satish >>> > > > > > > > > >> >>> > > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: >>> > > > > > > > > >> >>> > > > > > > > > >> > Dear PETSc dev team, >>> > > > > > > > > >> > I am compiling petsc 3.15.3 and got following >>> compiling >>> > > > > error >>> > > > > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: >>> error: >>> > > > > missing >>> > > > > > > > binary >>> > > > > > > > > >> > operator before token "(" >>> > > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) >>> > > > > > > > > >> > Any idea what I did wrong? 
>>> > > > > > > > > >> > >>> > > > > > > > > >> > Thanks, >>> > > > > > > > > >> > Sam >>> > > > > > > > > >> > >>> > > > > > > > > >> >>> > > > > > > > > >> >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > > >>> > > > >>> > > >>> > > >>> > >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Sep 1 14:52:06 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 1 Sep 2021 14:52:06 -0500 (CDT) Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> Message-ID: <408d7d73-4da-8d73-97c-15e91855922@mcs.anl.gov> Well the build process used here is: >> (1) defined -DPETSC_HAVE_MUMPS, >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c i.e configure is skipped [for mumps part] so PETSC_PKG_MUMPS_VERSION_GE etc are missing [hence this error] Satish On Wed, 1 Sep 2021, Junchao Zhang wrote: > On Wed, Sep 1, 2021 at 2:20 PM Sam Guo wrote: > > > If we go back to the original compiling error, > > "petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > > operator before token "(" > > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0)" > > I don't understand what PETSC_PKG_MUMPS_VERSION_GE(5,3,0) is doing. > > > When petsc is configured with mumps, you will find the macro > PETSC_PKG_MUMPS_VERSION_GE(MAJOR,MINOR,SUBMINOR) in > $PETSC_ARCH/include/petscpkg_version.h > Sam, you can manually compile the failed file, mumps.c, with preprocessing, > to see what is wrong in the expansion of the macro. > > > > > > On Wed, Sep 1, 2021 at 12:12 PM Sam Guo wrote: > > > >> I believe I am using MUMPS since I have done following > >> (1) defined -DPETSC_HAVE_MUMPS, > >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c > >> (3) link my pre-compiled MUMPS, and > >> (4) specifies following PETSc options > >> checkError(EPSGetST(eps, &st)); > >> checkError(STSetType(st, STSINVERT)); > >> //if(useShellMatrix) checkError(STSetMatMode(st, > >> ST_MATMODE_SHELL)); > >> checkError(STGetKSP(st, &ksp)); > >> checkError(KSPSetOperators(ksp, A, A)); > >> checkError(KSPSetType(ksp, KSPPREONLY)); > >> checkError(KSPGetPC(ksp, &pc)); > >> checkError(MatSetOption(A, MAT_SPD, PETSC_TRUE)); > >> checkError(PCSetType(pc, PCCHOLESKY)); > >> checkError(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS)); > >> checkError(PCFactorSetUpMatSolverType(pc)); > >> checkError(PetscOptionsSetValue(NULL, "-mat_mumps_icntl_13","1")); > >> > >> Another evidence I am using MUMPS is that If I skip (1)-(3) above, I got > >> the PETSc error saying that MUMPS is required. > >> > >> On Wed, Sep 1, 2021 at 12:00 PM Satish Balay wrote: > >> > >>> mumps is a fortran package - so best to specify fc. Any specific reason > >>> for needing to force '--with-fc=0'? > >>> > >>> The attached configure.log is not using mumps. > >>> > >>> Satish > >>> > >>> On Wed, 1 Sep 2021, Sam Guo wrote: > >>> > >>> > fc should not be required since I link PETSc with pre-compiled MUMPS. > >>> In > >>> > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial should > >>> not > >>> > be required since my own CMake defines -DPETSC_HAVE_MUMPS and links my > >>> > pre-compiled MUMPS. > >>> > > >>> > I am able to make it work using PETSc 3.11.3. Attached please find the > >>> > cPETSc 3.11.3 onfigure.log PETSc. 
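
A note on the diagnostic itself, and on what the missing header normally contains. Inside an #if directive the preprocessor replaces any identifier that is not a defined macro with 0, so when no configure-generated petscpkg_version.h exists the guard "#if PETSC_PKG_MUMPS_VERSION_GE(5,3,0)" is evaluated as "#if 0 (5,3,0)", which is exactly GCC's "missing binary operator before token '('" message. Preprocessing mumps.c by hand (compiler flag -E), as suggested above, makes this visible. For a configured build against MUMPS 5.2.1, $PETSC_ARCH/include/petscpkg_version.h would instead provide something along the lines of the sketch below; the exact macro bodies written by PETSc configure vary between releases, so treat this as illustrative only, not the verbatim generated header:

  #define PETSC_PKG_MUMPS_VERSION_MAJOR    5
  #define PETSC_PKG_MUMPS_VERSION_MINOR    2
  #define PETSC_PKG_MUMPS_VERSION_SUBMINOR 1

  /* lexicographic "installed version >= MAJOR.MINOR.SUBMINOR" test; for 5.2.1 the
     guard PETSC_PKG_MUMPS_VERSION_GE(5,3,0) evaluates to 0, i.e. the pre-5.3 path */
  #define PETSC_PKG_MUMPS_VERSION_GE(MAJOR,MINOR,SUBMINOR) \
    ((PETSC_PKG_MUMPS_VERSION_MAJOR > (MAJOR)) || \
     ((PETSC_PKG_MUMPS_VERSION_MAJOR == (MAJOR)) && (PETSC_PKG_MUMPS_VERSION_MINOR > (MINOR))) || \
     ((PETSC_PKG_MUMPS_VERSION_MAJOR == (MAJOR)) && (PETSC_PKG_MUMPS_VERSION_MINOR == (MINOR)) && \
      (PETSC_PKG_MUMPS_VERSION_SUBMINOR >= (SUBMINOR))))

  /* strictly-older-than test, the complement of GE */
  #define PETSC_PKG_MUMPS_VERSION_LT(MAJOR,MINOR,SUBMINOR) \
    (!PETSC_PKG_MUMPS_VERSION_GE(MAJOR,MINOR,SUBMINOR))
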
> >>> > > >>> > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay > >>> wrote: > >>> > > >>> > > > >>> > > > >>> ******************************************************************************* > >>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see > >>> configure.log for > >>> > > details): > >>> > > > >>> > > > >>> ------------------------------------------------------------------------------- > >>> > > Package mumps requested requires Fortran but compiler turned off. > >>> > > > >>> > > > >>> ******************************************************************************* > >>> > > > >>> > > i.e remove '--with-fc=0' and rerun configure. > >>> > > > >>> > > Satish > >>> > > > >>> > > On Tue, 31 Aug 2021, Sam Guo wrote: > >>> > > > >>> > > > Attached please find the latest configure.log. > >>> > > > > >>> > > > grep MUMPS_VERSION > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > >>> > > > MUMPS_VERSION > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > >>> > > > MUMPS_VERSION "5.2.1" > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > >>> > > > MUMPS_VERSION_MAX_LEN > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > >>> > > > MUMPS_VERSION_MAX_LEN 30 > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > >>> > > > MUMPS_VERSION > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > >>> > > > MUMPS_VERSION "5.2.1" > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > >>> > > > MUMPS_VERSION_MAX_LEN > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > >>> > > > MUMPS_VERSION_MAX_LEN 30 > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > >>> > > > MUMPS_VERSION > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > >>> > > > MUMPS_VERSION "5.2.1" > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > >>> > > > MUMPS_VERSION_MAX_LEN > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > >>> > > > MUMPS_VERSION_MAX_LEN 30 > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: > >>> > > > char 
version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > >>> > > > MUMPS_VERSION > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > >>> > > > MUMPS_VERSION "5.2.1" > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > >>> > > > MUMPS_VERSION_MAX_LEN > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > >>> > > > MUMPS_VERSION_MAX_LEN 30 > >>> > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > >>> > > > > >>> > > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay > >>> wrote: > >>> > > > > >>> > > > > Also - what do you have for: > >>> > > > > > >>> > > > > grep MUMPS_VERSION > >>> > > > > > >>> > > > >>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > >>> > > > > > >>> > > > > Satish > >>> > > > > > >>> > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: > >>> > > > > > >>> > > > > > please resend the logs > >>> > > > > > > >>> > > > > > Satish > >>> > > > > > > >>> > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > >>> > > > > > > >>> > > > > > > Same compiling error with --with-mumps-serial=1. > >>> > > > > > > > >>> > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay < > >>> balay at mcs.anl.gov> > >>> > > > > wrote: > >>> > > > > > > > >>> > > > > > > > Use the additional option: -with-mumps-serial > >>> > > > > > > > > >>> > > > > > > > Satish > >>> > > > > > > > > >>> > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > >>> > > > > > > > > >>> > > > > > > > > Attached please find the configure.log. I use my own > >>> CMake. I > >>> > > have > >>> > > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. > >>> > > > > > > > > > >>> > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo < > >>> sam.guo at cd-adapco.com > >>> > > > > >>> > > > > wrote: > >>> > > > > > > > > > >>> > > > > > > > > > I use pre-installed > >>> > > > > > > > > > > >>> > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < > >>> > > balay at mcs.anl.gov> > >>> > > > > > > > wrote: > >>> > > > > > > > > > > >>> > > > > > > > > >> > >>> > > > > > > > > >> Are you using --download-mumps or pre-installed > >>> mumps? If > >>> > > using > >>> > > > > > > > > >> pre-installed - try --download-mumps. > >>> > > > > > > > > >> > >>> > > > > > > > > >> If you still have issues - send us configure.log and > >>> > > make.log > >>> > > > > from the > >>> > > > > > > > > >> failed build. > >>> > > > > > > > > >> > >>> > > > > > > > > >> Satish > >>> > > > > > > > > >> > >>> > > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > >>> > > > > > > > > >> > >>> > > > > > > > > >> > Dear PETSc dev team, > >>> > > > > > > > > >> > I am compiling petsc 3.15.3 and got following > >>> compiling > >>> > > > > error > >>> > > > > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: > >>> error: > >>> > > > > missing > >>> > > > > > > > binary > >>> > > > > > > > > >> > operator before token "(" > >>> > > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > >>> > > > > > > > > >> > Any idea what I did wrong? 
> >>> > > > > > > > > >> > > >>> > > > > > > > > >> > Thanks, > >>> > > > > > > > > >> > Sam > >>> > > > > > > > > >> > > >>> > > > > > > > > >> > >>> > > > > > > > > >> > >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > > >>> > > >>> > >>> > From junchao.zhang at gmail.com Wed Sep 1 14:59:09 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 1 Sep 2021 14:59:09 -0500 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: <408d7d73-4da-8d73-97c-15e91855922@mcs.anl.gov> References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> <408d7d73-4da-8d73-97c-15e91855922@mcs.anl.gov> Message-ID: On Wed, Sep 1, 2021 at 2:52 PM Satish Balay wrote: > Well the build process used here is: > > >> (1) defined -DPETSC_HAVE_MUMPS, > >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c > > > i.e configure is skipped [for mumps part] so PETSC_PKG_MUMPS_VERSION_GE > etc are missing [hence this error] > > Then, a hack for use of MUMPS 5.2.1 is at the beginning of mumps.c, add two lines #define PETSC_PKG_MUMPS_VERSION_GE(x,y,z) 0 #define PETSC_PKG_MUMPS_VERSION_LT(x,y,z) 1 > Satish > > On Wed, 1 Sep 2021, Junchao Zhang wrote: > > > On Wed, Sep 1, 2021 at 2:20 PM Sam Guo wrote: > > > > > If we go back to the original compiling error, > > > "petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > > > operator before token "(" > > > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0)" > > > I don't understand what PETSC_PKG_MUMPS_VERSION_GE(5,3,0) is doing. > > > > > When petsc is configured with mumps, you will find the macro > > PETSC_PKG_MUMPS_VERSION_GE(MAJOR,MINOR,SUBMINOR) in > > $PETSC_ARCH/include/petscpkg_version.h > > Sam, you can manually compile the failed file, mumps.c, with > preprocessing, > > to see what is wrong in the expansion of the macro. > > > > > > > > > > On Wed, Sep 1, 2021 at 12:12 PM Sam Guo wrote: > > > > > >> I believe I am using MUMPS since I have done following > > >> (1) defined -DPETSC_HAVE_MUMPS, > > >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c > > >> (3) link my pre-compiled MUMPS, and > > >> (4) specifies following PETSc options > > >> checkError(EPSGetST(eps, &st)); > > >> checkError(STSetType(st, STSINVERT)); > > >> //if(useShellMatrix) checkError(STSetMatMode(st, > > >> ST_MATMODE_SHELL)); > > >> checkError(STGetKSP(st, &ksp)); > > >> checkError(KSPSetOperators(ksp, A, A)); > > >> checkError(KSPSetType(ksp, KSPPREONLY)); > > >> checkError(KSPGetPC(ksp, &pc)); > > >> checkError(MatSetOption(A, MAT_SPD, PETSC_TRUE)); > > >> checkError(PCSetType(pc, PCCHOLESKY)); > > >> checkError(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS)); > > >> checkError(PCFactorSetUpMatSolverType(pc)); > > >> checkError(PetscOptionsSetValue(NULL, > "-mat_mumps_icntl_13","1")); > > >> > > >> Another evidence I am using MUMPS is that If I skip (1)-(3) above, I > got > > >> the PETSc error saying that MUMPS is required. > > >> > > >> On Wed, Sep 1, 2021 at 12:00 PM Satish Balay > wrote: > > >> > > >>> mumps is a fortran package - so best to specify fc. Any specific > reason > > >>> for needing to force '--with-fc=0'? > > >>> > > >>> The attached configure.log is not using mumps. > > >>> > > >>> Satish > > >>> > > >>> On Wed, 1 Sep 2021, Sam Guo wrote: > > >>> > > >>> > fc should not be required since I link PETSc with pre-compiled > MUMPS. 
> > >>> In > > >>> > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial > should > > >>> not > > >>> > be required since my own CMake defines -DPETSC_HAVE_MUMPS and > links my > > >>> > pre-compiled MUMPS. > > >>> > > > >>> > I am able to make it work using PETSc 3.11.3. Attached please find > the > > >>> > cPETSc 3.11.3 onfigure.log PETSc. > > >>> > > > >>> > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay > > >>> wrote: > > >>> > > > >>> > > > > >>> > > > > >>> > ******************************************************************************* > > >>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see > > >>> configure.log for > > >>> > > details): > > >>> > > > > >>> > > > > >>> > ------------------------------------------------------------------------------- > > >>> > > Package mumps requested requires Fortran but compiler turned off. > > >>> > > > > >>> > > > > >>> > ******************************************************************************* > > >>> > > > > >>> > > i.e remove '--with-fc=0' and rerun configure. > > >>> > > > > >>> > > Satish > > >>> > > > > >>> > > On Tue, 31 Aug 2021, Sam Guo wrote: > > >>> > > > > >>> > > > Attached please find the latest configure.log. > > >>> > > > > > >>> > > > grep MUMPS_VERSION > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > > >>> > > > MUMPS_VERSION > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > > >>> > > > MUMPS_VERSION "5.2.1" > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > > >>> > > > MUMPS_VERSION_MAX_LEN > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > > >>> > > > MUMPS_VERSION_MAX_LEN 30 > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: > > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > > >>> > > > MUMPS_VERSION > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > > >>> > > > MUMPS_VERSION "5.2.1" > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > > >>> > > > MUMPS_VERSION_MAX_LEN > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > > >>> > > > MUMPS_VERSION_MAX_LEN 30 > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: > > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > > >>> > > > MUMPS_VERSION > > >>> > > > > > >>> > > > > >>> > 
/u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > > >>> > > > MUMPS_VERSION "5.2.1" > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > > >>> > > > MUMPS_VERSION_MAX_LEN > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > > >>> > > > MUMPS_VERSION_MAX_LEN 30 > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: > > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > > >>> > > > MUMPS_VERSION > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > > >>> > > > MUMPS_VERSION "5.2.1" > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > > >>> > > > MUMPS_VERSION_MAX_LEN > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > > >>> > > > MUMPS_VERSION_MAX_LEN 30 > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: > > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > >>> > > > > > >>> > > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay < > balay at mcs.anl.gov> > > >>> wrote: > > >>> > > > > > >>> > > > > Also - what do you have for: > > >>> > > > > > > >>> > > > > grep MUMPS_VERSION > > >>> > > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > > >>> > > > > > > >>> > > > > Satish > > >>> > > > > > > >>> > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: > > >>> > > > > > > >>> > > > > > please resend the logs > > >>> > > > > > > > >>> > > > > > Satish > > >>> > > > > > > > >>> > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > >>> > > > > > > > >>> > > > > > > Same compiling error with --with-mumps-serial=1. > > >>> > > > > > > > > >>> > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay < > > >>> balay at mcs.anl.gov> > > >>> > > > > wrote: > > >>> > > > > > > > > >>> > > > > > > > Use the additional option: -with-mumps-serial > > >>> > > > > > > > > > >>> > > > > > > > Satish > > >>> > > > > > > > > > >>> > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > >>> > > > > > > > > > >>> > > > > > > > > Attached please find the configure.log. I use my own > > >>> CMake. I > > >>> > > have > > >>> > > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. > > >>> > > > > > > > > > > >>> > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo < > > >>> sam.guo at cd-adapco.com > > >>> > > > > > >>> > > > > wrote: > > >>> > > > > > > > > > > >>> > > > > > > > > > I use pre-installed > > >>> > > > > > > > > > > > >>> > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < > > >>> > > balay at mcs.anl.gov> > > >>> > > > > > > > wrote: > > >>> > > > > > > > > > > > >>> > > > > > > > > >> > > >>> > > > > > > > > >> Are you using --download-mumps or pre-installed > > >>> mumps? 
If > > >>> > > using > > >>> > > > > > > > > >> pre-installed - try --download-mumps. > > >>> > > > > > > > > >> > > >>> > > > > > > > > >> If you still have issues - send us configure.log > and > > >>> > > make.log > > >>> > > > > from the > > >>> > > > > > > > > >> failed build. > > >>> > > > > > > > > >> > > >>> > > > > > > > > >> Satish > > >>> > > > > > > > > >> > > >>> > > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > > >>> > > > > > > > > >> > > >>> > > > > > > > > >> > Dear PETSc dev team, > > >>> > > > > > > > > >> > I am compiling petsc 3.15.3 and got following > > >>> compiling > > >>> > > > > error > > >>> > > > > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: > > >>> error: > > >>> > > > > missing > > >>> > > > > > > > binary > > >>> > > > > > > > > >> > operator before token "(" > > >>> > > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > >>> > > > > > > > > >> > Any idea what I did wrong? > > >>> > > > > > > > > >> > > > >>> > > > > > > > > >> > Thanks, > > >>> > > > > > > > > >> > Sam > > >>> > > > > > > > > >> > > > >>> > > > > > > > > >> > > >>> > > > > > > > > >> > > >>> > > > > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > > >>> > > > >>> > > >>> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Wed Sep 1 15:02:32 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Wed, 1 Sep 2021 13:02:32 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: <408d7d73-4da-8d73-97c-15e91855922@mcs.anl.gov> References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> <408d7d73-4da-8d73-97c-15e91855922@mcs.anl.gov> Message-ID: Hi Matt, I tried --with-mumps-dir but same error. Hi Junchao, That's a very good clue and suggestion. I looked petscpkg_version.h. It is empty as follows. I'll follow your suggestion and define those macros in mumps.c. #if !defined(INCLUDED_PETSCPKG_VERSION_H) #define INCLUDED_PETSCPKG_VERSION_H #endif Hi Satish, Yes, what I am doing is hacking but it is necessary since have own own mpi wrapper. Thank you all, Sam On Wed, Sep 1, 2021 at 12:52 PM Satish Balay wrote: > Well the build process used here is: > > >> (1) defined -DPETSC_HAVE_MUMPS, > >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c > > > i.e configure is skipped [for mumps part] so PETSC_PKG_MUMPS_VERSION_GE > etc are missing [hence this error] > > Satish > > On Wed, 1 Sep 2021, Junchao Zhang wrote: > > > On Wed, Sep 1, 2021 at 2:20 PM Sam Guo wrote: > > > > > If we go back to the original compiling error, > > > "petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > > > operator before token "(" > > > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0)" > > > I don't understand what PETSC_PKG_MUMPS_VERSION_GE(5,3,0) is doing. > > > > > When petsc is configured with mumps, you will find the macro > > PETSC_PKG_MUMPS_VERSION_GE(MAJOR,MINOR,SUBMINOR) in > > $PETSC_ARCH/include/petscpkg_version.h > > Sam, you can manually compile the failed file, mumps.c, with > preprocessing, > > to see what is wrong in the expansion of the macro. 
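
To make the two-line workaround mentioned above concrete, here is a minimal sketch of how it could sit near the top of src/mat/impls/aij/mpi/mumps/mumps.c, assuming the library actually being linked is MUMPS 5.2.1 and PETSc configure is being bypassed (the #ifndef wrapper is an extra precaution added here, not part of the original two-line suggestion):

  /* Stand-ins for the configure-generated petscpkg_version.h entries.
     GE -> 0 and LT -> 1 force every MUMPS version guard in this file onto its
     old-version branch, which matches a linked MUMPS 5.2.1. */
  #ifndef PETSC_PKG_MUMPS_VERSION_GE
  #define PETSC_PKG_MUMPS_VERSION_GE(MAJOR,MINOR,SUBMINOR) 0
  #define PETSC_PKG_MUMPS_VERSION_LT(MAJOR,MINOR,SUBMINOR) 1
  #endif

Note that this answers "no" to every GE test, not only GE(5,3,0), so it is only safe while the guards that matter in mumps.c are the 5.3-and-newer ones; running configure so the real header gets generated remains the clean fix.
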
> > > > > > > > > > On Wed, Sep 1, 2021 at 12:12 PM Sam Guo wrote: > > > > > >> I believe I am using MUMPS since I have done following > > >> (1) defined -DPETSC_HAVE_MUMPS, > > >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c > > >> (3) link my pre-compiled MUMPS, and > > >> (4) specifies following PETSc options > > >> checkError(EPSGetST(eps, &st)); > > >> checkError(STSetType(st, STSINVERT)); > > >> //if(useShellMatrix) checkError(STSetMatMode(st, > > >> ST_MATMODE_SHELL)); > > >> checkError(STGetKSP(st, &ksp)); > > >> checkError(KSPSetOperators(ksp, A, A)); > > >> checkError(KSPSetType(ksp, KSPPREONLY)); > > >> checkError(KSPGetPC(ksp, &pc)); > > >> checkError(MatSetOption(A, MAT_SPD, PETSC_TRUE)); > > >> checkError(PCSetType(pc, PCCHOLESKY)); > > >> checkError(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS)); > > >> checkError(PCFactorSetUpMatSolverType(pc)); > > >> checkError(PetscOptionsSetValue(NULL, > "-mat_mumps_icntl_13","1")); > > >> > > >> Another evidence I am using MUMPS is that If I skip (1)-(3) above, I > got > > >> the PETSc error saying that MUMPS is required. > > >> > > >> On Wed, Sep 1, 2021 at 12:00 PM Satish Balay > wrote: > > >> > > >>> mumps is a fortran package - so best to specify fc. Any specific > reason > > >>> for needing to force '--with-fc=0'? > > >>> > > >>> The attached configure.log is not using mumps. > > >>> > > >>> Satish > > >>> > > >>> On Wed, 1 Sep 2021, Sam Guo wrote: > > >>> > > >>> > fc should not be required since I link PETSc with pre-compiled > MUMPS. > > >>> In > > >>> > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial > should > > >>> not > > >>> > be required since my own CMake defines -DPETSC_HAVE_MUMPS and > links my > > >>> > pre-compiled MUMPS. > > >>> > > > >>> > I am able to make it work using PETSc 3.11.3. Attached please find > the > > >>> > cPETSc 3.11.3 onfigure.log PETSc. > > >>> > > > >>> > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay > > >>> wrote: > > >>> > > > >>> > > > > >>> > > > > >>> > ******************************************************************************* > > >>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see > > >>> configure.log for > > >>> > > details): > > >>> > > > > >>> > > > > >>> > ------------------------------------------------------------------------------- > > >>> > > Package mumps requested requires Fortran but compiler turned off. > > >>> > > > > >>> > > > > >>> > ******************************************************************************* > > >>> > > > > >>> > > i.e remove '--with-fc=0' and rerun configure. > > >>> > > > > >>> > > Satish > > >>> > > > > >>> > > On Tue, 31 Aug 2021, Sam Guo wrote: > > >>> > > > > >>> > > > Attached please find the latest configure.log. 
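
For context on the solver configuration quoted above: checkError(...) appears to be the poster's own error-checking wrapper, not a PETSc or SLEPc routine. A condensed, self-contained sketch of the same setup in plain PETSc 3.15-era error-handling style follows; the function name and its arguments are chosen here purely for illustration:

  #include <slepceps.h>

  /* Configure eps (a SLEPc eigensolver) to use shift-and-invert, with the inner
     linear solves done as a single MUMPS Cholesky factorization of A. */
  static PetscErrorCode SetupShiftInvertWithMUMPS(EPS eps, Mat A)
  {
    ST             st;
    KSP            ksp;
    PC             pc;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = EPSGetST(eps,&st);CHKERRQ(ierr);
    ierr = STSetType(st,STSINVERT);CHKERRQ(ierr);             /* shift-and-invert spectral transform */
    ierr = STGetKSP(st,&ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
    ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);          /* apply the preconditioner only: a direct solve */
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = MatSetOption(A,MAT_SPD,PETSC_TRUE);CHKERRQ(ierr);  /* declare A symmetric positive definite */
    ierr = PCSetType(pc,PCCHOLESKY);CHKERRQ(ierr);            /* Cholesky factorization ... */
    ierr = PCFactorSetMatSolverType(pc,MATSOLVERMUMPS);CHKERRQ(ierr); /* ... delegated to MUMPS */
    ierr = PCFactorSetUpMatSolverType(pc);CHKERRQ(ierr);
    ierr = PetscOptionsSetValue(NULL,"-mat_mumps_icntl_13","1");CHKERRQ(ierr); /* pass MUMPS control option ICNTL(13) */
    PetscFunctionReturn(0);
  }

This is also why MUMPS must be present at link time even though the application never calls MUMPS directly: PCCHOLESKY with MATSOLVERMUMPS hands the factorization to the external library.
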
> > >>> > > > > > >>> > > > grep MUMPS_VERSION > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > > >>> > > > MUMPS_VERSION > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > > >>> > > > MUMPS_VERSION "5.2.1" > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > > >>> > > > MUMPS_VERSION_MAX_LEN > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > > >>> > > > MUMPS_VERSION_MAX_LEN 30 > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: > > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > > >>> > > > MUMPS_VERSION > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > > >>> > > > MUMPS_VERSION "5.2.1" > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > > >>> > > > MUMPS_VERSION_MAX_LEN > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > > >>> > > > MUMPS_VERSION_MAX_LEN 30 > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: > > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > > >>> > > > MUMPS_VERSION > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > > >>> > > > MUMPS_VERSION "5.2.1" > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > > >>> > > > MUMPS_VERSION_MAX_LEN > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > > >>> > > > MUMPS_VERSION_MAX_LEN 30 > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: > > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > > >>> > > > MUMPS_VERSION > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > > >>> > > > MUMPS_VERSION "5.2.1" > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > > >>> > > > MUMPS_VERSION_MAX_LEN > > >>> > > > > > >>> > > > > >>> > 
/u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > > >>> > > > MUMPS_VERSION_MAX_LEN 30 > > >>> > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: > > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > >>> > > > > > >>> > > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay < > balay at mcs.anl.gov> > > >>> wrote: > > >>> > > > > > >>> > > > > Also - what do you have for: > > >>> > > > > > > >>> > > > > grep MUMPS_VERSION > > >>> > > > > > > >>> > > > > >>> > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > > >>> > > > > > > >>> > > > > Satish > > >>> > > > > > > >>> > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: > > >>> > > > > > > >>> > > > > > please resend the logs > > >>> > > > > > > > >>> > > > > > Satish > > >>> > > > > > > > >>> > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > >>> > > > > > > > >>> > > > > > > Same compiling error with --with-mumps-serial=1. > > >>> > > > > > > > > >>> > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay < > > >>> balay at mcs.anl.gov> > > >>> > > > > wrote: > > >>> > > > > > > > > >>> > > > > > > > Use the additional option: -with-mumps-serial > > >>> > > > > > > > > > >>> > > > > > > > Satish > > >>> > > > > > > > > > >>> > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > >>> > > > > > > > > > >>> > > > > > > > > Attached please find the configure.log. I use my own > > >>> CMake. I > > >>> > > have > > >>> > > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. > > >>> > > > > > > > > > > >>> > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo < > > >>> sam.guo at cd-adapco.com > > >>> > > > > > >>> > > > > wrote: > > >>> > > > > > > > > > > >>> > > > > > > > > > I use pre-installed > > >>> > > > > > > > > > > > >>> > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < > > >>> > > balay at mcs.anl.gov> > > >>> > > > > > > > wrote: > > >>> > > > > > > > > > > > >>> > > > > > > > > >> > > >>> > > > > > > > > >> Are you using --download-mumps or pre-installed > > >>> mumps? If > > >>> > > using > > >>> > > > > > > > > >> pre-installed - try --download-mumps. > > >>> > > > > > > > > >> > > >>> > > > > > > > > >> If you still have issues - send us configure.log > and > > >>> > > make.log > > >>> > > > > from the > > >>> > > > > > > > > >> failed build. > > >>> > > > > > > > > >> > > >>> > > > > > > > > >> Satish > > >>> > > > > > > > > >> > > >>> > > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > > >>> > > > > > > > > >> > > >>> > > > > > > > > >> > Dear PETSc dev team, > > >>> > > > > > > > > >> > I am compiling petsc 3.15.3 and got following > > >>> compiling > > >>> > > > > error > > >>> > > > > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: > > >>> error: > > >>> > > > > missing > > >>> > > > > > > > binary > > >>> > > > > > > > > >> > operator before token "(" > > >>> > > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > >>> > > > > > > > > >> > Any idea what I did wrong? 
> > >>> > > > > > > > > >> > > > >>> > > > > > > > > >> > Thanks, > > >>> > > > > > > > > >> > Sam > > >>> > > > > > > > > >> > > > >>> > > > > > > > > >> > > >>> > > > > > > > > >> > > >>> > > > > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > > >>> > > > >>> > > >>> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Wed Sep 1 15:06:07 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Wed, 1 Sep 2021 13:06:07 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> <408d7d73-4da-8d73-97c-15e91855922@mcs.anl.gov> Message-ID: Hi Junchao, Your suggestion works. Thanks a lot. BR, Sam On Wed, Sep 1, 2021 at 1:02 PM Sam Guo wrote: > Hi Matt, > I tried --with-mumps-dir but same error. > > Hi Junchao, > That's a very good clue and suggestion. I looked petscpkg_version.h. It > is empty as follows. I'll follow your suggestion and define those macros in > mumps.c. > > #if !defined(INCLUDED_PETSCPKG_VERSION_H) > #define INCLUDED_PETSCPKG_VERSION_H > > #endif > > Hi Satish, > Yes, what I am doing is hacking but it is necessary since have own own > mpi wrapper. > > Thank you all, > Sam > > On Wed, Sep 1, 2021 at 12:52 PM Satish Balay wrote: > >> Well the build process used here is: >> >> >> (1) defined -DPETSC_HAVE_MUMPS, >> >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c >> >> >> i.e configure is skipped [for mumps part] so PETSC_PKG_MUMPS_VERSION_GE >> etc are missing [hence this error] >> >> Satish >> >> On Wed, 1 Sep 2021, Junchao Zhang wrote: >> >> > On Wed, Sep 1, 2021 at 2:20 PM Sam Guo wrote: >> > >> > > If we go back to the original compiling error, >> > > "petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing >> binary >> > > operator before token "(" >> > > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0)" >> > > I don't understand what PETSC_PKG_MUMPS_VERSION_GE(5,3,0) is doing. >> > > >> > When petsc is configured with mumps, you will find the macro >> > PETSC_PKG_MUMPS_VERSION_GE(MAJOR,MINOR,SUBMINOR) in >> > $PETSC_ARCH/include/petscpkg_version.h >> > Sam, you can manually compile the failed file, mumps.c, with >> preprocessing, >> > to see what is wrong in the expansion of the macro. 
>> > >> > >> > > >> > > On Wed, Sep 1, 2021 at 12:12 PM Sam Guo >> wrote: >> > > >> > >> I believe I am using MUMPS since I have done following >> > >> (1) defined -DPETSC_HAVE_MUMPS, >> > >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c >> > >> (3) link my pre-compiled MUMPS, and >> > >> (4) specifies following PETSc options >> > >> checkError(EPSGetST(eps, &st)); >> > >> checkError(STSetType(st, STSINVERT)); >> > >> //if(useShellMatrix) checkError(STSetMatMode(st, >> > >> ST_MATMODE_SHELL)); >> > >> checkError(STGetKSP(st, &ksp)); >> > >> checkError(KSPSetOperators(ksp, A, A)); >> > >> checkError(KSPSetType(ksp, KSPPREONLY)); >> > >> checkError(KSPGetPC(ksp, &pc)); >> > >> checkError(MatSetOption(A, MAT_SPD, PETSC_TRUE)); >> > >> checkError(PCSetType(pc, PCCHOLESKY)); >> > >> checkError(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS)); >> > >> checkError(PCFactorSetUpMatSolverType(pc)); >> > >> checkError(PetscOptionsSetValue(NULL, >> "-mat_mumps_icntl_13","1")); >> > >> >> > >> Another evidence I am using MUMPS is that If I skip (1)-(3) above, I >> got >> > >> the PETSc error saying that MUMPS is required. >> > >> >> > >> On Wed, Sep 1, 2021 at 12:00 PM Satish Balay >> wrote: >> > >> >> > >>> mumps is a fortran package - so best to specify fc. Any specific >> reason >> > >>> for needing to force '--with-fc=0'? >> > >>> >> > >>> The attached configure.log is not using mumps. >> > >>> >> > >>> Satish >> > >>> >> > >>> On Wed, 1 Sep 2021, Sam Guo wrote: >> > >>> >> > >>> > fc should not be required since I link PETSc with pre-compiled >> MUMPS. >> > >>> In >> > >>> > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial >> should >> > >>> not >> > >>> > be required since my own CMake defines -DPETSC_HAVE_MUMPS and >> links my >> > >>> > pre-compiled MUMPS. >> > >>> > >> > >>> > I am able to make it work using PETSc 3.11.3. Attached please >> find the >> > >>> > cPETSc 3.11.3 onfigure.log PETSc. >> > >>> > >> > >>> > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay >> > >>> wrote: >> > >>> > >> > >>> > > >> > >>> > > >> > >>> >> ******************************************************************************* >> > >>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see >> > >>> configure.log for >> > >>> > > details): >> > >>> > > >> > >>> > > >> > >>> >> ------------------------------------------------------------------------------- >> > >>> > > Package mumps requested requires Fortran but compiler turned >> off. >> > >>> > > >> > >>> > > >> > >>> >> ******************************************************************************* >> > >>> > > >> > >>> > > i.e remove '--with-fc=0' and rerun configure. >> > >>> > > >> > >>> > > Satish >> > >>> > > >> > >>> > > On Tue, 31 Aug 2021, Sam Guo wrote: >> > >>> > > >> > >>> > > > Attached please find the latest configure.log. 
>> > >>> > > > >> > >>> > > > grep MUMPS_VERSION >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef >> > >>> > > > MUMPS_VERSION >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define >> > >>> > > > MUMPS_VERSION "5.2.1" >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef >> > >>> > > > MUMPS_VERSION_MAX_LEN >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define >> > >>> > > > MUMPS_VERSION_MAX_LEN 30 >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: >> > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef >> > >>> > > > MUMPS_VERSION >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define >> > >>> > > > MUMPS_VERSION "5.2.1" >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef >> > >>> > > > MUMPS_VERSION_MAX_LEN >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define >> > >>> > > > MUMPS_VERSION_MAX_LEN 30 >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: >> > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef >> > >>> > > > MUMPS_VERSION >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define >> > >>> > > > MUMPS_VERSION "5.2.1" >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef >> > >>> > > > MUMPS_VERSION_MAX_LEN >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define >> > >>> > > > MUMPS_VERSION_MAX_LEN 30 >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: >> > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef >> > >>> > > > MUMPS_VERSION >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define >> > >>> > > > MUMPS_VERSION "5.2.1" >> > >>> > > > >> > >>> > > >> > >>> >> 
/u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef >> > >>> > > > MUMPS_VERSION_MAX_LEN >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define >> > >>> > > > MUMPS_VERSION_MAX_LEN 30 >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: >> > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >> > >>> > > > >> > >>> > > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay < >> balay at mcs.anl.gov> >> > >>> wrote: >> > >>> > > > >> > >>> > > > > Also - what do you have for: >> > >>> > > > > >> > >>> > > > > grep MUMPS_VERSION >> > >>> > > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h >> > >>> > > > > >> > >>> > > > > Satish >> > >>> > > > > >> > >>> > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: >> > >>> > > > > >> > >>> > > > > > please resend the logs >> > >>> > > > > > >> > >>> > > > > > Satish >> > >>> > > > > > >> > >>> > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: >> > >>> > > > > > >> > >>> > > > > > > Same compiling error with --with-mumps-serial=1. >> > >>> > > > > > > >> > >>> > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay < >> > >>> balay at mcs.anl.gov> >> > >>> > > > > wrote: >> > >>> > > > > > > >> > >>> > > > > > > > Use the additional option: -with-mumps-serial >> > >>> > > > > > > > >> > >>> > > > > > > > Satish >> > >>> > > > > > > > >> > >>> > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: >> > >>> > > > > > > > >> > >>> > > > > > > > > Attached please find the configure.log. I use my own >> > >>> CMake. I >> > >>> > > have >> > >>> > > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. >> > >>> > > > > > > > > >> > >>> > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo < >> > >>> sam.guo at cd-adapco.com >> > >>> > > > >> > >>> > > > > wrote: >> > >>> > > > > > > > > >> > >>> > > > > > > > > > I use pre-installed >> > >>> > > > > > > > > > >> > >>> > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < >> > >>> > > balay at mcs.anl.gov> >> > >>> > > > > > > > wrote: >> > >>> > > > > > > > > > >> > >>> > > > > > > > > >> >> > >>> > > > > > > > > >> Are you using --download-mumps or pre-installed >> > >>> mumps? If >> > >>> > > using >> > >>> > > > > > > > > >> pre-installed - try --download-mumps. >> > >>> > > > > > > > > >> >> > >>> > > > > > > > > >> If you still have issues - send us configure.log >> and >> > >>> > > make.log >> > >>> > > > > from the >> > >>> > > > > > > > > >> failed build. >> > >>> > > > > > > > > >> >> > >>> > > > > > > > > >> Satish >> > >>> > > > > > > > > >> >> > >>> > > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: >> > >>> > > > > > > > > >> >> > >>> > > > > > > > > >> > Dear PETSc dev team, >> > >>> > > > > > > > > >> > I am compiling petsc 3.15.3 and got >> following >> > >>> compiling >> > >>> > > > > error >> > >>> > > > > > > > > >> > >> petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: >> > >>> error: >> > >>> > > > > missing >> > >>> > > > > > > > binary >> > >>> > > > > > > > > >> > operator before token "(" >> > >>> > > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) >> > >>> > > > > > > > > >> > Any idea what I did wrong? 
>> > >>> > > > > > > > > >> > >> > >>> > > > > > > > > >> > Thanks, >> > >>> > > > > > > > > >> > Sam >> > >>> > > > > > > > > >> > >> > >>> > > > > > > > > >> >> > >>> > > > > > > > > >> >> > >>> > > > > > > > > >> > >>> > > > > > > > >> > >>> > > > > > > > >> > >>> > > > > > > >> > >>> > > > > > >> > >>> > > > > >> > >>> > > > > >> > >>> > > > >> > >>> > > >> > >>> > > >> > >>> > >> > >>> >> > >>> >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Sep 1 15:58:17 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 1 Sep 2021 16:58:17 -0400 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> <408d7d73-4da-8d73-97c-15e91855922@mcs.anl.gov> Message-ID: On Wed, Sep 1, 2021 at 4:03 PM Sam Guo wrote: > Hi Matt, > I tried --with-mumps-dir but same error. > How can you build MUMPS without a Fortran compiler? And if you have one, why are you not telling PETSc about it? Thanks, Matt > Hi Junchao, > That's a very good clue and suggestion. I looked petscpkg_version.h. It > is empty as follows. I'll follow your suggestion and define those macros in > mumps.c. > > #if !defined(INCLUDED_PETSCPKG_VERSION_H) > #define INCLUDED_PETSCPKG_VERSION_H > > #endif > > Hi Satish, > Yes, what I am doing is hacking but it is necessary since have own own > mpi wrapper. > > Thank you all, > Sam > > On Wed, Sep 1, 2021 at 12:52 PM Satish Balay wrote: > >> Well the build process used here is: >> >> >> (1) defined -DPETSC_HAVE_MUMPS, >> >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c >> >> >> i.e configure is skipped [for mumps part] so PETSC_PKG_MUMPS_VERSION_GE >> etc are missing [hence this error] >> >> Satish >> >> On Wed, 1 Sep 2021, Junchao Zhang wrote: >> >> > On Wed, Sep 1, 2021 at 2:20 PM Sam Guo wrote: >> > >> > > If we go back to the original compiling error, >> > > "petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing >> binary >> > > operator before token "(" >> > > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0)" >> > > I don't understand what PETSC_PKG_MUMPS_VERSION_GE(5,3,0) is doing. >> > > >> > When petsc is configured with mumps, you will find the macro >> > PETSC_PKG_MUMPS_VERSION_GE(MAJOR,MINOR,SUBMINOR) in >> > $PETSC_ARCH/include/petscpkg_version.h >> > Sam, you can manually compile the failed file, mumps.c, with >> preprocessing, >> > to see what is wrong in the expansion of the macro. 
>> > >> > >> > > >> > > On Wed, Sep 1, 2021 at 12:12 PM Sam Guo >> wrote: >> > > >> > >> I believe I am using MUMPS since I have done following >> > >> (1) defined -DPETSC_HAVE_MUMPS, >> > >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c >> > >> (3) link my pre-compiled MUMPS, and >> > >> (4) specifies following PETSc options >> > >> checkError(EPSGetST(eps, &st)); >> > >> checkError(STSetType(st, STSINVERT)); >> > >> //if(useShellMatrix) checkError(STSetMatMode(st, >> > >> ST_MATMODE_SHELL)); >> > >> checkError(STGetKSP(st, &ksp)); >> > >> checkError(KSPSetOperators(ksp, A, A)); >> > >> checkError(KSPSetType(ksp, KSPPREONLY)); >> > >> checkError(KSPGetPC(ksp, &pc)); >> > >> checkError(MatSetOption(A, MAT_SPD, PETSC_TRUE)); >> > >> checkError(PCSetType(pc, PCCHOLESKY)); >> > >> checkError(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS)); >> > >> checkError(PCFactorSetUpMatSolverType(pc)); >> > >> checkError(PetscOptionsSetValue(NULL, >> "-mat_mumps_icntl_13","1")); >> > >> >> > >> Another evidence I am using MUMPS is that If I skip (1)-(3) above, I >> got >> > >> the PETSc error saying that MUMPS is required. >> > >> >> > >> On Wed, Sep 1, 2021 at 12:00 PM Satish Balay >> wrote: >> > >> >> > >>> mumps is a fortran package - so best to specify fc. Any specific >> reason >> > >>> for needing to force '--with-fc=0'? >> > >>> >> > >>> The attached configure.log is not using mumps. >> > >>> >> > >>> Satish >> > >>> >> > >>> On Wed, 1 Sep 2021, Sam Guo wrote: >> > >>> >> > >>> > fc should not be required since I link PETSc with pre-compiled >> MUMPS. >> > >>> In >> > >>> > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial >> should >> > >>> not >> > >>> > be required since my own CMake defines -DPETSC_HAVE_MUMPS and >> links my >> > >>> > pre-compiled MUMPS. >> > >>> > >> > >>> > I am able to make it work using PETSc 3.11.3. Attached please >> find the >> > >>> > cPETSc 3.11.3 onfigure.log PETSc. >> > >>> > >> > >>> > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay >> > >>> wrote: >> > >>> > >> > >>> > > >> > >>> > > >> > >>> >> ******************************************************************************* >> > >>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see >> > >>> configure.log for >> > >>> > > details): >> > >>> > > >> > >>> > > >> > >>> >> ------------------------------------------------------------------------------- >> > >>> > > Package mumps requested requires Fortran but compiler turned >> off. >> > >>> > > >> > >>> > > >> > >>> >> ******************************************************************************* >> > >>> > > >> > >>> > > i.e remove '--with-fc=0' and rerun configure. >> > >>> > > >> > >>> > > Satish >> > >>> > > >> > >>> > > On Tue, 31 Aug 2021, Sam Guo wrote: >> > >>> > > >> > >>> > > > Attached please find the latest configure.log. 
>> > >>> > > > >> > >>> > > > grep MUMPS_VERSION >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef >> > >>> > > > MUMPS_VERSION >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define >> > >>> > > > MUMPS_VERSION "5.2.1" >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef >> > >>> > > > MUMPS_VERSION_MAX_LEN >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define >> > >>> > > > MUMPS_VERSION_MAX_LEN 30 >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: >> > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef >> > >>> > > > MUMPS_VERSION >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define >> > >>> > > > MUMPS_VERSION "5.2.1" >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef >> > >>> > > > MUMPS_VERSION_MAX_LEN >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define >> > >>> > > > MUMPS_VERSION_MAX_LEN 30 >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: >> > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef >> > >>> > > > MUMPS_VERSION >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define >> > >>> > > > MUMPS_VERSION "5.2.1" >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef >> > >>> > > > MUMPS_VERSION_MAX_LEN >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define >> > >>> > > > MUMPS_VERSION_MAX_LEN 30 >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: >> > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef >> > >>> > > > MUMPS_VERSION >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define >> > >>> > > > MUMPS_VERSION "5.2.1" >> > >>> > > > >> > >>> > > >> > >>> >> 
/u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef >> > >>> > > > MUMPS_VERSION_MAX_LEN >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define >> > >>> > > > MUMPS_VERSION_MAX_LEN 30 >> > >>> > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: >> > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >> > >>> > > > >> > >>> > > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay < >> balay at mcs.anl.gov> >> > >>> wrote: >> > >>> > > > >> > >>> > > > > Also - what do you have for: >> > >>> > > > > >> > >>> > > > > grep MUMPS_VERSION >> > >>> > > > > >> > >>> > > >> > >>> >> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h >> > >>> > > > > >> > >>> > > > > Satish >> > >>> > > > > >> > >>> > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: >> > >>> > > > > >> > >>> > > > > > please resend the logs >> > >>> > > > > > >> > >>> > > > > > Satish >> > >>> > > > > > >> > >>> > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: >> > >>> > > > > > >> > >>> > > > > > > Same compiling error with --with-mumps-serial=1. >> > >>> > > > > > > >> > >>> > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay < >> > >>> balay at mcs.anl.gov> >> > >>> > > > > wrote: >> > >>> > > > > > > >> > >>> > > > > > > > Use the additional option: -with-mumps-serial >> > >>> > > > > > > > >> > >>> > > > > > > > Satish >> > >>> > > > > > > > >> > >>> > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: >> > >>> > > > > > > > >> > >>> > > > > > > > > Attached please find the configure.log. I use my own >> > >>> CMake. I >> > >>> > > have >> > >>> > > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. >> > >>> > > > > > > > > >> > >>> > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo < >> > >>> sam.guo at cd-adapco.com >> > >>> > > > >> > >>> > > > > wrote: >> > >>> > > > > > > > > >> > >>> > > > > > > > > > I use pre-installed >> > >>> > > > > > > > > > >> > >>> > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < >> > >>> > > balay at mcs.anl.gov> >> > >>> > > > > > > > wrote: >> > >>> > > > > > > > > > >> > >>> > > > > > > > > >> >> > >>> > > > > > > > > >> Are you using --download-mumps or pre-installed >> > >>> mumps? If >> > >>> > > using >> > >>> > > > > > > > > >> pre-installed - try --download-mumps. >> > >>> > > > > > > > > >> >> > >>> > > > > > > > > >> If you still have issues - send us configure.log >> and >> > >>> > > make.log >> > >>> > > > > from the >> > >>> > > > > > > > > >> failed build. >> > >>> > > > > > > > > >> >> > >>> > > > > > > > > >> Satish >> > >>> > > > > > > > > >> >> > >>> > > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: >> > >>> > > > > > > > > >> >> > >>> > > > > > > > > >> > Dear PETSc dev team, >> > >>> > > > > > > > > >> > I am compiling petsc 3.15.3 and got >> following >> > >>> compiling >> > >>> > > > > error >> > >>> > > > > > > > > >> > >> petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: >> > >>> error: >> > >>> > > > > missing >> > >>> > > > > > > > binary >> > >>> > > > > > > > > >> > operator before token "(" >> > >>> > > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) >> > >>> > > > > > > > > >> > Any idea what I did wrong? 
>> > >>> > > > > > > > > >> > >> > >>> > > > > > > > > >> > Thanks, >> > >>> > > > > > > > > >> > Sam >> > >>> > > > > > > > > >> > >> > >>> > > > > > > > > >> >> > >>> > > > > > > > > >> >> > >>> > > > > > > > > >> > >>> > > > > > > > >> > >>> > > > > > > > >> > >>> > > > > > > >> > >>> > > > > > >> > >>> > > > > >> > >>> > > > > >> > >>> > > > >> > >>> > > >> > >>> > > >> > >>> > >> > >>> >> > >>> >> > >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Wed Sep 1 16:19:29 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Wed, 1 Sep 2021 14:19:29 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> <408d7d73-4da-8d73-97c-15e91855922@mcs.anl.gov> Message-ID: I build MUMPS at the designated machine but my local machine does not have fortran compiler. On Wed, Sep 1, 2021 at 1:58 PM Matthew Knepley wrote: > On Wed, Sep 1, 2021 at 4:03 PM Sam Guo wrote: > >> Hi Matt, >> I tried --with-mumps-dir but same error. >> > > How can you build MUMPS without a Fortran compiler? And if you have one, > why are you not telling PETSc about it? > > Thanks, > > Matt > > >> Hi Junchao, >> That's a very good clue and suggestion. I looked petscpkg_version.h. >> It is empty as follows. I'll follow your suggestion and define those macros >> in mumps.c. >> >> #if !defined(INCLUDED_PETSCPKG_VERSION_H) >> #define INCLUDED_PETSCPKG_VERSION_H >> >> #endif >> >> Hi Satish, >> Yes, what I am doing is hacking but it is necessary since have own own >> mpi wrapper. >> >> Thank you all, >> Sam >> >> On Wed, Sep 1, 2021 at 12:52 PM Satish Balay wrote: >> >>> Well the build process used here is: >>> >>> >> (1) defined -DPETSC_HAVE_MUMPS, >>> >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c >>> >>> >>> i.e configure is skipped [for mumps part] so PETSC_PKG_MUMPS_VERSION_GE >>> etc are missing [hence this error] >>> >>> Satish >>> >>> On Wed, 1 Sep 2021, Junchao Zhang wrote: >>> >>> > On Wed, Sep 1, 2021 at 2:20 PM Sam Guo wrote: >>> > >>> > > If we go back to the original compiling error, >>> > > "petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing >>> binary >>> > > operator before token "(" >>> > > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0)" >>> > > I don't understand what PETSC_PKG_MUMPS_VERSION_GE(5,3,0) is doing. >>> > > >>> > When petsc is configured with mumps, you will find the macro >>> > PETSC_PKG_MUMPS_VERSION_GE(MAJOR,MINOR,SUBMINOR) in >>> > $PETSC_ARCH/include/petscpkg_version.h >>> > Sam, you can manually compile the failed file, mumps.c, with >>> preprocessing, >>> > to see what is wrong in the expansion of the macro. 
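For comparison, when configure itself handles MUMPS, $PETSC_ARCH/include/petscpkg_version.h ends up providing the version macros that mumps.c tests. The block below is only a hand-written sketch of what such definitions amount to, not the literal generated file; the only macro name taken from the error message is PETSC_PKG_MUMPS_VERSION_GE, and the 5.2.1 numbers simply match the pre-installed MUMPS headers quoted earlier in the thread. It also shows why the empty header fails: with PETSC_PKG_MUMPS_VERSION_GE undefined, the preprocessor replaces the bare identifier with 0 inside #if, and the leftover "(5,3,0)" produces exactly the reported "missing binary operator before token '('".

/* Hypothetical sketch of $PETSC_ARCH/include/petscpkg_version.h, assuming a
   configure run that detected MUMPS 5.2.1; the file generated by configure
   differs in detail but provides equivalent definitions. */
#if !defined(INCLUDED_PETSCPKG_VERSION_H)
#define INCLUDED_PETSCPKG_VERSION_H

#define PETSC_PKG_MUMPS_VERSION_MAJOR    5
#define PETSC_PKG_MUMPS_VERSION_MINOR    2
#define PETSC_PKG_MUMPS_VERSION_SUBMINOR 1

/* Lexicographic "greater or equal" on (major, minor, subminor), evaluated by
   the preprocessor, so that "#if PETSC_PKG_MUMPS_VERSION_GE(5,3,0)" becomes an
   ordinary integer expression (here it evaluates to 0, i.e. the 5.3-only code
   path in mumps.c is skipped). */
#define PETSC_PKG_MUMPS_VERSION_GE(MAJOR,MINOR,SUBMINOR) \
  ((PETSC_PKG_MUMPS_VERSION_MAJOR > (MAJOR)) || \
   ((PETSC_PKG_MUMPS_VERSION_MAJOR == (MAJOR)) && \
    ((PETSC_PKG_MUMPS_VERSION_MINOR > (MINOR)) || \
     ((PETSC_PKG_MUMPS_VERSION_MINOR == (MINOR)) && \
      (PETSC_PKG_MUMPS_VERSION_SUBMINOR >= (SUBMINOR))))))

#endif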
>>> > >>> > >>> > > >>> > > On Wed, Sep 1, 2021 at 12:12 PM Sam Guo >>> wrote: >>> > > >>> > >> I believe I am using MUMPS since I have done following >>> > >> (1) defined -DPETSC_HAVE_MUMPS, >>> > >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c >>> > >> (3) link my pre-compiled MUMPS, and >>> > >> (4) specifies following PETSc options >>> > >> checkError(EPSGetST(eps, &st)); >>> > >> checkError(STSetType(st, STSINVERT)); >>> > >> //if(useShellMatrix) checkError(STSetMatMode(st, >>> > >> ST_MATMODE_SHELL)); >>> > >> checkError(STGetKSP(st, &ksp)); >>> > >> checkError(KSPSetOperators(ksp, A, A)); >>> > >> checkError(KSPSetType(ksp, KSPPREONLY)); >>> > >> checkError(KSPGetPC(ksp, &pc)); >>> > >> checkError(MatSetOption(A, MAT_SPD, PETSC_TRUE)); >>> > >> checkError(PCSetType(pc, PCCHOLESKY)); >>> > >> checkError(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS)); >>> > >> checkError(PCFactorSetUpMatSolverType(pc)); >>> > >> checkError(PetscOptionsSetValue(NULL, >>> "-mat_mumps_icntl_13","1")); >>> > >> >>> > >> Another evidence I am using MUMPS is that If I skip (1)-(3) above, >>> I got >>> > >> the PETSc error saying that MUMPS is required. >>> > >> >>> > >> On Wed, Sep 1, 2021 at 12:00 PM Satish Balay >>> wrote: >>> > >> >>> > >>> mumps is a fortran package - so best to specify fc. Any specific >>> reason >>> > >>> for needing to force '--with-fc=0'? >>> > >>> >>> > >>> The attached configure.log is not using mumps. >>> > >>> >>> > >>> Satish >>> > >>> >>> > >>> On Wed, 1 Sep 2021, Sam Guo wrote: >>> > >>> >>> > >>> > fc should not be required since I link PETSc with pre-compiled >>> MUMPS. >>> > >>> In >>> > >>> > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial >>> should >>> > >>> not >>> > >>> > be required since my own CMake defines -DPETSC_HAVE_MUMPS and >>> links my >>> > >>> > pre-compiled MUMPS. >>> > >>> > >>> > >>> > I am able to make it work using PETSc 3.11.3. Attached please >>> find the >>> > >>> > cPETSc 3.11.3 onfigure.log PETSc. >>> > >>> > >>> > >>> > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay >>> > >>> wrote: >>> > >>> > >>> > >>> > > >>> > >>> > > >>> > >>> >>> ******************************************************************************* >>> > >>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see >>> > >>> configure.log for >>> > >>> > > details): >>> > >>> > > >>> > >>> > > >>> > >>> >>> ------------------------------------------------------------------------------- >>> > >>> > > Package mumps requested requires Fortran but compiler turned >>> off. >>> > >>> > > >>> > >>> > > >>> > >>> >>> ******************************************************************************* >>> > >>> > > >>> > >>> > > i.e remove '--with-fc=0' and rerun configure. >>> > >>> > > >>> > >>> > > Satish >>> > >>> > > >>> > >>> > > On Tue, 31 Aug 2021, Sam Guo wrote: >>> > >>> > > >>> > >>> > > > Attached please find the latest configure.log. 
>>> > >>> > > > > > > > > >> > >>> > >>> > > > > > > > > >> > Thanks, >>> > >>> > > > > > > > > >> > Sam >>> > >>> > > > > > > > > >> > >>> > >>> > > > > > > > > >> >>> > >>> > > > > > > > > >> >>> > >>> > > > > > > > > >>> > >>> > > > > > > > >>> > >>> > > > > > > > >>> > >>> > > > > > > >>> > >>> > > > > > >>> > >>> > > > > >>> > >>> > > > > >>> > >>> > > > >>> > >>> > > >>> > >>> > > >>> > >>> > >>> > >>> >>> > >>> >>> > >>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Sep 1 19:02:41 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 1 Sep 2021 20:02:41 -0400 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> <408d7d73-4da-8d73-97c-15e91855922@mcs.anl.gov> Message-ID: On Wed, Sep 1, 2021 at 5:19 PM Sam Guo wrote: > I build MUMPS at the designated machine but my local machine does not have > fortran compiler. > Can you run the configure there? THanks, Matt > On Wed, Sep 1, 2021 at 1:58 PM Matthew Knepley wrote: > >> On Wed, Sep 1, 2021 at 4:03 PM Sam Guo wrote: >> >>> Hi Matt, >>> I tried --with-mumps-dir but same error. >>> >> >> How can you build MUMPS without a Fortran compiler? And if you have one, >> why are you not telling PETSc about it? >> >> Thanks, >> >> Matt >> >> >>> Hi Junchao, >>> That's a very good clue and suggestion. I looked petscpkg_version.h. >>> It is empty as follows. I'll follow your suggestion and define those macros >>> in mumps.c. >>> >>> #if !defined(INCLUDED_PETSCPKG_VERSION_H) >>> #define INCLUDED_PETSCPKG_VERSION_H >>> >>> #endif >>> >>> Hi Satish, >>> Yes, what I am doing is hacking but it is necessary since have own >>> own mpi wrapper. >>> >>> Thank you all, >>> Sam >>> >>> On Wed, Sep 1, 2021 at 12:52 PM Satish Balay wrote: >>> >>>> Well the build process used here is: >>>> >>>> >> (1) defined -DPETSC_HAVE_MUMPS, >>>> >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c >>>> >>>> >>>> i.e configure is skipped [for mumps part] so PETSC_PKG_MUMPS_VERSION_GE >>>> etc are missing [hence this error] >>>> >>>> Satish >>>> >>>> On Wed, 1 Sep 2021, Junchao Zhang wrote: >>>> >>>> > On Wed, Sep 1, 2021 at 2:20 PM Sam Guo wrote: >>>> > >>>> > > If we go back to the original compiling error, >>>> > > "petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing >>>> binary >>>> > > operator before token "(" >>>> > > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0)" >>>> > > I don't understand what PETSC_PKG_MUMPS_VERSION_GE(5,3,0) is doing. >>>> > > >>>> > When petsc is configured with mumps, you will find the macro >>>> > PETSC_PKG_MUMPS_VERSION_GE(MAJOR,MINOR,SUBMINOR) in >>>> > $PETSC_ARCH/include/petscpkg_version.h >>>> > Sam, you can manually compile the failed file, mumps.c, with >>>> preprocessing, >>>> > to see what is wrong in the expansion of the macro. 
>>>> > >>>> > >>>> > > >>>> > > On Wed, Sep 1, 2021 at 12:12 PM Sam Guo >>>> wrote: >>>> > > >>>> > >> I believe I am using MUMPS since I have done following >>>> > >> (1) defined -DPETSC_HAVE_MUMPS, >>>> > >> (2) compiles and links mat/impls/aij/mpi/mumps/mumps.c >>>> > >> (3) link my pre-compiled MUMPS, and >>>> > >> (4) specifies following PETSc options >>>> > >> checkError(EPSGetST(eps, &st)); >>>> > >> checkError(STSetType(st, STSINVERT)); >>>> > >> //if(useShellMatrix) checkError(STSetMatMode(st, >>>> > >> ST_MATMODE_SHELL)); >>>> > >> checkError(STGetKSP(st, &ksp)); >>>> > >> checkError(KSPSetOperators(ksp, A, A)); >>>> > >> checkError(KSPSetType(ksp, KSPPREONLY)); >>>> > >> checkError(KSPGetPC(ksp, &pc)); >>>> > >> checkError(MatSetOption(A, MAT_SPD, PETSC_TRUE)); >>>> > >> checkError(PCSetType(pc, PCCHOLESKY)); >>>> > >> checkError(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS)); >>>> > >> checkError(PCFactorSetUpMatSolverType(pc)); >>>> > >> checkError(PetscOptionsSetValue(NULL, >>>> "-mat_mumps_icntl_13","1")); >>>> > >> >>>> > >> Another evidence I am using MUMPS is that If I skip (1)-(3) above, >>>> I got >>>> > >> the PETSc error saying that MUMPS is required. >>>> > >> >>>> > >> On Wed, Sep 1, 2021 at 12:00 PM Satish Balay >>>> wrote: >>>> > >> >>>> > >>> mumps is a fortran package - so best to specify fc. Any specific >>>> reason >>>> > >>> for needing to force '--with-fc=0'? >>>> > >>> >>>> > >>> The attached configure.log is not using mumps. >>>> > >>> >>>> > >>> Satish >>>> > >>> >>>> > >>> On Wed, 1 Sep 2021, Sam Guo wrote: >>>> > >>> >>>> > >>> > fc should not be required since I link PETSc with pre-compiled >>>> MUMPS. >>>> > >>> In >>>> > >>> > fact, --with-mumps-include --with-mumps-lib --with-mumps-serial >>>> should >>>> > >>> not >>>> > >>> > be required since my own CMake defines -DPETSC_HAVE_MUMPS and >>>> links my >>>> > >>> > pre-compiled MUMPS. >>>> > >>> > >>>> > >>> > I am able to make it work using PETSc 3.11.3. Attached please >>>> find the >>>> > >>> > cPETSc 3.11.3 onfigure.log PETSc. >>>> > >>> > >>>> > >>> > On Tue, Aug 31, 2021 at 4:47 PM Satish Balay >>> > >>>> > >>> wrote: >>>> > >>> > >>>> > >>> > > >>>> > >>> > > >>>> > >>> >>>> ******************************************************************************* >>>> > >>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see >>>> > >>> configure.log for >>>> > >>> > > details): >>>> > >>> > > >>>> > >>> > > >>>> > >>> >>>> ------------------------------------------------------------------------------- >>>> > >>> > > Package mumps requested requires Fortran but compiler turned >>>> off. >>>> > >>> > > >>>> > >>> > > >>>> > >>> >>>> ******************************************************************************* >>>> > >>> > > >>>> > >>> > > i.e remove '--with-fc=0' and rerun configure. >>>> > >>> > > >>>> > >>> > > Satish >>>> > >>> > > >>>> > >>> > > On Tue, 31 Aug 2021, Sam Guo wrote: >>>> > >>> > > >>>> > >>> > > > Attached please find the latest configure.log. 
>>>> > >>> > > > >>>> > >>> > > > grep MUMPS_VERSION >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef >>>> > >>> > > > MUMPS_VERSION >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define >>>> > >>> > > > MUMPS_VERSION "5.2.1" >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef >>>> > >>> > > > MUMPS_VERSION_MAX_LEN >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define >>>> > >>> > > > MUMPS_VERSION_MAX_LEN 30 >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: >>>> > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef >>>> > >>> > > > MUMPS_VERSION >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define >>>> > >>> > > > MUMPS_VERSION "5.2.1" >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef >>>> > >>> > > > MUMPS_VERSION_MAX_LEN >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define >>>> > >>> > > > MUMPS_VERSION_MAX_LEN 30 >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: >>>> > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef >>>> > >>> > > > MUMPS_VERSION >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define >>>> > >>> > > > MUMPS_VERSION "5.2.1" >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef >>>> > >>> > > > MUMPS_VERSION_MAX_LEN >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define >>>> > >>> > > > MUMPS_VERSION_MAX_LEN 30 >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: >>>> > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef >>>> > >>> > > > MUMPS_VERSION >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define 
>>>> > >>> > > > MUMPS_VERSION "5.2.1" >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef >>>> > >>> > > > MUMPS_VERSION_MAX_LEN >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define >>>> > >>> > > > MUMPS_VERSION_MAX_LEN 30 >>>> > >>> > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: >>>> > >>> > > > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; >>>> > >>> > > > >>>> > >>> > > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay < >>>> balay at mcs.anl.gov> >>>> > >>> wrote: >>>> > >>> > > > >>>> > >>> > > > > Also - what do you have for: >>>> > >>> > > > > >>>> > >>> > > > > grep MUMPS_VERSION >>>> > >>> > > > > >>>> > >>> > > >>>> > >>> >>>> /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h >>>> > >>> > > > > >>>> > >>> > > > > Satish >>>> > >>> > > > > >>>> > >>> > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: >>>> > >>> > > > > >>>> > >>> > > > > > please resend the logs >>>> > >>> > > > > > >>>> > >>> > > > > > Satish >>>> > >>> > > > > > >>>> > >>> > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: >>>> > >>> > > > > > >>>> > >>> > > > > > > Same compiling error with --with-mumps-serial=1. >>>> > >>> > > > > > > >>>> > >>> > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay < >>>> > >>> balay at mcs.anl.gov> >>>> > >>> > > > > wrote: >>>> > >>> > > > > > > >>>> > >>> > > > > > > > Use the additional option: -with-mumps-serial >>>> > >>> > > > > > > > >>>> > >>> > > > > > > > Satish >>>> > >>> > > > > > > > >>>> > >>> > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: >>>> > >>> > > > > > > > >>>> > >>> > > > > > > > > Attached please find the configure.log. I use my >>>> own >>>> > >>> CMake. I >>>> > >>> > > have >>>> > >>> > > > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. >>>> > >>> > > > > > > > > >>>> > >>> > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo < >>>> > >>> sam.guo at cd-adapco.com >>>> > >>> > > > >>>> > >>> > > > > wrote: >>>> > >>> > > > > > > > > >>>> > >>> > > > > > > > > > I use pre-installed >>>> > >>> > > > > > > > > > >>>> > >>> > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay < >>>> > >>> > > balay at mcs.anl.gov> >>>> > >>> > > > > > > > wrote: >>>> > >>> > > > > > > > > > >>>> > >>> > > > > > > > > >> >>>> > >>> > > > > > > > > >> Are you using --download-mumps or pre-installed >>>> > >>> mumps? If >>>> > >>> > > using >>>> > >>> > > > > > > > > >> pre-installed - try --download-mumps. >>>> > >>> > > > > > > > > >> >>>> > >>> > > > > > > > > >> If you still have issues - send us >>>> configure.log and >>>> > >>> > > make.log >>>> > >>> > > > > from the >>>> > >>> > > > > > > > > >> failed build. 
>>>> > >>> > > > > > > > > >> >>>> > >>> > > > > > > > > >> Satish >>>> > >>> > > > > > > > > >> >>>> > >>> > > > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: >>>> > >>> > > > > > > > > >> >>>> > >>> > > > > > > > > >> > Dear PETSc dev team, >>>> > >>> > > > > > > > > >> > I am compiling petsc 3.15.3 and got >>>> following >>>> > >>> compiling >>>> > >>> > > > > error >>>> > >>> > > > > > > > > >> > >>>> petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: >>>> > >>> error: >>>> > >>> > > > > missing >>>> > >>> > > > > > > > binary >>>> > >>> > > > > > > > > >> > operator before token "(" >>>> > >>> > > > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) >>>> > >>> > > > > > > > > >> > Any idea what I did wrong? >>>> > >>> > > > > > > > > >> > >>>> > >>> > > > > > > > > >> > Thanks, >>>> > >>> > > > > > > > > >> > Sam >>>> > >>> > > > > > > > > >> > >>>> > >>> > > > > > > > > >> >>>> > >>> > > > > > > > > >> >>>> > >>> > > > > > > > > >>>> > >>> > > > > > > > >>>> > >>> > > > > > > > >>>> > >>> > > > > > > >>>> > >>> > > > > > >>>> > >>> > > > > >>>> > >>> > > > > >>>> > >>> > > > >>>> > >>> > > >>>> > >>> > > >>>> > >>> > >>>> > >>> >>>> > >>> >>>> > >>>> >>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Sep 2 00:27:17 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 2 Sep 2021 00:27:17 -0500 (CDT) Subject: [petsc-users] petsc-3.15.4 now available Message-ID: Dear PETSc users, The patch release petsc-3.15.4 is now available for download. https://petsc.org/release/download/ Satish From numbersixvs at gmail.com Thu Sep 2 07:07:20 2021 From: numbersixvs at gmail.com (Viktor Nazdrachev) Date: Thu, 2 Sep 2021 15:07:20 +0300 Subject: [petsc-users] Slow convergence while parallel computations. In-Reply-To: <7EFBB20A-CB8A-47BA-BDD8-4E0BD43BBC31@joliv.et> References: <7EFBB20A-CB8A-47BA-BDD8-4E0BD43BBC31@joliv.et> Message-ID: Hello, Pierre! Thank you for your response! I attached log files (txt files with convergence behavior and RAM usage log in separate txt files) and resulting table with convergence investigation data(xls). Data for main non-regular grid with 500K cells and heterogeneous properties are in 500K folder, whereas data for simple uniform 125K cells grid with constant properties are in 125K folder. >Dear Viktor, > >>* On 1 Sep 2021, at 10:42 AM, **?????????* *??????** <**numbersixvs at gmail.com **> >wrote:* *>*> *>*>* Dear all,* *>*> *>*>* I have a 3D elasticity problem with heterogeneous properties. There is unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are imposed on side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M of DOFs).* *>*> *>*>* The best performance and memory usage for single MPI process was obtained with HPDDM(BFBCG) solver* *>*> *>*Block Krylov solvers are (most often) only useful if you have multiple right-hand sides, e.g., in the context of elasticity, multiple loadings. 
Is that really the case? If not, you may as well stick to ?standard? CG instead of the breakdown-free block (BFB) variant. *> * In that case only single right-hand side is utilized, so I switched to ?standard? cg solver (-ksp_hpddm_type cg), but I noticed the interesting convergence behavior. For non-regular grid with 500K cells and heterogeneous properties CG solver converged with 1 iteration (log_hpddm(cg)_gamg_nearnullspace_1_mpi.txt), but for more simple uniform grid with 125K cells and homogeneous properties CG solves linear system successfully(log_hpddm(cg)_gamg_nearnullspace_1_mpi.txt). BFBCG solver works properly for both grids. *>*>* and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of number of iterations required to achieve the same tolerance is significantly increased.* *>*> *>*>* I`ve also tried PCGAMG (agg) preconditioner with IC**?** (1) sub-precondtioner. For single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is peak memory usage with 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.* >*> * *>*I?m surprised that GAMG is converging so slowly. What do you mean by "ICC(1) sub-preconditioner"? Do you use that as a smoother or as a coarse level solver? *>* Sorry for misleading, ICC is used only for BJACOBI preconditioner, no ICC for GAMG. *>*How many iterations are required to reach convergence? *>*Could you please maybe run the solver with -ksp_view -log_view and send us the output? *>* For case with 4 MPI processes and attached nullspace it is required 177 iterations to reach convergence (you may see detailed log in log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt and memory usage log in RAM_log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, 90 iterations are required for sequential run(log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt). *>*Most of the default parameters of GAMG should be good enough for 3D elasticity, provided that your MatNullSpace is correct. *>* How can I be sure that nullspace is attached correctly? Is there any way for self-checking (Well perhaps calculate some parameters using matrix and solution vector)? *>*One parameter that may need some adjustments though is the aggregation threshold -pc_gamg_threshold (you could try values in the [0.01; 0.1] range, that?s what I always use for elasticity problems). *> * Tried to find optimal value of this option, set -pc_gamg_threshold 0.01 and -pc_gamg_threshold_scale 2, but I didn't notice any significant changes (Need more time for experiments ) Kind regards, Viktor Nazdrachev R&D senior researcher Geosteering Technologies LLC ??, 1 ????. 2021 ?. ? 12:01, Pierre Jolivet : > Dear Viktor, > > On 1 Sep 2021, at 10:42 AM, ????????? ?????? > wrote: > > Dear all, > > I have a 3D elasticity problem with heterogeneous properties. There is > unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet > BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are > imposed on side faces. Gravity load is also accounted for. 
The grid I use > consists of 500k cells (which is approximately 1.6M of DOFs). > > The best performance and memory usage for single MPI process was obtained > with HPDDM(BFBCG) solver > > Block Krylov solvers are (most often) only useful if you have multiple > right-hand sides, e.g., in the context of elasticity, multiple loadings. > Is that really the case? If not, you may as well stick to ?standard? CG > instead of the breakdown-free block (BFB) variant. > > and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 m 45 s > and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s > when using 5.6 GB of RAM. This because of number of iterations required to > achieve the same tolerance is significantly increased. > > I`ve also tried PCGAMG (agg) preconditioner with IC? (1) > sub-precondtioner. For single MPI process, the calculation took 10 min and > 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached > using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. > This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. > Also, there is peak memory usage with 14.1 GB, which appears just before > the start of the iterations. Parallel computation with 4 MPI processes took > 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is > about 22 GB. > > I?m surprised that GAMG is converging so slowly. What do you mean by > "ICC(1) sub-preconditioner"? Do you use that as a smoother or as a coarse > level solver? > How many iterations are required to reach convergence? > Could you please maybe run the solver with -ksp_view -log_view and send us > the output? > Most of the default parameters of GAMG should be good enough for 3D > elasticity, provided that your MatNullSpace is correct. > One parameter that may need some adjustments though is the aggregation > threshold -pc_gamg_threshold (you could try values in the [0.01; 0.1] > range, that?s what I always use for elasticity problems). > > Thanks, > Pierre > > Are there ways to avoid decreasing of the convergence rate for bjacobi > precondtioner in parallel mode? Does it make sense to use hierarchical or > nested krylov methods with a local gmres solver (sub_pc_type gmres) and > some sub-precondtioner (for example, sub_pc_type bjacobi)? > > > Is this peak memory usage expected for gamg preconditioner? is there any > way to reduce it? > > > What advice would you give to improve the convergence rate with multiple > MPI processes, but keep memory consumption reasonable? > > > Kind regards, > > Viktor Nazdrachev > > R&D senior researcher > > Geosteering Technologies LLC > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: logs.rar Type: application/octet-stream Size: 148643 bytes Desc: not available URL: From pierre at joliv.et Thu Sep 2 07:31:39 2021 From: pierre at joliv.et (Pierre Jolivet) Date: Thu, 2 Sep 2021 14:31:39 +0200 Subject: [petsc-users] Slow convergence while parallel computations. In-Reply-To: References: <7EFBB20A-CB8A-47BA-BDD8-4E0BD43BBC31@joliv.et> Message-ID: <3789D448-CCED-403F-9984-340150F9761A@joliv.et> > On 2 Sep 2021, at 2:07 PM, Viktor Nazdrachev wrote: > > Hello, Pierre! > > Thank you for your response! > I attached log files (txt files with convergence behavior and RAM usage log in separate txt files) and resulting table with convergence investigation data(xls). 
Data for main non-regular grid with 500K cells and heterogeneous properties are in 500K folder, whereas data for simple uniform 125K cells grid with constant properties are in 125K folder. > > >Dear Viktor, > > > >> On 1 Sep 2021, at 10:42 AM, ????????? ?????? > > <>wrote: > >> > >> Dear all, > >> > >> I have a 3D elasticity problem with heterogeneous properties. There is unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are imposed on side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M of DOFs). > >> > >> The best performance and memory usage for single MPI process was obtained with HPDDM(BFBCG) solver > >> > >Block Krylov solvers are (most often) only useful if you have multiple right-hand sides, e.g., in the context of elasticity, multiple loadings. > Is that really the case? If not, you may as well stick to ?standard? CG instead of the breakdown-free block (BFB) variant. > > > > In that case only single right-hand side is utilized, so I switched to ?standard? cg solver (-ksp_hpddm_type cg), but I noticed the interesting convergence behavior. For non-regular grid with 500K cells and heterogeneous properties CG solver converged with 1 iteration (log_hpddm(cg)_gamg_nearnullspace_1_mpi.txt), but for more simple uniform grid with 125K cells and homogeneous properties CG solves linear system successfully(log_hpddm(cg)_gamg_nearnullspace_1_mpi.txt). > BFBCG solver works properly for both grids. Just stick to -ksp_type cg or maybe -ksp_type gmres -ksp_gmres_modifiedgramschmidt (even if the problem is SPD). Sorry if I repeat myself, but KSPHPDDM methods are mostly useful for either blocking or recycling. If you use something as simple as CG, you?ll get better diagnostics and error handling if you use the native PETSc implementation (KSPCG) instead of the external implementation (-ksp_hpddm_type cg). > > >> and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of number of iterations required to achieve the same tolerance is significantly increased. > >> > >> I`ve also tried PCGAMG (agg) preconditioner with IC? (1) sub-precondtioner. For single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is peak memory usage with 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB. > >> > >I?m surprised that GAMG is converging so slowly. What do you mean by "ICC(1) sub-preconditioner"? Do you use that as a smoother or as a coarse level solver? > > > > Sorry for misleading, ICC is used only for BJACOBI preconditioner, no ICC for GAMG. > > >How many iterations are required to reach convergence? > >Could you please maybe run the solver with -ksp_view -log_view and send us the output? > > > > For case with 4 MPI processes and attached nullspace it is required 177 iterations to reach convergence (you may see detailed log in log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt and memory usage log in RAM_log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). 
For comparison, 90 iterations are required for sequential run(log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt). > > > >Most of the default parameters of GAMG should be good enough for 3D elasticity, provided that your MatNullSpace is correct. > > > > How can I be sure that nullspace is attached correctly? Is there any way for self-checking (Well perhaps calculate some parameters using matrix and solution vector)? > > >One parameter that may need some adjustments though is the aggregation threshold -pc_gamg_threshold (you could try values in the [0.01; 0.1] range, that?s what I always use for elasticity problems). > > > > Tried to find optimal value of this option, set -pc_gamg_threshold 0.01 and -pc_gamg_threshold_scale 2, but I didn't notice any significant changes (Need more time for experiments ) > I don?t see anything too crazy in your logs at first sight. In addition to maybe trying GMRES with a more robust orthogonalization scheme, here is what I would do: 1) MatSetBlockSize(Pmat, 6), it seems to be missing right now, cf. linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=1600200, cols=1600200 total: nonzeros=124439742, allocated nonzeros=259232400 total number of mallocs used during MatSetValues calls=0 has attached near null space 2) -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu 3) more playing around with the threshold, this can be critical for hard problems If you can share your matrix/nullspace/RHS, we could have a crack at it as well. Thanks, Pierre > Kind regards, > > Viktor Nazdrachev > > R&D senior researcher > > Geosteering Technologies LLC > > > ??, 1 ????. 2021 ?. ? 12:01, Pierre Jolivet >: > Dear Viktor, > >> On 1 Sep 2021, at 10:42 AM, ????????? ?????? > wrote: >> >> Dear all, >> >> I have a 3D elasticity problem with heterogeneous properties. There is unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are imposed on side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M of DOFs). >> >> The best performance and memory usage for single MPI process was obtained with HPDDM(BFBCG) solver >> > Block Krylov solvers are (most often) only useful if you have multiple right-hand sides, e.g., in the context of elasticity, multiple loadings. > Is that really the case? If not, you may as well stick to ?standard? CG instead of the breakdown-free block (BFB) variant. > >> and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of number of iterations required to achieve the same tolerance is significantly increased. >> >> I`ve also tried PCGAMG (agg) preconditioner with IC? (1) sub-precondtioner. For single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is peak memory usage with 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB. >> > I?m surprised that GAMG is converging so slowly. What do you mean by "ICC(1) sub-preconditioner"? Do you use that as a smoother or as a coarse level solver? 
> How many iterations are required to reach convergence? > Could you please maybe run the solver with -ksp_view -log_view and send us the output? > Most of the default parameters of GAMG should be good enough for 3D elasticity, provided that your MatNullSpace is correct. > One parameter that may need some adjustments though is the aggregation threshold -pc_gamg_threshold (you could try values in the [0.01; 0.1] range, that?s what I always use for elasticity problems). > > Thanks, > Pierre > >> Are there ways to avoid decreasing of the convergence rate for bjacobi precondtioner in parallel mode? Does it make sense to use hierarchical or nested krylov methods with a local gmres solver (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type bjacobi)? >> >> >> Is this peak memory usage expected for gamg preconditioner? is there any way to reduce it? >> >> >> What advice would you give to improve the convergence rate with multiple MPI processes, but keep memory consumption reasonable? >> >> >> Kind regards, >> >> Viktor Nazdrachev >> >> R&D senior researcher >> >> Geosteering Technologies LLC >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Thu Sep 2 07:34:35 2021 From: pierre at joliv.et (Pierre Jolivet) Date: Thu, 2 Sep 2021 14:34:35 +0200 Subject: [petsc-users] Slow convergence while parallel computations. In-Reply-To: <3789D448-CCED-403F-9984-340150F9761A@joliv.et> References: <7EFBB20A-CB8A-47BA-BDD8-4E0BD43BBC31@joliv.et> <3789D448-CCED-403F-9984-340150F9761A@joliv.et> Message-ID: <70540BB9-BC66-40F3-9BD1-E7EC613AEE88@joliv.et> > On 2 Sep 2021, at 2:31 PM, Pierre Jolivet wrote: > > > >> On 2 Sep 2021, at 2:07 PM, Viktor Nazdrachev > wrote: >> >> Hello, Pierre! >> >> Thank you for your response! >> I attached log files (txt files with convergence behavior and RAM usage log in separate txt files) and resulting table with convergence investigation data(xls). Data for main non-regular grid with 500K cells and heterogeneous properties are in 500K folder, whereas data for simple uniform 125K cells grid with constant properties are in 125K folder. >> >> >Dear Viktor, >> > >> >> On 1 Sep 2021, at 10:42 AM, ????????? ?????? > > <>wrote: >> >> >> >> Dear all, >> >> >> >> I have a 3D elasticity problem with heterogeneous properties. There is unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are imposed on side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M of DOFs). >> >> >> >> The best performance and memory usage for single MPI process was obtained with HPDDM(BFBCG) solver >> >> >> >Block Krylov solvers are (most often) only useful if you have multiple right-hand sides, e.g., in the context of elasticity, multiple loadings. >> Is that really the case? If not, you may as well stick to ?standard? CG instead of the breakdown-free block (BFB) variant. >> > >> >> In that case only single right-hand side is utilized, so I switched to ?standard? cg solver (-ksp_hpddm_type cg), but I noticed the interesting convergence behavior. For non-regular grid with 500K cells and heterogeneous properties CG solver converged with 1 iteration (log_hpddm(cg)_gamg_nearnullspace_1_mpi.txt), but for more simple uniform grid with 125K cells and homogeneous properties CG solves linear system successfully(log_hpddm(cg)_gamg_nearnullspace_1_mpi.txt). >> BFBCG solver works properly for both grids. 
> Just stick to -ksp_type cg or maybe -ksp_type gmres -ksp_gmres_modifiedgramschmidt (even if the problem is SPD).
> Sorry if I repeat myself, but KSPHPDDM methods are mostly useful for either blocking or recycling.
> If you use something as simple as CG, you'll get better diagnostics and error handling if you use the native PETSc implementation (KSPCG) instead of the external implementation (-ksp_hpddm_type cg).
>
>> and bjacobian + ICC(1) in subdomains as preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This is because the number of iterations required to achieve the same tolerance is significantly increased.
>>
>> I've also tried PCGAMG (agg) preconditioner with ICC(1) sub-preconditioner. For single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is peak memory usage with 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.
>>
>> >I'm surprised that GAMG is converging so slowly. What do you mean by "ICC(1) sub-preconditioner"? Do you use that as a smoother or as a coarse level solver?
>>
>> Sorry for misleading, ICC is used only for the BJACOBI preconditioner, no ICC for GAMG.
>>
>> >How many iterations are required to reach convergence?
>> >Could you please maybe run the solver with -ksp_view -log_view and send us the output?
>>
>> For the case with 4 MPI processes and attached nullspace, 177 iterations are required to reach convergence (you may see the detailed log in log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt and the memory usage log in RAM_log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, 90 iterations are required for the sequential run (log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt).
>>
>> >Most of the default parameters of GAMG should be good enough for 3D elasticity, provided that your MatNullSpace is correct.
>>
>> How can I be sure that the nullspace is attached correctly? Is there any way for self-checking (well, perhaps calculate some parameters using the matrix and solution vector)?
>>
>> >One parameter that may need some adjustments though is the aggregation threshold -pc_gamg_threshold (you could try values in the [0.01; 0.1] range, that's what I always use for elasticity problems).
>>
>> Tried to find the optimal value of this option, set -pc_gamg_threshold 0.01 and -pc_gamg_threshold_scale 2, but I didn't notice any significant changes (need more time for experiments).
>
> I don't see anything too crazy in your logs at first sight. In addition to maybe trying GMRES with a more robust orthogonalization scheme, here is what I would do:
> 1) MatSetBlockSize(Pmat, 6), it seems to be missing right now, cf.
Sorry for the noise, but this should read 3, not 6.
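In code, suggestion 1) with the corrected block size might look like the minimal sketch below. The names are placeholders rather than code from this thread: A stands for the assembled elasticity operator and coords for a Vec of interlaced nodal coordinates.

#include <petscmat.h>

/* Hypothetical helper (a sketch, not code from this thread): set the
   vertex-wise block size and attach the rigid-body near-nullspace that
   PCGAMG uses when building its aggregates.  The local row count of A is
   assumed to be divisible by 3 (3 displacement DOFs per node), and coords
   is assumed to hold the interlaced (x,y,z) coordinates of the owned nodes. */
static PetscErrorCode AttachElasticityNearNullSpace(Mat A, Vec coords)
{
  MatNullSpace   nullsp;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatSetBlockSize(A, 3);CHKERRQ(ierr);                        /* 3 DOFs per node, not 6 */
  ierr = MatNullSpaceCreateRigidBody(coords, &nullsp);CHKERRQ(ierr); /* 6 rigid-body modes in 3D */
  ierr = MatSetNearNullSpace(A, nullsp);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Suggestions 2) and 3) are plain runtime options, so they can simply be appended to the command line (or an options file); no code change is needed for those.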
Thanks, Pierre > linear system matrix = precond matrix: > Mat Object: 4 MPI processes > type: mpiaij > rows=1600200, cols=1600200 > total: nonzeros=124439742, allocated nonzeros=259232400 > total number of mallocs used during MatSetValues calls=0 > has attached near null space > 2) -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu > 3) more playing around with the threshold, this can be critical for hard problems > If you can share your matrix/nullspace/RHS, we could have a crack at it as well. > > Thanks, > Pierre > >> Kind regards, >> >> Viktor Nazdrachev >> >> R&D senior researcher >> >> Geosteering Technologies LLC >> >> >> ??, 1 ????. 2021 ?. ? 12:01, Pierre Jolivet >: >> Dear Viktor, >> >>> On 1 Sep 2021, at 10:42 AM, ????????? ?????? > wrote: >>> >>> Dear all, >>> >>> I have a 3D elasticity problem with heterogeneous properties. There is unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are imposed on side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M of DOFs). >>> >>> The best performance and memory usage for single MPI process was obtained with HPDDM(BFBCG) solver >>> >> Block Krylov solvers are (most often) only useful if you have multiple right-hand sides, e.g., in the context of elasticity, multiple loadings. >> Is that really the case? If not, you may as well stick to ?standard? CG instead of the breakdown-free block (BFB) variant. >> >>> and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of number of iterations required to achieve the same tolerance is significantly increased. >>> >>> I`ve also tried PCGAMG (agg) preconditioner with IC? (1) sub-precondtioner. For single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is peak memory usage with 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB. >>> >> I?m surprised that GAMG is converging so slowly. What do you mean by "ICC(1) sub-preconditioner"? Do you use that as a smoother or as a coarse level solver? >> How many iterations are required to reach convergence? >> Could you please maybe run the solver with -ksp_view -log_view and send us the output? >> Most of the default parameters of GAMG should be good enough for 3D elasticity, provided that your MatNullSpace is correct. >> One parameter that may need some adjustments though is the aggregation threshold -pc_gamg_threshold (you could try values in the [0.01; 0.1] range, that?s what I always use for elasticity problems). >> >> Thanks, >> Pierre >> >>> Are there ways to avoid decreasing of the convergence rate for bjacobi precondtioner in parallel mode? Does it make sense to use hierarchical or nested krylov methods with a local gmres solver (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type bjacobi)? >>> >>> >>> Is this peak memory usage expected for gamg preconditioner? is there any way to reduce it? 
>>> >>> >>> What advice would you give to improve the convergence rate with multiple MPI processes, but keep memory consumption reasonable? >>> >>> >>> Kind regards, >>> >>> Viktor Nazdrachev >>> >>> R&D senior researcher >>> >>> Geosteering Technologies LLC >>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Sep 2 07:59:10 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 2 Sep 2021 08:59:10 -0400 Subject: [petsc-users] Slow convergence while parallel computations. In-Reply-To: References: <7EFBB20A-CB8A-47BA-BDD8-4E0BD43BBC31@joliv.et> Message-ID: On Thu, Sep 2, 2021 at 8:08 AM Viktor Nazdrachev wrote: > Hello, Pierre! > > Thank you for your response! > > I attached log files (txt files with convergence behavior and RAM usage > log in separate txt files) and resulting table with convergence > investigation data(xls). Data for main non-regular grid with 500K cells and > heterogeneous properties are in 500K folder, whereas data for simple > uniform 125K cells grid with constant properties are in 125K folder. > > > > >Dear Viktor, > > > > > >>* On 1 Sep 2021, at 10:42 AM, **?????????* *??????** <**numbersixvs at > gmail.com **> > >wrote:* > > *>*> > > *>*>* Dear all,* > > *>*> > > *>*>* I have a 3D elasticity problem with heterogeneous properties. There > is unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet > BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are > imposed on side faces. Gravity load is also accounted for. The grid I use > consists of 500k cells (which is approximately 1.6M of DOFs).* > > *>*> > > *>*>* The best performance and memory usage for single MPI process was > obtained with HPDDM(BFBCG) solver* > > *>*> > > *>*Block Krylov solvers are (most often) only useful if you have multiple > right-hand sides, e.g., in the context of elasticity, multiple loadings. > > Is that really the case? If not, you may as well stick to ?standard? CG > instead of the breakdown-free block (BFB) variant. > > *> * > > > > In that case only single right-hand side is utilized, so I switched to > ?standard? cg solver (-ksp_hpddm_type cg), but I noticed the interesting > convergence behavior. For non-regular grid with 500K cells and > heterogeneous properties CG solver converged with 1 iteration > (log_hpddm(cg)_gamg_nearnullspace_1_mpi.txt), but for more simple uniform > grid with 125K cells and homogeneous properties CG solves linear system > successfully(log_hpddm(cg)_gamg_nearnullspace_1_mpi.txt). > > BFBCG solver works properly for both grids. > > > > > > *>*>* and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 > m 45 s and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m > 46 s when using 5.6 GB of RAM. This because of number of iterations > required to achieve the same tolerance is significantly increased.* > > *>*> > > *>*>* I`ve also tried PCGAMG (agg) preconditioner with IC**?** (1) > sub-precondtioner. For single MPI process, the calculation took 10 min and > 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached > using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. > This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. > Also, there is peak memory usage with 14.1 GB, which appears just before > the start of the iterations. Parallel computation with 4 MPI processes took > 2 m 53 s when using 8.4 GB of RAM. 
In that case the peak memory usage is > about 22 GB.* > > >*> * > > *>*I?m surprised that GAMG is converging so slowly. What do you mean by > "ICC(1) sub-preconditioner"? Do you use that as a smoother or as a coarse > level solver? > > *>* > > > Sorry for misleading, ICC is used only for BJACOBI preconditioner, no ICC > for GAMG. > > > > *>*How many iterations are required to reach convergence? > > *>*Could you please maybe run the solver with -ksp_view -log_view and > send us the output? > > *>* > > > > For case with 4 MPI processes and attached nullspace it is required 177 > iterations > Pierre's suggestions are good ones. I am confused by the failure of GAMG, since 177 iterations is not good. Something is breaking down, either the smoother or the accuracy of the coarse grids. Can you give me an idea what your coefficient looks like? Thanks, Matt > to reach convergence (you may see detailed log in > log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt and memory usage log in > RAM_log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, 90 > iterations are required for sequential > run(log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt). > > > > *>*Most of the default parameters of GAMG should be good enough for 3D > elasticity, provided that your MatNullSpace is correct. > > *>* > > > > How can I be sure that nullspace is attached correctly? Is there any way > for self-checking (Well perhaps calculate some parameters using matrix and > solution vector)? > > > > *>*One parameter that may need some adjustments though is the aggregation > threshold -pc_gamg_threshold (you could try values in the [0.01; 0.1] > range, that?s what I always use for elasticity problems). > > *> * > > > > Tried to find optimal value of this option, set -pc_gamg_threshold 0.01 > and -pc_gamg_threshold_scale 2, but I didn't notice any significant > changes (Need more time for experiments ) > > > Kind regards, > > > > Viktor Nazdrachev > > > > R&D senior researcher > > > > Geosteering Technologies LLC > > ??, 1 ????. 2021 ?. ? 12:01, Pierre Jolivet : > >> Dear Viktor, >> >> On 1 Sep 2021, at 10:42 AM, ????????? ?????? >> wrote: >> >> Dear all, >> >> I have a 3D elasticity problem with heterogeneous properties. There is >> unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet >> BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are >> imposed on side faces. Gravity load is also accounted for. The grid I use >> consists of 500k cells (which is approximately 1.6M of DOFs). >> >> The best performance and memory usage for single MPI process was obtained >> with HPDDM(BFBCG) solver >> >> Block Krylov solvers are (most often) only useful if you have multiple >> right-hand sides, e.g., in the context of elasticity, multiple loadings. >> Is that really the case? If not, you may as well stick to ?standard? CG >> instead of the breakdown-free block (BFB) variant. >> >> and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 m 45 s >> and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s >> when using 5.6 GB of RAM. This because of number of iterations required to >> achieve the same tolerance is significantly increased. >> >> I`ve also tried PCGAMG (agg) preconditioner with IC? (1) >> sub-precondtioner. For single MPI process, the calculation took 10 min and >> 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached >> using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. 
>> This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. >> Also, there is peak memory usage with 14.1 GB, which appears just before >> the start of the iterations. Parallel computation with 4 MPI processes took >> 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is >> about 22 GB. >> >> I?m surprised that GAMG is converging so slowly. What do you mean by >> "ICC(1) sub-preconditioner"? Do you use that as a smoother or as a coarse >> level solver? >> How many iterations are required to reach convergence? >> Could you please maybe run the solver with -ksp_view -log_view and send >> us the output? >> Most of the default parameters of GAMG should be good enough for 3D >> elasticity, provided that your MatNullSpace is correct. >> One parameter that may need some adjustments though is the aggregation >> threshold -pc_gamg_threshold (you could try values in the [0.01; 0.1] >> range, that?s what I always use for elasticity problems). >> >> Thanks, >> Pierre >> >> Are there ways to avoid decreasing of the convergence rate for bjacobi >> precondtioner in parallel mode? Does it make sense to use hierarchical or >> nested krylov methods with a local gmres solver (sub_pc_type gmres) and >> some sub-precondtioner (for example, sub_pc_type bjacobi)? >> >> >> Is this peak memory usage expected for gamg preconditioner? is there any >> way to reduce it? >> >> >> What advice would you give to improve the convergence rate with multiple >> MPI processes, but keep memory consumption reasonable? >> >> >> Kind regards, >> >> Viktor Nazdrachev >> >> R&D senior researcher >> >> Geosteering Technologies LLC >> >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From aduarteg at utexas.edu Thu Sep 2 09:24:46 2021 From: aduarteg at utexas.edu (Alfredo J Duarte Gomez) Date: Thu, 2 Sep 2021 09:24:46 -0500 Subject: [petsc-users] TSBDF info Message-ID: Good morning PETSC team, I am looking to implement a fully implicit BDF-DAE solver and I came across the TSBDF object https://petsc.org/release/src/ts/impls/bdf/bdf.c.html#TSBDF . This seems to be exactly what I am looking for, but I still have questions and I can't seem to find much from it in the documentation or the users manual. The key question for me is whether this is implementation is a *Variable Leading Coefficient BDF *or *Fixed-Leading Coefficient BDF* (such as CVODE). This impacts my ability of reusing matrix factorizations. Thank you, -Alfredo -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Thu Sep 2 11:32:34 2021 From: hongzhang at anl.gov (Zhang, Hong) Date: Thu, 2 Sep 2021 16:32:34 +0000 Subject: [petsc-users] TSBDF info In-Reply-To: References: Message-ID: <1788AED1-C7A3-45BD-BAB8-0EB2980F6FEB@anl.gov> On Sep 2, 2021, at 9:24 AM, Alfredo J Duarte Gomez > wrote: Good morning PETSC team, I am looking to implement a fully implicit BDF-DAE solver and I came across the TSBDF object https://petsc.org/release/src/ts/impls/bdf/bdf.c.html#TSBDF. This seems to be exactly what I am looking for, but I still have questions and I can't seem to find much from it in the documentation or the users manual. 
The key question for me is whether this implementation is a Variable Leading Coefficient BDF or a Fixed-Leading Coefficient BDF (such as CVODE). This affects my ability to reuse matrix factorizations. It is not FLC BDF. It uses the classic variable-step formula that computes the coefficients from the most recent step sizes. Hong (Mr.) Thank you, -Alfredo -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From numbersixvs at gmail.com Fri Sep 3 00:55:59 2021 From: numbersixvs at gmail.com (Viktor Nazdrachev) Date: Fri, 3 Sep 2021 08:55:59 +0300 Subject: [petsc-users] Slow convergence while parallel computations. In-Reply-To: References: Message-ID: Hello, Lawrence! Thank you for your response! I attached the log files (txt files with the convergence behavior and RAM usage logs in separate txt files) and the resulting table with the convergence investigation data (xls). Data for the main non-regular grid with 500K cells and heterogeneous properties are in the 500K folder, whereas data for the simple uniform 125K-cell grid with constant properties are in the 125K folder. >> On 1 Sep 2021, at 09:42, Viktor Nazdrachev wrote: >> >> I have a 3D elasticity problem with heterogeneous properties. > >What does your coefficient variation look like? How large is the contrast? Young's modulus varies from 1 to 10 GPa, Poisson's ratio varies from 0.3 to 0.44, and density from 1700 to 2600 kg/m^3. >> There is an unstructured grid with aspect ratio varying from 4 to 25. Zero Dirichlet BCs are imposed on the bottom face of the mesh. Also, Neumann (traction) BCs are imposed on the side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M DOFs). >> >> The best performance and memory usage for a single MPI process was obtained with the HPDDM (BFBCG) solver and bjacobi + ICC(1) in the subdomains as preconditioner; it took 1 m 45 s and 5.0 GB of RAM. Parallel computation with 4 MPI processes took 2 m 46 s while using 5.6 GB of RAM. This is because the number of iterations required to achieve the same tolerance increased significantly. > >How many iterations do you have in serial (and then in parallel)? The serial run required 112 iterations to reach convergence (log_hpddm(bfbcg)_bjacobian_icc_1_mpi.txt), while the parallel run with 4 MPI processes required 680 iterations. I attached log files for all simulations (txt files with convergence behavior and RAM usage logs in separate txt files) and the resulting table with convergence/memory usage data (xls). Data for the main non-regular grid with 500K cells and heterogeneous properties are in the 500K folder, whereas data for the simple uniform 125K-cell grid with constant properties are in the 125K folder. >> I`ve also tried the PCGAMG (agg) preconditioner with an ICC(1) sub-preconditioner. For a single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using the MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This reduced the calculation time to 3 m 58 s while using 4.3 GB of RAM. Also, there is a peak memory usage of 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s while using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB. > >Does the number of iterates increase in parallel? Again, how many iterations do you have?
For case with 4 MPI processes and attached nullspace it is required 177 iterations to reach convergence (you may see detailed log in log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, 90 iterations are required for sequential run(log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt). >>* Are there ways to avoid decreasing of the convergence rate for bjacobi precondtioner in parallel mode? Does it make sense to use hierarchical or nested krylov methods with a local gmres solver (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type bjacobi)?* > >bjacobi is only a one-level method, so you would not expect process-independent convergence rate for this kind of problem. If the coefficient variation is not too extreme, then I would expect GAMG (or some other smoothed aggregation package, perhaps -pc_type ml (you need --download-ml)) would work well with some tuning. Thanks for idea, but, unfortunately, ML cannot be compiled with 64bit integers (It is extremely necessary to perform computation on mesh with more than 10M cells). >If you have extremely high contrast coefficients you might need something with stronger coarse grids. If you can assemble so-called Neumann matrices (https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you could try the geneo scheme offered by PCHPDDM. I found strange convergence behavior for HPDDM preconditioner. For 1 MPI process BFBCG solver did not converged (log_hpddm(bfbcg)_pchpddm_1_mpi.txt), while for 4 MPI processes computation was successful (1018 to reach convergence, log_hpddm(bfbcg)_pchpddm_4_mpi.txt). But it should be mentioned that stiffness matrix was created in AIJ format (our default matrix format in program). Matrix conversion to MATIS format via MatConvert subroutine resulted in losing of convergence for both serial and parallel run. >>* Is this peak memory usage expected for gamg preconditioner? is there any way to reduce it?* > >I think that peak memory usage comes from building the coarse grids. Can you run with `-info` and grep for GAMG, this will provide some output that more expert GAMG users can interpret. Thanks, I`ll try to use a strong threshold only for coarse grids. Kind regards, Viktor Nazdrachev R&D senior researcher Geosteering Technologies LLC ??, 1 ????. 2021 ?. ? 12:02, Lawrence Mitchell : > > > > On 1 Sep 2021, at 09:42, ????????? ?????? wrote: > > > > I have a 3D elasticity problem with heterogeneous properties. > > What does your coefficient variation look like? How large is the contrast? > > > There is unstructured grid with aspect ratio varied from 4 to 25. Zero > Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) > BCs are imposed on side faces. Gravity load is also accounted for. The grid > I use consists of 500k cells (which is approximately 1.6M of DOFs). > > > > The best performance and memory usage for single MPI process was > obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as > preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with > 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of > number of iterations required to achieve the same tolerance is > significantly increased. > > How many iterations do you have in serial (and then in parallel)? > > > I`ve also tried PCGAMG (agg) preconditioner with IC? (1) > sub-precondtioner. For single MPI process, the calculation took 10 min and > 3.4 GB of RAM. 
To improve the convergence rate, the nullspace was attached > using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. > This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. > Also, there is peak memory usage with 14.1 GB, which appears just before > the start of the iterations. Parallel computation with 4 MPI processes took > 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is > about 22 GB. > > Does the number of iterates increase in parallel? Again, how many > iterations do you have? > > > Are there ways to avoid decreasing of the convergence rate for bjacobi > precondtioner in parallel mode? Does it make sense to use hierarchical or > nested krylov methods with a local gmres solver (sub_pc_type gmres) and > some sub-precondtioner (for example, sub_pc_type bjacobi)? > > bjacobi is only a one-level method, so you would not expect > process-independent convergence rate for this kind of problem. If the > coefficient variation is not too extreme, then I would expect GAMG (or some > other smoothed aggregation package, perhaps -pc_type ml (you need > --download-ml)) would work well with some tuning. > > If you have extremely high contrast coefficients you might need something > with stronger coarse grids. If you can assemble so-called Neumann matrices ( > https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you > could try the geneo scheme offered by PCHPDDM. > > > Is this peak memory usage expected for gamg preconditioner? is there any > way to reduce it? > > I think that peak memory usage comes from building the coarse grids. Can > you run with `-info` and grep for GAMG, this will provide some output that > more expert GAMG users can interpret. > > Lawrence > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: logs.rar Type: application/octet-stream Size: 212693 bytes Desc: not available URL: From mfadams at lbl.gov Fri Sep 3 07:02:31 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 3 Sep 2021 08:02:31 -0400 Subject: [petsc-users] Slow convergence while parallel computations. In-Reply-To: References: Message-ID: On Fri, Sep 3, 2021 at 1:57 AM Viktor Nazdrachev wrote: > Hello, Lawrence! > Thank you for your response! > > I attached log files (txt files with convergence behavior and RAM usage > log in separate txt files) and resulting table with convergence > investigation data(xls). Data for main non-regular grid with 500K cells and > heterogeneous properties are in 500K folder, whereas data for simple > uniform 125K cells grid with constant properties are in 125K folder. > > > >>* On 1 Sep 2021, at 09:42, **?????????** ??????** **> wrote:* > > >> > > >>* I have a 3D elasticity problem with heterogeneous properties.* > > > > > >What does your coefficient variation look like? How large is the contrast? > > > > Young modulus varies from 1 to 10 GPa, Poisson ratio varies from 0.3 to > 0.44 and density ? from 1700 to 2600 kg/m^3. > That is not too bad. Poorly shaped elements are the next thing to worry about. Try to keep the aspect ratio below 10 if possible. > > > > > >>* There is unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are imposed on side faces. Gravity load is also accounted for. 
The grid I use consists of 500k cells (which is approximately 1.6M of DOFs).* > > >> > > >>* The best performance and memory usage for single MPI process was obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of number of iterations required to achieve the same tolerance is significantly increased.* > > > > > >How many iterations do you have in serial (and then in parallel)? > > > > Serial run is required 112 iterations to reach convergence (log_hpddm(bfbcg)_bjacobian_icc_1_mpi.txt), parallel run with 4 MPI ? 680 iterations. > > > > I attached log files for all simulations (txt files with convergence > behavior and RAM usage log in separate txt files) and resulting table with > convergence/memory usage data(xls). Data for main non-regular grid with > 500K cells and heterogeneous properties are in 500K folder, whereas data > for simple uniform 125K cells grid with constant properties are in 125K > folder. > > > > > > >>* I`ve also tried PCGAMG (agg) preconditioner with IC**?** (1) sub-precondtioner. For single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is peak memory usage with 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.* > > > > > >Does the number of iterates increase in parallel? Again, how many iterations do you have? > > > > For case with 4 MPI processes and attached nullspace it is required 177 iterations to reach convergence (you may see detailed log in log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, 90 iterations are required for sequential run(log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt). > > Again, do not use ICC. I am surprised to see such a large jump in iteration count, but get ICC off the table. You will see variability in the iteration count with processor count with GAMG. As much as 10% +-. Maybe more (random) variability , but usually less. You can decrease the memory a little, and the setup time a lot, by aggressively coarsening, at the expense of higher iteration counts. It's a balancing act. You can run with the defaults, add '-info', grep on GAMG and send the ~30 lines of output if you want advice on parameters. Thanks, Mark > > > > > > > >>* Are there ways to avoid decreasing of the convergence rate for bjacobi precondtioner in parallel mode? Does it make sense to use hierarchical or nested krylov methods with a local gmres solver (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type bjacobi)?* > > > > > >bjacobi is only a one-level method, so you would not expect process-independent convergence rate for this kind of problem. If the coefficient variation is not too extreme, then I would expect GAMG (or some other smoothed aggregation package, perhaps -pc_type ml (you need --download-ml)) would work well with some tuning. > > > > Thanks for idea, but, unfortunately, ML cannot be compiled with 64bit > integers (It is extremely necessary to perform computation on mesh with > more than 10M cells). 
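To make the "balancing act" Mark describes above a little more concrete, here is a sketch of the GAMG options involved (the executable name and the MPI launcher below are placeholders, and the values are only illustrative starting points, not settings recommended anywhere in this thread):

mpiexec -n 4 ./solver -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 -pc_gamg_threshold 0.01 -pc_gamg_square_graph 2 -info

-pc_gamg_square_graph is the aggressive-coarsening knob Mark refers to (it squares the graph on the given number of levels, so the hierarchy coarsens faster), while -pc_gamg_threshold controls which weak couplings are dropped from the aggregation graph (Pierre suggested values in the 0.01-0.1 range for elasticity). Grepping the -info output for GAMG then gives the roughly 30 lines of per-level setup information that Mark and Lawrence are asking to see.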
> > > > > > >If you have extremely high contrast coefficients you might need something with stronger coarse grids. If you can assemble so-called Neumann matrices (https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you could try the geneo scheme offered by PCHPDDM. > > > > > > I found strange convergence behavior for HPDDM preconditioner. For 1 MPI > process BFBCG solver did not converged > (log_hpddm(bfbcg)_pchpddm_1_mpi.txt), while for 4 MPI processes computation > was successful (1018 to reach convergence, > log_hpddm(bfbcg)_pchpddm_4_mpi.txt). > > But it should be mentioned that stiffness matrix was created in AIJ format > (our default matrix format in program). > > Matrix conversion to MATIS format via MatConvert subroutine resulted in > losing of convergence for both serial and parallel run. > > > >>* Is this peak memory usage expected for gamg preconditioner? is there any way to reduce it?* > > > > > >I think that peak memory usage comes from building the coarse grids. Can you run with `-info` and grep for GAMG, this will provide some output that more expert GAMG users can interpret. > > > > Thanks, I`ll try to use a strong threshold only for coarse grids. > > > > Kind regards, > > > > Viktor Nazdrachev > > > > R&D senior researcher > > > > Geosteering Technologies LLC > > > > > > > > > > ??, 1 ????. 2021 ?. ? 12:02, Lawrence Mitchell : > >> >> >> > On 1 Sep 2021, at 09:42, ????????? ?????? >> wrote: >> > >> > I have a 3D elasticity problem with heterogeneous properties. >> >> What does your coefficient variation look like? How large is the contrast? >> >> > There is unstructured grid with aspect ratio varied from 4 to 25. Zero >> Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) >> BCs are imposed on side faces. Gravity load is also accounted for. The grid >> I use consists of 500k cells (which is approximately 1.6M of DOFs). >> > >> > The best performance and memory usage for single MPI process was >> obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as >> preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with >> 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of >> number of iterations required to achieve the same tolerance is >> significantly increased. >> >> How many iterations do you have in serial (and then in parallel)? >> >> > I`ve also tried PCGAMG (agg) preconditioner with IC? (1) >> sub-precondtioner. For single MPI process, the calculation took 10 min and >> 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached >> using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. >> This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. >> Also, there is peak memory usage with 14.1 GB, which appears just before >> the start of the iterations. Parallel computation with 4 MPI processes took >> 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is >> about 22 GB. >> >> Does the number of iterates increase in parallel? Again, how many >> iterations do you have? >> >> > Are there ways to avoid decreasing of the convergence rate for bjacobi >> precondtioner in parallel mode? Does it make sense to use hierarchical or >> nested krylov methods with a local gmres solver (sub_pc_type gmres) and >> some sub-precondtioner (for example, sub_pc_type bjacobi)? >> >> bjacobi is only a one-level method, so you would not expect >> process-independent convergence rate for this kind of problem. 
If the >> coefficient variation is not too extreme, then I would expect GAMG (or some >> other smoothed aggregation package, perhaps -pc_type ml (you need >> --download-ml)) would work well with some tuning. >> >> If you have extremely high contrast coefficients you might need something >> with stronger coarse grids. If you can assemble so-called Neumann matrices ( >> https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then >> you could try the geneo scheme offered by PCHPDDM. >> >> > Is this peak memory usage expected for gamg preconditioner? is there >> any way to reduce it? >> >> I think that peak memory usage comes from building the coarse grids. Can >> you run with `-info` and grep for GAMG, this will provide some output that >> more expert GAMG users can interpret. >> >> Lawrence >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Sep 3 07:11:35 2021 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Sep 2021 08:11:35 -0400 Subject: [petsc-users] Slow convergence while parallel computations. In-Reply-To: References: Message-ID: On Fri, Sep 3, 2021 at 8:02 AM Mark Adams wrote: > > > On Fri, Sep 3, 2021 at 1:57 AM Viktor Nazdrachev > wrote: > >> Hello, Lawrence! >> Thank you for your response! >> >> I attached log files (txt files with convergence behavior and RAM usage >> log in separate txt files) and resulting table with convergence >> investigation data(xls). Data for main non-regular grid with 500K cells and >> heterogeneous properties are in 500K folder, whereas data for simple >> uniform 125K cells grid with constant properties are in 125K folder. >> >> >> >>* On 1 Sep 2021, at 09:42, **?????????** ??????** **> wrote:* >> >> >> >> >> >>* I have a 3D elasticity problem with heterogeneous properties.* >> >> > >> >> >What does your coefficient variation look like? How large is the contrast? >> >> >> >> Young modulus varies from 1 to 10 GPa, Poisson ratio varies from 0.3 to >> 0.44 and density ? from 1700 to 2600 kg/m^3. >> > > That is not too bad. Poorly shaped elements are the next thing to worry > about. Try to keep the aspect ratio below 10 if possible. > > >> >> >> >> >> >>* There is unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are imposed on side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M of DOFs).* >> >> >> >> >> >>* The best performance and memory usage for single MPI process was obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of number of iterations required to achieve the same tolerance is significantly increased.* >> >> > >> >> >How many iterations do you have in serial (and then in parallel)? >> >> >> >> Serial run is required 112 iterations to reach convergence (log_hpddm(bfbcg)_bjacobian_icc_1_mpi.txt), parallel run with 4 MPI ? 680 iterations. >> >> >> >> I attached log files for all simulations (txt files with convergence >> behavior and RAM usage log in separate txt files) and resulting table with >> convergence/memory usage data(xls). Data for main non-regular grid with >> 500K cells and heterogeneous properties are in 500K folder, whereas data >> for simple uniform 125K cells grid with constant properties are in 125K >> folder. 
>> >> >> >> >> >> >>* I`ve also tried PCGAMG (agg) preconditioner with IC**?** (1) sub-precondtioner. For single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is peak memory usage with 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.* >> >> > >> >> >Does the number of iterates increase in parallel? Again, how many iterations do you have? >> >> >> >> For case with 4 MPI processes and attached nullspace it is required 177 iterations to reach convergence (you may see detailed log in log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, 90 iterations are required for sequential run(log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt). >> >> > Again, do not use ICC. I am surprised to see such a large jump in > iteration count, but get ICC off the table. > > You will see variability in the iteration count with processor count with > GAMG. As much as 10% +-. Maybe more (random) variability , but usually less. > > You can decrease the memory a little, and the setup time a lot, by > aggressively coarsening, at the expense of higher iteration counts. It's a > balancing act. > > You can run with the defaults, add '-info', grep on GAMG and send the ~30 > lines of output if you want advice on parameters. > Can you send the output of -ksp_view -ksp_monitor_true_residual -ksp_converged_reason Thanks, Matt > Thanks, > Mark > > >> >> >> >> >> >> >> >>* Are there ways to avoid decreasing of the convergence rate for bjacobi precondtioner in parallel mode? Does it make sense to use hierarchical or nested krylov methods with a local gmres solver (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type bjacobi)?* >> >> > >> >> >bjacobi is only a one-level method, so you would not expect process-independent convergence rate for this kind of problem. If the coefficient variation is not too extreme, then I would expect GAMG (or some other smoothed aggregation package, perhaps -pc_type ml (you need --download-ml)) would work well with some tuning. >> >> >> >> Thanks for idea, but, unfortunately, ML cannot be compiled with 64bit >> integers (It is extremely necessary to perform computation on mesh with >> more than 10M cells). >> >> >> >> >> >> >If you have extremely high contrast coefficients you might need something with stronger coarse grids. If you can assemble so-called Neumann matrices (https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you could try the geneo scheme offered by PCHPDDM. >> >> >> >> >> >> I found strange convergence behavior for HPDDM preconditioner. For 1 MPI >> process BFBCG solver did not converged >> (log_hpddm(bfbcg)_pchpddm_1_mpi.txt), while for 4 MPI processes computation >> was successful (1018 to reach convergence, >> log_hpddm(bfbcg)_pchpddm_4_mpi.txt). >> >> But it should be mentioned that stiffness matrix was created in AIJ >> format (our default matrix format in program). >> >> Matrix conversion to MATIS format via MatConvert subroutine resulted in >> losing of convergence for both serial and parallel run. >> >> >> >>* Is this peak memory usage expected for gamg preconditioner? 
is there any way to reduce it?* >> >> > >> >> >I think that peak memory usage comes from building the coarse grids. Can you run with `-info` and grep for GAMG, this will provide some output that more expert GAMG users can interpret. >> >> >> >> Thanks, I`ll try to use a strong threshold only for coarse grids. >> >> >> >> Kind regards, >> >> >> >> Viktor Nazdrachev >> >> >> >> R&D senior researcher >> >> >> >> Geosteering Technologies LLC >> >> >> >> >> >> >> >> >> >> ??, 1 ????. 2021 ?. ? 12:02, Lawrence Mitchell : >> >>> >>> >>> > On 1 Sep 2021, at 09:42, ????????? ?????? >>> wrote: >>> > >>> > I have a 3D elasticity problem with heterogeneous properties. >>> >>> What does your coefficient variation look like? How large is the >>> contrast? >>> >>> > There is unstructured grid with aspect ratio varied from 4 to 25. Zero >>> Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) >>> BCs are imposed on side faces. Gravity load is also accounted for. The grid >>> I use consists of 500k cells (which is approximately 1.6M of DOFs). >>> > >>> > The best performance and memory usage for single MPI process was >>> obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as >>> preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with >>> 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of >>> number of iterations required to achieve the same tolerance is >>> significantly increased. >>> >>> How many iterations do you have in serial (and then in parallel)? >>> >>> > I`ve also tried PCGAMG (agg) preconditioner with IC? (1) >>> sub-precondtioner. For single MPI process, the calculation took 10 min and >>> 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached >>> using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. >>> This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. >>> Also, there is peak memory usage with 14.1 GB, which appears just before >>> the start of the iterations. Parallel computation with 4 MPI processes took >>> 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is >>> about 22 GB. >>> >>> Does the number of iterates increase in parallel? Again, how many >>> iterations do you have? >>> >>> > Are there ways to avoid decreasing of the convergence rate for bjacobi >>> precondtioner in parallel mode? Does it make sense to use hierarchical or >>> nested krylov methods with a local gmres solver (sub_pc_type gmres) and >>> some sub-precondtioner (for example, sub_pc_type bjacobi)? >>> >>> bjacobi is only a one-level method, so you would not expect >>> process-independent convergence rate for this kind of problem. If the >>> coefficient variation is not too extreme, then I would expect GAMG (or some >>> other smoothed aggregation package, perhaps -pc_type ml (you need >>> --download-ml)) would work well with some tuning. >>> >>> If you have extremely high contrast coefficients you might need >>> something with stronger coarse grids. If you can assemble so-called Neumann >>> matrices ( >>> https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then >>> you could try the geneo scheme offered by PCHPDDM. >>> >>> > Is this peak memory usage expected for gamg preconditioner? is there >>> any way to reduce it? >>> >>> I think that peak memory usage comes from building the coarse grids. Can >>> you run with `-info` and grep for GAMG, this will provide some output that >>> more expert GAMG users can interpret. 
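For reference, the diagnostics requested at different points in this thread can all be collected in one run by appending the following standard PETSc options to whatever is already being used (a sketch only, nothing here is specific to this code):

-ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view -info

-ksp_view records the exact solver configuration, -ksp_monitor_true_residual shows whether the true residual stalls even when the preconditioned residual drops, -ksp_converged_reason reports why the iteration stopped, and -log_view together with the GAMG lines grepped from the -info output provides the timing and coarse-grid data that Pierre, Lawrence, and Mark are asking about.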
>>> >>> Lawrence >>> >>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From numbersixvs at gmail.com Fri Sep 3 09:48:08 2021 From: numbersixvs at gmail.com (Viktor Nazdrachev) Date: Fri, 3 Sep 2021 17:48:08 +0300 Subject: [petsc-users] Slow convergence while parallel computations. In-Reply-To: References: Message-ID: Hello Mark and Matthew!

I attached log files for the serial and parallel cases and the corresponding information about the GAMG preconditioner (extracted with grep).

I should note that the global stiffness matrix is assembled in the code with the MatSetValues subroutine (not MatSetValuesBlocked):

!nnds - number of nodes
!dmn=3
call MatCreate(Petsc_Comm_World,Mat_K,ierr)
call MatSetFromOptions(Mat_K,ierr)
call MatSetSizes(Mat_K,Petsc_Decide,Petsc_Decide,n,n,ierr_m)
! ...
call MatMPIAIJSetPreallocation(Mat_K,0,dbw,0,obw,ierr)
! ...
call MatSetOption(Mat_K,Mat_New_Nonzero_Allocation_Err,Petsc_False,ierr)
! ...
do i=1,nels
   call FormLocalK(i,k,indx,"Kp")   ! find the local (element) stiffness matrix
   indx=indxmap(indx,2)             ! find global indices for the element DOFs
   call MatSetValues(Mat_K,ef_eldof,indx,ef_eldof,indx,k,Add_Values,ierr)
end do

But the nullspace vector was created using the VecSetBlockSize subroutine:

call VecCreate(Petsc_Comm_World,Vec_NullSpace,ierr)
call VecSetBlockSize(Vec_NullSpace,dmn,ierr)
call VecSetSizes(Vec_NullSpace,nnds*dmn,Petsc_Decide,ierr)
call VecSetUp(Vec_NullSpace,ierr)
call VecGetArrayF90(Vec_NullSpace,null_space,ierr)
! ...
call VecRestoreArrayF90(Vec_NullSpace,null_space,ierr)
call MatNullSpaceCreateRigidBody(Vec_NullSpace,matnull,ierr)
call MatSetNearNullSpace(Mat_K,matnull,ierr)

I suppose this can be one of the reasons for the slow GAMG convergence. So I also attached log files for a parallel run with the "pure" GAMG preconditioner.

Kind regards,

Viktor Nazdrachev

R&D senior researcher

Geosteering Technologies LLC

On Fri, 3 Sep 2021 at 15:11, Matthew Knepley wrote: > On Fri, Sep 3, 2021 at 8:02 AM Mark Adams wrote: >> On Fri, Sep 3, 2021 at 1:57 AM Viktor Nazdrachev wrote: >>> Hello, Lawrence! >>> Thank you for your response! >>> >>> I attached log files (txt files with convergence behavior and RAM usage >>> log in separate txt files) and resulting table with convergence >>> investigation data(xls). Data for main non-regular grid with 500K cells and >>> heterogeneous properties are in 500K folder, whereas data for simple >>> uniform 125K cells grid with constant properties are in 125K folder. >>> >>> >> On 1 Sep 2021, at 09:42, Viktor Nazdrachev wrote: >>> >> >>> >> I have a 3D elasticity problem with heterogeneous properties. >>> > >>> >What does your coefficient variation look like? How large is the contrast? >>> >>> Young's modulus varies from 1 to 10 GPa, Poisson's ratio varies from 0.3 to >>> 0.44 and density from 1700 to 2600 kg/m^3. >>> >> >> That is not too bad. Poorly shaped elements are the next thing to worry >> about. Try to keep the aspect ratio below 10 if possible. >> >>> >>* There is unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are imposed on side faces. Gravity load is also accounted for.
The grid I use consists of 500k cells (which is approximately 1.6M of DOFs).* >>> >>> >> >>> >>> >>* The best performance and memory usage for single MPI process was obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of number of iterations required to achieve the same tolerance is significantly increased.* >>> >>> > >>> >>> >How many iterations do you have in serial (and then in parallel)? >>> >>> >>> >>> Serial run is required 112 iterations to reach convergence (log_hpddm(bfbcg)_bjacobian_icc_1_mpi.txt), parallel run with 4 MPI ? 680 iterations. >>> >>> >>> >>> I attached log files for all simulations (txt files with convergence >>> behavior and RAM usage log in separate txt files) and resulting table with >>> convergence/memory usage data(xls). Data for main non-regular grid with >>> 500K cells and heterogeneous properties are in 500K folder, whereas data >>> for simple uniform 125K cells grid with constant properties are in 125K >>> folder. >>> >>> >>> >>> >>> >>> >>* I`ve also tried PCGAMG (agg) preconditioner with IC**?** (1) sub-precondtioner. For single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is peak memory usage with 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.* >>> >>> > >>> >>> >Does the number of iterates increase in parallel? Again, how many iterations do you have? >>> >>> >>> >>> For case with 4 MPI processes and attached nullspace it is required 177 iterations to reach convergence (you may see detailed log in log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, 90 iterations are required for sequential run(log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt). >>> >>> >> Again, do not use ICC. I am surprised to see such a large jump in >> iteration count, but get ICC off the table. >> >> You will see variability in the iteration count with processor count with >> GAMG. As much as 10% +-. Maybe more (random) variability , but usually less. >> >> You can decrease the memory a little, and the setup time a lot, by >> aggressively coarsening, at the expense of higher iteration counts. It's a >> balancing act. >> >> You can run with the defaults, add '-info', grep on GAMG and send the ~30 >> lines of output if you want advice on parameters. >> > > Can you send the output of > > -ksp_view -ksp_monitor_true_residual -ksp_converged_reason > > Thanks, > > Matt > > >> Thanks, >> Mark >> >> >>> >>> >>> >>> >>> >>> >>> >>* Are there ways to avoid decreasing of the convergence rate for bjacobi precondtioner in parallel mode? Does it make sense to use hierarchical or nested krylov methods with a local gmres solver (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type bjacobi)?* >>> >>> > >>> >>> >bjacobi is only a one-level method, so you would not expect process-independent convergence rate for this kind of problem. If the coefficient variation is not too extreme, then I would expect GAMG (or some other smoothed aggregation package, perhaps -pc_type ml (you need --download-ml)) would work well with some tuning. 
>>> >>> >>> >>> Thanks for idea, but, unfortunately, ML cannot be compiled with 64bit >>> integers (It is extremely necessary to perform computation on mesh with >>> more than 10M cells). >>> >>> >>> >>> >>> >>> >If you have extremely high contrast coefficients you might need something with stronger coarse grids. If you can assemble so-called Neumann matrices (https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you could try the geneo scheme offered by PCHPDDM. >>> >>> >>> >>> >>> >>> I found strange convergence behavior for HPDDM preconditioner. For 1 MPI >>> process BFBCG solver did not converged >>> (log_hpddm(bfbcg)_pchpddm_1_mpi.txt), while for 4 MPI processes computation >>> was successful (1018 to reach convergence, >>> log_hpddm(bfbcg)_pchpddm_4_mpi.txt). >>> >>> But it should be mentioned that stiffness matrix was created in AIJ >>> format (our default matrix format in program). >>> >>> Matrix conversion to MATIS format via MatConvert subroutine resulted in >>> losing of convergence for both serial and parallel run. >>> >>> >>> >>* Is this peak memory usage expected for gamg preconditioner? is there any way to reduce it?* >>> >>> > >>> >>> >I think that peak memory usage comes from building the coarse grids. Can you run with `-info` and grep for GAMG, this will provide some output that more expert GAMG users can interpret. >>> >>> >>> >>> Thanks, I`ll try to use a strong threshold only for coarse grids. >>> >>> >>> >>> Kind regards, >>> >>> >>> >>> Viktor Nazdrachev >>> >>> >>> >>> R&D senior researcher >>> >>> >>> >>> Geosteering Technologies LLC >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> ??, 1 ????. 2021 ?. ? 12:02, Lawrence Mitchell : >>> >>>> >>>> >>>> > On 1 Sep 2021, at 09:42, ????????? ?????? >>>> wrote: >>>> > >>>> > I have a 3D elasticity problem with heterogeneous properties. >>>> >>>> What does your coefficient variation look like? How large is the >>>> contrast? >>>> >>>> > There is unstructured grid with aspect ratio varied from 4 to 25. >>>> Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann >>>> (traction) BCs are imposed on side faces. Gravity load is also accounted >>>> for. The grid I use consists of 500k cells (which is approximately 1.6M of >>>> DOFs). >>>> > >>>> > The best performance and memory usage for single MPI process was >>>> obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as >>>> preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with >>>> 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of >>>> number of iterations required to achieve the same tolerance is >>>> significantly increased. >>>> >>>> How many iterations do you have in serial (and then in parallel)? >>>> >>>> > I`ve also tried PCGAMG (agg) preconditioner with IC? (1) >>>> sub-precondtioner. For single MPI process, the calculation took 10 min and >>>> 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached >>>> using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. >>>> This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. >>>> Also, there is peak memory usage with 14.1 GB, which appears just before >>>> the start of the iterations. Parallel computation with 4 MPI processes took >>>> 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is >>>> about 22 GB. >>>> >>>> Does the number of iterates increase in parallel? Again, how many >>>> iterations do you have? 
>>>> >>>> > Are there ways to avoid decreasing of the convergence rate for >>>> bjacobi precondtioner in parallel mode? Does it make sense to use >>>> hierarchical or nested krylov methods with a local gmres solver >>>> (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type >>>> bjacobi)? >>>> >>>> bjacobi is only a one-level method, so you would not expect >>>> process-independent convergence rate for this kind of problem. If the >>>> coefficient variation is not too extreme, then I would expect GAMG (or some >>>> other smoothed aggregation package, perhaps -pc_type ml (you need >>>> --download-ml)) would work well with some tuning. >>>> >>>> If you have extremely high contrast coefficients you might need >>>> something with stronger coarse grids. If you can assemble so-called Neumann >>>> matrices ( >>>> https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then >>>> you could try the geneo scheme offered by PCHPDDM. >>>> >>>> > Is this peak memory usage expected for gamg preconditioner? is there >>>> any way to reduce it? >>>> >>>> I think that peak memory usage comes from building the coarse grids. >>>> Can you run with `-info` and grep for GAMG, this will provide some output >>>> that more expert GAMG users can interpret. >>>> >>>> Lawrence >>>> >>>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: true_residual_logs_and_greped.rar Type: application/octet-stream Size: 55998 bytes Desc: not available URL: From mfadams at lbl.gov Fri Sep 3 09:56:06 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 3 Sep 2021 10:56:06 -0400 Subject: [petsc-users] Slow convergence while parallel computations. In-Reply-To: References: Message-ID: That does not seem to be an ASCII file. On Fri, Sep 3, 2021 at 10:48 AM Viktor Nazdrachev wrote: > Hello Mark and Matthew! > > > > I attached log files for serial and parallel cases and corresponding information about GAMG preconditioner (using grep). > > I have to notice, that assembling of global stiffness matrix in code was performed by MatSetValues subrotuine (not MatSetValuesBlocked) > > !nnds ? number of nodes > > !dmn=3 > > call MatCreate(Petsc_Comm_World,Mat_K,ierr) > > call MatSetFromOptions(Mat_K,ierr) > > call MatSetSizes(Mat_K,Petsc_Decide,Petsc_Decide,n,n,ierr_m) > > ? > > call MatMPIAIJSetPreallocation(Mat_K,0,dbw,0,obw,ierr) > > ? > > call MatSetOption(Mat_K,Mat_New_Nonzero_Allocation_Err,Petsc_False,ierr) > > ? > > do i=1,nels > > call FormLocalK(i,k,indx,"Kp") ! find local stiffness matrix > > indx=indxmap(indx,2) !find global indices for DOFs > > call MatSetValues(Mat_K,ef_eldof,indx,ef_eldof,indx,k,Add_Values,ierr) > > end do > > > > But nullspace vector was created using VecSetBlockSize subroutine. > > > > call VecCreate(Petsc_Comm_World,Vec_NullSpace,ierr) > > call VecSetBlockSize(Vec_NullSpace,dmn,ierr) > > call VecSetSizes(Vec_NullSpace,nnds*dmn,Petsc_Decide,ierr) > > call VecSetUp(Vec_NullSpace,ierr) > > call VecGetArrayF90(Vec_NullSpace,null_space,ierr) > > ? 
> > call VecRestoreArrayF90(Vec_NullSpace,null_space,ierr) > > call MatNullSpaceCreateRigidBody(Vec_NullSpace,matnull,ierr) > > call MatSetNearNullSpace(Mat_K,matnull,ierr) > > > > I suppose it can be one of the reasons of GAMG slow convergence. > > So I attached log files for parallel run with ?pure? GAMG precondtioner. > > > > > > Kind regards, > > > > Viktor Nazdrachev > > > > R&D senior researcher > > > > Geosteering Technologies LLC > > ??, 3 ????. 2021 ?. ? 15:11, Matthew Knepley : > >> On Fri, Sep 3, 2021 at 8:02 AM Mark Adams wrote: >> >>> >>> >>> On Fri, Sep 3, 2021 at 1:57 AM Viktor Nazdrachev >>> wrote: >>> >>>> Hello, Lawrence! >>>> Thank you for your response! >>>> >>>> I attached log files (txt files with convergence behavior and RAM usage >>>> log in separate txt files) and resulting table with convergence >>>> investigation data(xls). Data for main non-regular grid with 500K cells and >>>> heterogeneous properties are in 500K folder, whereas data for simple >>>> uniform 125K cells grid with constant properties are in 125K folder. >>>> >>>> >>>> >>* On 1 Sep 2021, at 09:42, **?????????** ??????** **> wrote:* >>>> >>>> >> >>>> >>>> >>* I have a 3D elasticity problem with heterogeneous properties.* >>>> >>>> > >>>> >>>> >What does your coefficient variation look like? How large is the contrast? >>>> >>>> >>>> >>>> Young modulus varies from 1 to 10 GPa, Poisson ratio varies from 0.3 to >>>> 0.44 and density ? from 1700 to 2600 kg/m^3. >>>> >>> >>> That is not too bad. Poorly shaped elements are the next thing to worry >>> about. Try to keep the aspect ratio below 10 if possible. >>> >>> >>>> >>>> >>>> >>>> >>>> >>* There is unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are imposed on side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M of DOFs).* >>>> >>>> >> >>>> >>>> >>* The best performance and memory usage for single MPI process was obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of number of iterations required to achieve the same tolerance is significantly increased.* >>>> >>>> > >>>> >>>> >How many iterations do you have in serial (and then in parallel)? >>>> >>>> >>>> >>>> Serial run is required 112 iterations to reach convergence (log_hpddm(bfbcg)_bjacobian_icc_1_mpi.txt), parallel run with 4 MPI ? 680 iterations. >>>> >>>> >>>> >>>> I attached log files for all simulations (txt files with convergence >>>> behavior and RAM usage log in separate txt files) and resulting table with >>>> convergence/memory usage data(xls). Data for main non-regular grid with >>>> 500K cells and heterogeneous properties are in 500K folder, whereas data >>>> for simple uniform 125K cells grid with constant properties are in 125K >>>> folder. >>>> >>>> >>>> >>>> >>>> >>>> >>* I`ve also tried PCGAMG (agg) preconditioner with IC**?** (1) sub-precondtioner. For single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is peak memory usage with 14.1 GB, which appears just before the start of the iterations. 
Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.* >>>> >>>> > >>>> >>>> >Does the number of iterates increase in parallel? Again, how many iterations do you have? >>>> >>>> >>>> >>>> For case with 4 MPI processes and attached nullspace it is required 177 iterations to reach convergence (you may see detailed log in log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, 90 iterations are required for sequential run(log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt). >>>> >>>> >>> Again, do not use ICC. I am surprised to see such a large jump in >>> iteration count, but get ICC off the table. >>> >>> You will see variability in the iteration count with processor count >>> with GAMG. As much as 10% +-. Maybe more (random) variability , but usually >>> less. >>> >>> You can decrease the memory a little, and the setup time a lot, by >>> aggressively coarsening, at the expense of higher iteration counts. It's a >>> balancing act. >>> >>> You can run with the defaults, add '-info', grep on GAMG and send the >>> ~30 lines of output if you want advice on parameters. >>> >> >> Can you send the output of >> >> -ksp_view -ksp_monitor_true_residual -ksp_converged_reason >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Mark >>> >>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>* Are there ways to avoid decreasing of the convergence rate for bjacobi precondtioner in parallel mode? Does it make sense to use hierarchical or nested krylov methods with a local gmres solver (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type bjacobi)?* >>>> >>>> > >>>> >>>> >bjacobi is only a one-level method, so you would not expect process-independent convergence rate for this kind of problem. If the coefficient variation is not too extreme, then I would expect GAMG (or some other smoothed aggregation package, perhaps -pc_type ml (you need --download-ml)) would work well with some tuning. >>>> >>>> >>>> >>>> Thanks for idea, but, unfortunately, ML cannot be compiled with 64bit >>>> integers (It is extremely necessary to perform computation on mesh with >>>> more than 10M cells). >>>> >>>> >>>> >>>> >>>> >>>> >If you have extremely high contrast coefficients you might need something with stronger coarse grids. If you can assemble so-called Neumann matrices (https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you could try the geneo scheme offered by PCHPDDM. >>>> >>>> >>>> >>>> >>>> >>>> I found strange convergence behavior for HPDDM preconditioner. For 1 >>>> MPI process BFBCG solver did not converged >>>> (log_hpddm(bfbcg)_pchpddm_1_mpi.txt), while for 4 MPI processes computation >>>> was successful (1018 to reach convergence, >>>> log_hpddm(bfbcg)_pchpddm_4_mpi.txt). >>>> >>>> But it should be mentioned that stiffness matrix was created in AIJ >>>> format (our default matrix format in program). >>>> >>>> Matrix conversion to MATIS format via MatConvert subroutine resulted >>>> in losing of convergence for both serial and parallel run. >>>> >>>> >>>> >>* Is this peak memory usage expected for gamg preconditioner? is there any way to reduce it?* >>>> >>>> > >>>> >>>> >I think that peak memory usage comes from building the coarse grids. Can you run with `-info` and grep for GAMG, this will provide some output that more expert GAMG users can interpret. >>>> >>>> >>>> >>>> Thanks, I`ll try to use a strong threshold only for coarse grids. 
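GAMG keeps one threshold value per level plus a scaling factor (this is what -ksp_view reports as "Threshold for dropping small values in graph on each level"), so a strong threshold on the coarse grids only can be expressed directly on the command line. A sketch, with purely illustrative numbers:

-pc_gamg_threshold 0.0,0.02,0.05 -pc_gamg_threshold_scale 2.0

-pc_gamg_threshold takes a comma-separated list, one value per level starting from the finest, and -pc_gamg_threshold_scale multiplies the last given value on each remaining coarser level, so here the fine-grid graph is left unfiltered while the coarser levels drop progressively more weak couplings. Whether that actually helps the iteration count or the setup-time memory peak is exactly what the -info/GAMG output should reveal.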
>>>> >>>> >>>> >>>> Kind regards, >>>> >>>> >>>> >>>> Viktor Nazdrachev >>>> >>>> >>>> >>>> R&D senior researcher >>>> >>>> >>>> >>>> Geosteering Technologies LLC >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> ??, 1 ????. 2021 ?. ? 12:02, Lawrence Mitchell : >>>> >>>>> >>>>> >>>>> > On 1 Sep 2021, at 09:42, ????????? ?????? >>>>> wrote: >>>>> > >>>>> > I have a 3D elasticity problem with heterogeneous properties. >>>>> >>>>> What does your coefficient variation look like? How large is the >>>>> contrast? >>>>> >>>>> > There is unstructured grid with aspect ratio varied from 4 to 25. >>>>> Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann >>>>> (traction) BCs are imposed on side faces. Gravity load is also accounted >>>>> for. The grid I use consists of 500k cells (which is approximately 1.6M of >>>>> DOFs). >>>>> > >>>>> > The best performance and memory usage for single MPI process was >>>>> obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as >>>>> preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with >>>>> 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of >>>>> number of iterations required to achieve the same tolerance is >>>>> significantly increased. >>>>> >>>>> How many iterations do you have in serial (and then in parallel)? >>>>> >>>>> > I`ve also tried PCGAMG (agg) preconditioner with IC? (1) >>>>> sub-precondtioner. For single MPI process, the calculation took 10 min and >>>>> 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached >>>>> using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. >>>>> This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. >>>>> Also, there is peak memory usage with 14.1 GB, which appears just before >>>>> the start of the iterations. Parallel computation with 4 MPI processes took >>>>> 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is >>>>> about 22 GB. >>>>> >>>>> Does the number of iterates increase in parallel? Again, how many >>>>> iterations do you have? >>>>> >>>>> > Are there ways to avoid decreasing of the convergence rate for >>>>> bjacobi precondtioner in parallel mode? Does it make sense to use >>>>> hierarchical or nested krylov methods with a local gmres solver >>>>> (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type >>>>> bjacobi)? >>>>> >>>>> bjacobi is only a one-level method, so you would not expect >>>>> process-independent convergence rate for this kind of problem. If the >>>>> coefficient variation is not too extreme, then I would expect GAMG (or some >>>>> other smoothed aggregation package, perhaps -pc_type ml (you need >>>>> --download-ml)) would work well with some tuning. >>>>> >>>>> If you have extremely high contrast coefficients you might need >>>>> something with stronger coarse grids. If you can assemble so-called Neumann >>>>> matrices ( >>>>> https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then >>>>> you could try the geneo scheme offered by PCHPDDM. >>>>> >>>>> > Is this peak memory usage expected for gamg preconditioner? is there >>>>> any way to reduce it? >>>>> >>>>> I think that peak memory usage comes from building the coarse grids. >>>>> Can you run with `-info` and grep for GAMG, this will provide some output >>>>> that more expert GAMG users can interpret. 
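On the MATIS route quoted above: the GenEO coarse space in PCHPDDM relies on the unassembled local (Neumann) matrices, and a MatConvert from an already assembled AIJ matrix cannot recover those, so the conversion Viktor tried is not expected to behave like a true Neumann assembly. A minimal, untested sketch of assembling the stiffness matrix directly in MATIS format, in the same Fortran style as the snippets elsewhere in the thread (map, l2g, ndof_loc and indx_loc are placeholder names for the local-to-global DOF data, and the preallocation numbers are illustrative):

! l2g(1:ndof_loc) holds the global DOF number of every DOF touched by local elements
call ISLocalToGlobalMappingCreate(Petsc_Comm_World,1,ndof_loc,l2g,Petsc_Copy_Values,map,ierr)
call MatCreate(Petsc_Comm_World,Mat_K,ierr)
call MatSetSizes(Mat_K,Petsc_Decide,Petsc_Decide,n,n,ierr)
call MatSetType(Mat_K,MATIS,ierr)
call MatSetLocalToGlobalMapping(Mat_K,map,map,ierr)
call MatISSetPreallocation(Mat_K,81,Petsc_Null_Integer,27,Petsc_Null_Integer,ierr)
do i=1,nels
   call FormLocalK(i,k,indx_loc,"Kp")   ! element stiffness with subdomain-local DOF numbers (placeholder)
   call MatSetValuesLocal(Mat_K,ef_eldof,indx_loc,ef_eldof,indx_loc,k,Add_Values,ierr)
end do
call MatAssemblyBegin(Mat_K,Mat_Final_Assembly,ierr)
call MatAssemblyEnd(Mat_K,Mat_Final_Assembly,ierr)

Because each process only inserts its own element contributions through the local numbering, the per-subdomain matrices stay unassembled, which is what the GenEO construction in PCHPDDM is designed to exploit.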
>>>>> >>>>> Lawrence >>>>> >>>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Sep 3 10:07:05 2021 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Sep 2021 11:07:05 -0400 Subject: [petsc-users] Slow convergence while parallel computations. In-Reply-To: References: Message-ID: It is a RAR since this is Windows :) Viktor, your system looks singular. Is it possible that you somehow have zero on the diagonal? That might make the SOR a problem. You could replace that with Jacobi using -mg_levels_pc_type jacobi 0 KSP Residual norm 2.980664994991e+02 0 KSP preconditioned resid norm 2.980664994991e+02 true resid norm 7.983356882620e+11 ||r(i)||/||b|| 1.000000000000e+00 1 KSP Residual norm 1.650358505966e+01 1 KSP preconditioned resid norm 1.650358505966e+01 true resid norm 4.601793132543e+12 ||r(i)||/||b|| 5.764233267037e+00 2 KSP Residual norm 2.086911345353e+01 2 KSP preconditioned resid norm 2.086911345353e+01 true resid norm 1.258153657657e+12 ||r(i)||/||b|| 1.575970705250e+00 3 KSP Residual norm 1.909137523120e+01 3 KSP preconditioned resid norm 1.909137523120e+01 true resid norm 2.179275269000e+12 ||r(i)||/||b|| 2.729773077969e+00 Mark, here is the solver KSP Object: 1 MPI processes type: cg maximum iterations=100000, initial guess is zero tolerances: relative=1e-08, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: gamg type is MULTIPLICATIVE, levels=4 cycles=v Cycles per PCApply=1 Using externally compute Galerkin coarse grid matrices GAMG specific options Threshold for dropping small values in graph on each level = 0. 0. 0. 0. Threshold scaling factor for each level not specified = 1. AGG specific options Symmetric graph false Number of levels to square graph 1 Number smoothing steps 1 Complexity: grid = 1.0042 Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 1 MPI processes type: bjacobi number of blocks = 1 Local solver information for first block is in the following KSP and PC objects on rank 0: Use -mg_coarse_ksp_view ::ascii_info_detail to display information for all blocks KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 1.19444 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=36, cols=36 package used to perform factorization: petsc total: nonzeros=774, allocated nonzeros=774 using I-node routines: found 22 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: (mg_coarse_sub_) 1 MPI processes type: seqaij rows=36, cols=36 total: nonzeros=648, allocated nonzeros=648 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: (mg_coarse_sub_) 1 MPI processes type: seqaij rows=36, cols=36 total: nonzeros=648, allocated nonzeros=648 total number of mallocs used during MatSetValues calls=0 not using I-node routines Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 1 MPI processes type: chebyshev eigenvalue estimates used: min = 0.0997354, max = 1.09709 eigenvalues estimate via gmres min 0.00372245, max 0.997354 eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_1_esteig_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test estimating eigenvalues using noisy right hand side maximum iterations=2, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_1_) 1 MPI processes type: sor type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=902, cols=902 total: nonzeros=66660, allocated nonzeros=66660 total number of mallocs used during MatSetValues calls=0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 1 MPI processes type: chebyshev eigenvalue estimates used: min = 0.0994525, max = 1.09398 eigenvalues estimate via gmres min 0.0303095, max 0.994525 eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_2_esteig_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test estimating eigenvalues using noisy right hand side maximum iterations=2, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_2_) 1 MPI processes type: sor type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=12043, cols=12043 total: nonzeros=455611, allocated nonzeros=455611 total number of mallocs used during MatSetValues calls=0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 1 MPI processes type: chebyshev eigenvalue estimates used: min = 0.0992144, max = 1.09136 eigenvalues estimate via gmres min 0.0222691, max 0.992144 eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_3_esteig_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test estimating eigenvalues using noisy right hand side maximum iterations=2, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_3_) 1 MPI processes type: sor type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=1600200, cols=1600200 total: nonzeros=124439742, allocated nonzeros=129616200 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 533400 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=1600200, cols=1600200 total: nonzeros=124439742, allocated nonzeros=129616200 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 533400 nodes, limit used is 5 Thanks, Matt On Fri, Sep 3, 2021 at 10:56 AM Mark Adams wrote: > That does not seem to be an ASCII file. > > On Fri, Sep 3, 2021 at 10:48 AM Viktor Nazdrachev > wrote: > >> Hello Mark and Matthew! >> >> >> >> I attached log files for serial and parallel cases and corresponding information about GAMG preconditioner (using grep). >> >> I have to notice, that assembling of global stiffness matrix in code was performed by MatSetValues subrotuine (not MatSetValuesBlocked) >> >> !nnds ? number of nodes >> >> !dmn=3 >> >> call MatCreate(Petsc_Comm_World,Mat_K,ierr) >> >> call MatSetFromOptions(Mat_K,ierr) >> >> call MatSetSizes(Mat_K,Petsc_Decide,Petsc_Decide,n,n,ierr_m) >> >> ? >> >> call MatMPIAIJSetPreallocation(Mat_K,0,dbw,0,obw,ierr) >> >> ? >> >> call MatSetOption(Mat_K,Mat_New_Nonzero_Allocation_Err,Petsc_False,ierr) >> >> ? >> >> do i=1,nels >> >> call FormLocalK(i,k,indx,"Kp") ! find local stiffness matrix >> >> indx=indxmap(indx,2) !find global indices for DOFs >> >> call MatSetValues(Mat_K,ef_eldof,indx,ef_eldof,indx,k,Add_Values,ierr) >> >> end do >> >> >> >> But nullspace vector was created using VecSetBlockSize subroutine. >> >> >> >> call VecCreate(Petsc_Comm_World,Vec_NullSpace,ierr) >> >> call VecSetBlockSize(Vec_NullSpace,dmn,ierr) >> >> call VecSetSizes(Vec_NullSpace,nnds*dmn,Petsc_Decide,ierr) >> >> call VecSetUp(Vec_NullSpace,ierr) >> >> call VecGetArrayF90(Vec_NullSpace,null_space,ierr) >> >> ? 
>> >> call VecRestoreArrayF90(Vec_NullSpace,null_space,ierr) >> >> call MatNullSpaceCreateRigidBody(Vec_NullSpace,matnull,ierr) >> >> call MatSetNearNullSpace(Mat_K,matnull,ierr) >> >> >> >> I suppose it can be one of the reasons of GAMG slow convergence. >> >> So I attached log files for parallel run with ?pure? GAMG precondtioner. >> >> >> >> >> >> Kind regards, >> >> >> >> Viktor Nazdrachev >> >> >> >> R&D senior researcher >> >> >> >> Geosteering Technologies LLC >> >> ??, 3 ????. 2021 ?. ? 15:11, Matthew Knepley : >> >>> On Fri, Sep 3, 2021 at 8:02 AM Mark Adams wrote: >>> >>>> >>>> >>>> On Fri, Sep 3, 2021 at 1:57 AM Viktor Nazdrachev >>>> wrote: >>>> >>>>> Hello, Lawrence! >>>>> Thank you for your response! >>>>> >>>>> I attached log files (txt files with convergence behavior and RAM >>>>> usage log in separate txt files) and resulting table with convergence >>>>> investigation data(xls). Data for main non-regular grid with 500K cells and >>>>> heterogeneous properties are in 500K folder, whereas data for simple >>>>> uniform 125K cells grid with constant properties are in 125K folder. >>>>> >>>>> >>>>> >>* On 1 Sep 2021, at 09:42, **?????????** ??????** **> wrote:* >>>>> >>>>> >> >>>>> >>>>> >>* I have a 3D elasticity problem with heterogeneous properties.* >>>>> >>>>> > >>>>> >>>>> >What does your coefficient variation look like? How large is the contrast? >>>>> >>>>> >>>>> >>>>> Young modulus varies from 1 to 10 GPa, Poisson ratio varies from 0.3 >>>>> to 0.44 and density ? from 1700 to 2600 kg/m^3. >>>>> >>>> >>>> That is not too bad. Poorly shaped elements are the next thing to worry >>>> about. Try to keep the aspect ratio below 10 if possible. >>>> >>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>* There is unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are imposed on side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M of DOFs).* >>>>> >>>>> >> >>>>> >>>>> >>* The best performance and memory usage for single MPI process was obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of number of iterations required to achieve the same tolerance is significantly increased.* >>>>> >>>>> > >>>>> >>>>> >How many iterations do you have in serial (and then in parallel)? >>>>> >>>>> >>>>> >>>>> Serial run is required 112 iterations to reach convergence (log_hpddm(bfbcg)_bjacobian_icc_1_mpi.txt), parallel run with 4 MPI ? 680 iterations. >>>>> >>>>> >>>>> >>>>> I attached log files for all simulations (txt files with convergence >>>>> behavior and RAM usage log in separate txt files) and resulting table with >>>>> convergence/memory usage data(xls). Data for main non-regular grid with >>>>> 500K cells and heterogeneous properties are in 500K folder, whereas data >>>>> for simple uniform 125K cells grid with constant properties are in 125K >>>>> folder. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>* I`ve also tried PCGAMG (agg) preconditioner with IC**?** (1) sub-precondtioner. For single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. 
Also, there is peak memory usage with 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.* >>>>> >>>>> > >>>>> >>>>> >Does the number of iterates increase in parallel? Again, how many iterations do you have? >>>>> >>>>> >>>>> >>>>> For case with 4 MPI processes and attached nullspace it is required 177 iterations to reach convergence (you may see detailed log in log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, 90 iterations are required for sequential run(log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt). >>>>> >>>>> >>>> Again, do not use ICC. I am surprised to see such a large jump in >>>> iteration count, but get ICC off the table. >>>> >>>> You will see variability in the iteration count with processor count >>>> with GAMG. As much as 10% +-. Maybe more (random) variability , but usually >>>> less. >>>> >>>> You can decrease the memory a little, and the setup time a lot, by >>>> aggressively coarsening, at the expense of higher iteration counts. It's a >>>> balancing act. >>>> >>>> You can run with the defaults, add '-info', grep on GAMG and send the >>>> ~30 lines of output if you want advice on parameters. >>>> >>> >>> Can you send the output of >>> >>> -ksp_view -ksp_monitor_true_residual -ksp_converged_reason >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> Mark >>>> >>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>* Are there ways to avoid decreasing of the convergence rate for bjacobi precondtioner in parallel mode? Does it make sense to use hierarchical or nested krylov methods with a local gmres solver (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type bjacobi)?* >>>>> >>>>> > >>>>> >>>>> >bjacobi is only a one-level method, so you would not expect process-independent convergence rate for this kind of problem. If the coefficient variation is not too extreme, then I would expect GAMG (or some other smoothed aggregation package, perhaps -pc_type ml (you need --download-ml)) would work well with some tuning. >>>>> >>>>> >>>>> >>>>> Thanks for idea, but, unfortunately, ML cannot be compiled with 64bit >>>>> integers (It is extremely necessary to perform computation on mesh with >>>>> more than 10M cells). >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >If you have extremely high contrast coefficients you might need something with stronger coarse grids. If you can assemble so-called Neumann matrices (https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you could try the geneo scheme offered by PCHPDDM. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> I found strange convergence behavior for HPDDM preconditioner. For 1 >>>>> MPI process BFBCG solver did not converged >>>>> (log_hpddm(bfbcg)_pchpddm_1_mpi.txt), while for 4 MPI processes computation >>>>> was successful (1018 to reach convergence, >>>>> log_hpddm(bfbcg)_pchpddm_4_mpi.txt). >>>>> >>>>> But it should be mentioned that stiffness matrix was created in AIJ >>>>> format (our default matrix format in program). >>>>> >>>>> Matrix conversion to MATIS format via MatConvert subroutine resulted >>>>> in losing of convergence for both serial and parallel run. >>>>> >>>>> >>>>> >>* Is this peak memory usage expected for gamg preconditioner? is there any way to reduce it?* >>>>> >>>>> > >>>>> >>>>> >I think that peak memory usage comes from building the coarse grids. 
Can you run with `-info` and grep for GAMG, this will provide some output that more expert GAMG users can interpret. >>>>> >>>>> >>>>> >>>>> Thanks, I`ll try to use a strong threshold only for coarse grids. >>>>> >>>>> >>>>> >>>>> Kind regards, >>>>> >>>>> >>>>> >>>>> Viktor Nazdrachev >>>>> >>>>> >>>>> >>>>> R&D senior researcher >>>>> >>>>> >>>>> >>>>> Geosteering Technologies LLC >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ??, 1 ????. 2021 ?. ? 12:02, Lawrence Mitchell : >>>>> >>>>>> >>>>>> >>>>>> > On 1 Sep 2021, at 09:42, ????????? ?????? >>>>>> wrote: >>>>>> > >>>>>> > I have a 3D elasticity problem with heterogeneous properties. >>>>>> >>>>>> What does your coefficient variation look like? How large is the >>>>>> contrast? >>>>>> >>>>>> > There is unstructured grid with aspect ratio varied from 4 to 25. >>>>>> Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann >>>>>> (traction) BCs are imposed on side faces. Gravity load is also accounted >>>>>> for. The grid I use consists of 500k cells (which is approximately 1.6M of >>>>>> DOFs). >>>>>> > >>>>>> > The best performance and memory usage for single MPI process was >>>>>> obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as >>>>>> preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with >>>>>> 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of >>>>>> number of iterations required to achieve the same tolerance is >>>>>> significantly increased. >>>>>> >>>>>> How many iterations do you have in serial (and then in parallel)? >>>>>> >>>>>> > I`ve also tried PCGAMG (agg) preconditioner with IC? (1) >>>>>> sub-precondtioner. For single MPI process, the calculation took 10 min and >>>>>> 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached >>>>>> using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. >>>>>> This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. >>>>>> Also, there is peak memory usage with 14.1 GB, which appears just before >>>>>> the start of the iterations. Parallel computation with 4 MPI processes took >>>>>> 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is >>>>>> about 22 GB. >>>>>> >>>>>> Does the number of iterates increase in parallel? Again, how many >>>>>> iterations do you have? >>>>>> >>>>>> > Are there ways to avoid decreasing of the convergence rate for >>>>>> bjacobi precondtioner in parallel mode? Does it make sense to use >>>>>> hierarchical or nested krylov methods with a local gmres solver >>>>>> (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type >>>>>> bjacobi)? >>>>>> >>>>>> bjacobi is only a one-level method, so you would not expect >>>>>> process-independent convergence rate for this kind of problem. If the >>>>>> coefficient variation is not too extreme, then I would expect GAMG (or some >>>>>> other smoothed aggregation package, perhaps -pc_type ml (you need >>>>>> --download-ml)) would work well with some tuning. >>>>>> >>>>>> If you have extremely high contrast coefficients you might need >>>>>> something with stronger coarse grids. If you can assemble so-called Neumann >>>>>> matrices ( >>>>>> https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) >>>>>> then you could try the geneo scheme offered by PCHPDDM. >>>>>> >>>>>> > Is this peak memory usage expected for gamg preconditioner? is >>>>>> there any way to reduce it? 
>>>>>> >>>>>> I think that peak memory usage comes from building the coarse grids. >>>>>> Can you run with `-info` and grep for GAMG, this will provide some output >>>>>> that more expert GAMG users can interpret. >>>>>> >>>>>> Lawrence >>>>>> >>>>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Sep 3 11:18:59 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 3 Sep 2021 12:18:59 -0400 Subject: [petsc-users] Slow convergence while parallel computations. In-Reply-To: References: Message-ID: The block size has not been set to 3. You need to use MatSetBlockSize. I assume the 6 rigid body modes were not added either. This will help problems where the deformation has some rotation to it. Last time I checked SOR did not check for zero on the diagonal. Jacobi does. It is possible, maybe, that SA can give a singular coarse grid with point coarsening, but I don't think so. You might run a "solve" with no preconditioner and ask for the eigen estimates to get an idea of the spectrum of the system. The eigen estimator says this matrix is very well conditioned. Is there a mass term here? On Fri, Sep 3, 2021 at 11:07 AM Matthew Knepley wrote: > It is a RAR since this is Windows :) > > Viktor, your system looks singular. Is it possible that you somehow have > zero on the diagonal? That might make the > SOR a problem. You could replace that with Jacobi using > > -mg_levels_pc_type jacobi > > 0 KSP Residual norm 2.980664994991e+02 > 0 KSP preconditioned resid norm 2.980664994991e+02 true resid norm > 7.983356882620e+11 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP Residual norm 1.650358505966e+01 > 1 KSP preconditioned resid norm 1.650358505966e+01 true resid norm > 4.601793132543e+12 ||r(i)||/||b|| 5.764233267037e+00 > 2 KSP Residual norm 2.086911345353e+01 > 2 KSP preconditioned resid norm 2.086911345353e+01 true resid norm > 1.258153657657e+12 ||r(i)||/||b|| 1.575970705250e+00 > 3 KSP Residual norm 1.909137523120e+01 > 3 KSP preconditioned resid norm 1.909137523120e+01 true resid norm > 2.179275269000e+12 ||r(i)||/||b|| 2.729773077969e+00 > > Mark, here is the solver > > KSP Object: 1 MPI processes > type: cg > maximum iterations=100000, initial guess is zero > tolerances: relative=1e-08, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: gamg > type is MULTIPLICATIVE, levels=4 cycles=v > Cycles per PCApply=1 > Using externally compute Galerkin coarse grid matrices > GAMG specific options > Threshold for dropping small values in graph on each level = 0. > 0. 0. 0. > Threshold scaling factor for each level not specified = 1. 
> AGG specific options > Symmetric graph false > Number of levels to square graph 1 > Number smoothing steps 1 > Complexity: grid = 1.0042 > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 1 MPI processes > type: bjacobi > number of blocks = 1 > Local solver information for first block is in the following KSP > and PC objects on rank 0: > Use -mg_coarse_ksp_view ::ascii_info_detail to display information > for all blocks > KSP Object: (mg_coarse_sub_) 1 MPI processes > type: preonly > maximum iterations=1, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_sub_) 1 MPI processes > type: lu > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 1.19444 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=36, cols=36 > package used to perform factorization: petsc > total: nonzeros=774, allocated nonzeros=774 > using I-node routines: found 22 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: (mg_coarse_sub_) 1 MPI processes > type: seqaij > rows=36, cols=36 > total: nonzeros=648, allocated nonzeros=648 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: (mg_coarse_sub_) 1 MPI processes > type: seqaij > rows=36, cols=36 > total: nonzeros=648, allocated nonzeros=648 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 1 MPI processes > type: chebyshev > eigenvalue estimates used: min = 0.0997354, max = 1.09709 > eigenvalues estimate via gmres min 0.00372245, max 0.997354 > eigenvalues estimated using gmres with translations [0. 0.1; 0. > 1.1] > KSP Object: (mg_levels_1_esteig_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > estimating eigenvalues using noisy right hand side > maximum iterations=2, nonzero initial guess > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_levels_1_) 1 MPI processes > type: sor > type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=902, cols=902 > total: nonzeros=66660, allocated nonzeros=66660 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 ------------------------------- > KSP Object: (mg_levels_2_) 1 MPI processes > type: chebyshev > eigenvalue estimates used: min = 0.0994525, max = 1.09398 > eigenvalues estimate via gmres min 0.0303095, max 0.994525 > eigenvalues estimated using gmres with translations [0. 0.1; 0. > 1.1] > KSP Object: (mg_levels_2_esteig_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > estimating eigenvalues using noisy right hand side > maximum iterations=2, nonzero initial guess > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_levels_2_) 1 MPI processes > type: sor > type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=12043, cols=12043 > total: nonzeros=455611, allocated nonzeros=455611 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 3 ------------------------------- > KSP Object: (mg_levels_3_) 1 MPI processes > type: chebyshev > eigenvalue estimates used: min = 0.0992144, max = 1.09136 > eigenvalues estimate via gmres min 0.0222691, max 0.992144 > eigenvalues estimated using gmres with translations [0. 0.1; 0. > 1.1] > KSP Object: (mg_levels_3_esteig_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > estimating eigenvalues using noisy right hand side > maximum iterations=2, nonzero initial guess > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_levels_3_) 1 MPI processes > type: sor > type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. 
> linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=1600200, cols=1600200 > total: nonzeros=124439742, allocated nonzeros=129616200 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 533400 nodes, limit used is 5 > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=1600200, cols=1600200 > total: nonzeros=124439742, allocated nonzeros=129616200 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 533400 nodes, limit used is 5 > > Thanks, > > Matt > > On Fri, Sep 3, 2021 at 10:56 AM Mark Adams wrote: > >> That does not seem to be an ASCII file. >> >> On Fri, Sep 3, 2021 at 10:48 AM Viktor Nazdrachev >> wrote: >> >>> Hello Mark and Matthew! >>> >>> >>> >>> I attached log files for serial and parallel cases and corresponding information about GAMG preconditioner (using grep). >>> >>> I have to notice, that assembling of global stiffness matrix in code was performed by MatSetValues subrotuine (not MatSetValuesBlocked) >>> >>> !nnds ? number of nodes >>> >>> !dmn=3 >>> >>> call MatCreate(Petsc_Comm_World,Mat_K,ierr) >>> >>> call MatSetFromOptions(Mat_K,ierr) >>> >>> call MatSetSizes(Mat_K,Petsc_Decide,Petsc_Decide,n,n,ierr_m) >>> >>> ? >>> >>> call MatMPIAIJSetPreallocation(Mat_K,0,dbw,0,obw,ierr) >>> >>> ? >>> >>> call MatSetOption(Mat_K,Mat_New_Nonzero_Allocation_Err,Petsc_False,ierr) >>> >>> ? >>> >>> do i=1,nels >>> >>> call FormLocalK(i,k,indx,"Kp") ! find local stiffness matrix >>> >>> indx=indxmap(indx,2) !find global indices for DOFs >>> >>> call MatSetValues(Mat_K,ef_eldof,indx,ef_eldof,indx,k,Add_Values,ierr) >>> >>> end do >>> >>> >>> >>> But nullspace vector was created using VecSetBlockSize subroutine. >>> >>> >>> >>> call VecCreate(Petsc_Comm_World,Vec_NullSpace,ierr) >>> >>> call VecSetBlockSize(Vec_NullSpace,dmn,ierr) >>> >>> call VecSetSizes(Vec_NullSpace,nnds*dmn,Petsc_Decide,ierr) >>> >>> call VecSetUp(Vec_NullSpace,ierr) >>> >>> call VecGetArrayF90(Vec_NullSpace,null_space,ierr) >>> >>> ? >>> >>> call VecRestoreArrayF90(Vec_NullSpace,null_space,ierr) >>> >>> call MatNullSpaceCreateRigidBody(Vec_NullSpace,matnull,ierr) >>> >>> call MatSetNearNullSpace(Mat_K,matnull,ierr) >>> >>> >>> >>> I suppose it can be one of the reasons of GAMG slow convergence. >>> >>> So I attached log files for parallel run with ?pure? GAMG precondtioner. >>> >>> >>> >>> >>> >>> Kind regards, >>> >>> >>> >>> Viktor Nazdrachev >>> >>> >>> >>> R&D senior researcher >>> >>> >>> >>> Geosteering Technologies LLC >>> >>> ??, 3 ????. 2021 ?. ? 15:11, Matthew Knepley : >>> >>>> On Fri, Sep 3, 2021 at 8:02 AM Mark Adams wrote: >>>> >>>>> >>>>> >>>>> On Fri, Sep 3, 2021 at 1:57 AM Viktor Nazdrachev < >>>>> numbersixvs at gmail.com> wrote: >>>>> >>>>>> Hello, Lawrence! >>>>>> Thank you for your response! >>>>>> >>>>>> I attached log files (txt files with convergence behavior and RAM >>>>>> usage log in separate txt files) and resulting table with convergence >>>>>> investigation data(xls). Data for main non-regular grid with 500K cells and >>>>>> heterogeneous properties are in 500K folder, whereas data for simple >>>>>> uniform 125K cells grid with constant properties are in 125K folder. 
>>>>>> >>>>>> >>>>>> >>* On 1 Sep 2021, at 09:42, **?????????** ??????** **> wrote:* >>>>>> >>>>>> >> >>>>>> >>>>>> >>* I have a 3D elasticity problem with heterogeneous properties.* >>>>>> >>>>>> > >>>>>> >>>>>> >What does your coefficient variation look like? How large is the contrast? >>>>>> >>>>>> >>>>>> >>>>>> Young modulus varies from 1 to 10 GPa, Poisson ratio varies from 0.3 >>>>>> to 0.44 and density ? from 1700 to 2600 kg/m^3. >>>>>> >>>>> >>>>> That is not too bad. Poorly shaped elements are the next thing to >>>>> worry about. Try to keep the aspect ratio below 10 if possible. >>>>> >>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>* There is unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are imposed on side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M of DOFs).* >>>>>> >>>>>> >> >>>>>> >>>>>> >>* The best performance and memory usage for single MPI process was obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of number of iterations required to achieve the same tolerance is significantly increased.* >>>>>> >>>>>> > >>>>>> >>>>>> >How many iterations do you have in serial (and then in parallel)? >>>>>> >>>>>> >>>>>> >>>>>> Serial run is required 112 iterations to reach convergence (log_hpddm(bfbcg)_bjacobian_icc_1_mpi.txt), parallel run with 4 MPI ? 680 iterations. >>>>>> >>>>>> >>>>>> >>>>>> I attached log files for all simulations (txt files with convergence >>>>>> behavior and RAM usage log in separate txt files) and resulting table with >>>>>> convergence/memory usage data(xls). Data for main non-regular grid with >>>>>> 500K cells and heterogeneous properties are in 500K folder, whereas data >>>>>> for simple uniform 125K cells grid with constant properties are in 125K >>>>>> folder. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>* I`ve also tried PCGAMG (agg) preconditioner with IC**?** (1) sub-precondtioner. For single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is peak memory usage with 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.* >>>>>> >>>>>> > >>>>>> >>>>>> >Does the number of iterates increase in parallel? Again, how many iterations do you have? >>>>>> >>>>>> >>>>>> >>>>>> For case with 4 MPI processes and attached nullspace it is required 177 iterations to reach convergence (you may see detailed log in log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, 90 iterations are required for sequential run(log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt). >>>>>> >>>>>> >>>>> Again, do not use ICC. I am surprised to see such a large jump in >>>>> iteration count, but get ICC off the table. >>>>> >>>>> You will see variability in the iteration count with processor count >>>>> with GAMG. As much as 10% +-. Maybe more (random) variability , but usually >>>>> less. 
>>>>> >>>>> You can decrease the memory a little, and the setup time a lot, by >>>>> aggressively coarsening, at the expense of higher iteration counts. It's a >>>>> balancing act. >>>>> >>>>> You can run with the defaults, add '-info', grep on GAMG and send the >>>>> ~30 lines of output if you want advice on parameters. >>>>> >>>> >>>> Can you send the output of >>>> >>>> -ksp_view -ksp_monitor_true_residual -ksp_converged_reason >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks, >>>>> Mark >>>>> >>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>* Are there ways to avoid decreasing of the convergence rate for bjacobi precondtioner in parallel mode? Does it make sense to use hierarchical or nested krylov methods with a local gmres solver (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type bjacobi)?* >>>>>> >>>>>> > >>>>>> >>>>>> >bjacobi is only a one-level method, so you would not expect process-independent convergence rate for this kind of problem. If the coefficient variation is not too extreme, then I would expect GAMG (or some other smoothed aggregation package, perhaps -pc_type ml (you need --download-ml)) would work well with some tuning. >>>>>> >>>>>> >>>>>> >>>>>> Thanks for idea, but, unfortunately, ML cannot be compiled with 64bit >>>>>> integers (It is extremely necessary to perform computation on mesh with >>>>>> more than 10M cells). >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >If you have extremely high contrast coefficients you might need something with stronger coarse grids. If you can assemble so-called Neumann matrices (https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you could try the geneo scheme offered by PCHPDDM. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> I found strange convergence behavior for HPDDM preconditioner. For 1 >>>>>> MPI process BFBCG solver did not converged >>>>>> (log_hpddm(bfbcg)_pchpddm_1_mpi.txt), while for 4 MPI processes computation >>>>>> was successful (1018 to reach convergence, >>>>>> log_hpddm(bfbcg)_pchpddm_4_mpi.txt). >>>>>> >>>>>> But it should be mentioned that stiffness matrix was created in AIJ >>>>>> format (our default matrix format in program). >>>>>> >>>>>> Matrix conversion to MATIS format via MatConvert subroutine resulted >>>>>> in losing of convergence for both serial and parallel run. >>>>>> >>>>>> >>>>>> >>* Is this peak memory usage expected for gamg preconditioner? is there any way to reduce it?* >>>>>> >>>>>> > >>>>>> >>>>>> >I think that peak memory usage comes from building the coarse grids. Can you run with `-info` and grep for GAMG, this will provide some output that more expert GAMG users can interpret. >>>>>> >>>>>> >>>>>> >>>>>> Thanks, I`ll try to use a strong threshold only for coarse grids. >>>>>> >>>>>> >>>>>> >>>>>> Kind regards, >>>>>> >>>>>> >>>>>> >>>>>> Viktor Nazdrachev >>>>>> >>>>>> >>>>>> >>>>>> R&D senior researcher >>>>>> >>>>>> >>>>>> >>>>>> Geosteering Technologies LLC >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> ??, 1 ????. 2021 ?. ? 12:02, Lawrence Mitchell : >>>>>> >>>>>>> >>>>>>> >>>>>>> > On 1 Sep 2021, at 09:42, ????????? ?????? >>>>>>> wrote: >>>>>>> > >>>>>>> > I have a 3D elasticity problem with heterogeneous properties. >>>>>>> >>>>>>> What does your coefficient variation look like? How large is the >>>>>>> contrast? >>>>>>> >>>>>>> > There is unstructured grid with aspect ratio varied from 4 to 25. >>>>>>> Zero Dirichlet BCs are imposed on bottom face of mesh. 
Also, Neumann >>>>>>> (traction) BCs are imposed on side faces. Gravity load is also accounted >>>>>>> for. The grid I use consists of 500k cells (which is approximately 1.6M of >>>>>>> DOFs). >>>>>>> > >>>>>>> > The best performance and memory usage for single MPI process was >>>>>>> obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as >>>>>>> preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with >>>>>>> 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of >>>>>>> number of iterations required to achieve the same tolerance is >>>>>>> significantly increased. >>>>>>> >>>>>>> How many iterations do you have in serial (and then in parallel)? >>>>>>> >>>>>>> > I`ve also tried PCGAMG (agg) preconditioner with IC? (1) >>>>>>> sub-precondtioner. For single MPI process, the calculation took 10 min and >>>>>>> 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached >>>>>>> using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. >>>>>>> This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. >>>>>>> Also, there is peak memory usage with 14.1 GB, which appears just before >>>>>>> the start of the iterations. Parallel computation with 4 MPI processes took >>>>>>> 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is >>>>>>> about 22 GB. >>>>>>> >>>>>>> Does the number of iterates increase in parallel? Again, how many >>>>>>> iterations do you have? >>>>>>> >>>>>>> > Are there ways to avoid decreasing of the convergence rate for >>>>>>> bjacobi precondtioner in parallel mode? Does it make sense to use >>>>>>> hierarchical or nested krylov methods with a local gmres solver >>>>>>> (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type >>>>>>> bjacobi)? >>>>>>> >>>>>>> bjacobi is only a one-level method, so you would not expect >>>>>>> process-independent convergence rate for this kind of problem. If the >>>>>>> coefficient variation is not too extreme, then I would expect GAMG (or some >>>>>>> other smoothed aggregation package, perhaps -pc_type ml (you need >>>>>>> --download-ml)) would work well with some tuning. >>>>>>> >>>>>>> If you have extremely high contrast coefficients you might need >>>>>>> something with stronger coarse grids. If you can assemble so-called Neumann >>>>>>> matrices ( >>>>>>> https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) >>>>>>> then you could try the geneo scheme offered by PCHPDDM. >>>>>>> >>>>>>> > Is this peak memory usage expected for gamg preconditioner? is >>>>>>> there any way to reduce it? >>>>>>> >>>>>>> I think that peak memory usage comes from building the coarse grids. >>>>>>> Can you run with `-info` and grep for GAMG, this will provide some output >>>>>>> that more expert GAMG users can interpret. >>>>>>> >>>>>>> Lawrence >>>>>>> >>>>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
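For reference, here is a minimal Fortran sketch (not code from the original program) of the assembly change Mark suggests above: declare the 3-DOF-per-node block size on the stiffness matrix before preallocation, so that GAMG and the rigid-body near null space both see the nodal blocks. The variable names (Mat_K, dmn=3, n, dbw, obw, nels, ef_eldof, indx, k, nnds, Vec_NullSpace, matnull) follow the fragment Viktor posted earlier in the thread; the assembly loop and the coordinate fill are placeholders.

! Sketch only: the key change is MatSetBlockSize before preallocation.
call MatCreate(Petsc_Comm_World,Mat_K,ierr)
call MatSetFromOptions(Mat_K,ierr)
call MatSetSizes(Mat_K,Petsc_Decide,Petsc_Decide,n,n,ierr)
call MatSetBlockSize(Mat_K,dmn,ierr)   ! dmn = 3 DOFs per node; set before preallocation
call MatMPIAIJSetPreallocation(Mat_K,0,dbw,0,obw,ierr)

do i=1,nels
   call FormLocalK(i,k,indx,"Kp")      ! local stiffness matrix
   indx=indxmap(indx,2)                ! global DOF indices
   call MatSetValues(Mat_K,ef_eldof,indx,ef_eldof,indx,k,Add_Values,ierr)
   ! (alternatively, MatSetValuesBlocked with global node numbers as block
   !  indices, provided the element matrix is ordered node by node)
end do
call MatAssemblyBegin(Mat_K,Mat_Final_Assembly,ierr)
call MatAssemblyEnd(Mat_K,Mat_Final_Assembly,ierr)

! Rigid-body near null space, consistent with the block size above:
! Vec_NullSpace holds the nodal coordinates, interlaced (x,y,z) per node.
call VecCreate(Petsc_Comm_World,Vec_NullSpace,ierr)
call VecSetBlockSize(Vec_NullSpace,dmn,ierr)
call VecSetSizes(Vec_NullSpace,nnds*dmn,Petsc_Decide,ierr)
call VecSetUp(Vec_NullSpace,ierr)
! ... fill Vec_NullSpace with the nodal coordinates ...
call MatNullSpaceCreateRigidBody(Vec_NullSpace,matnull,ierr)
call MatSetNearNullSpace(Mat_K,matnull,ierr)
call MatNullSpaceDestroy(matnull,ierr)

With the block size declared, the six rigid-body modes line up with GAMG's nodal aggregates; the -ksp_view -ksp_monitor_true_residual -ksp_converged_reason output requested above (and Matt's -mg_levels_pc_type jacobi suggestion) remain the quickest way to check whether the parallel iteration count comes down.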
URL: From samuelestes91 at gmail.com Fri Sep 3 11:34:06 2021 From: samuelestes91 at gmail.com (Samuel Estes) Date: Fri, 3 Sep 2021 11:34:06 -0500 Subject: [petsc-users] Solving subsystem using larger matrix Message-ID: Hi, I have a model in which we alternatively solve two submodels by the finite element method both on the same unstructured mesh. The first model (call it model 1) has three degrees of freedom per node while the second model (model 2) is scalar (one degree of freedom). We are trying several different implementations but one option that we would like to try is to use the same matrix for both models. In other words, assuming we have n nodes then we allocate a square matrix A to have 3*n rows, solve model 1, and then reuse this matrix to solve model 2. So for model 2 any non-zeros in the matrix A will be confined to the upper left ninth and the remaining 8/9ths of the matrix are irrelevant to the problem. I have several questions about how best to implement something like this which I will list below: 1. The solver requires that the matrix at least have some non-zeros in each row, otherwise the solution ends up being all NANs. This makes sense as it is solving the subsystem 0*x=0 which is clearly ill-defined. Is there any way that I can communicate to PETSc, either through the solver or matrix classes or some other way, that all I care about is a subsystem and that I would like to use only a portion of the matrix A, the right hand side and the solution vector? There are some routines involving submatrices in the man pages but I'm not sure if they are appropriate for this problem or not. In particular, the MatGetLocalSubMatrix routine might be what I need but its not clear to me whether or not this actually copies the array of values in the submatrix (not desirable due to memory concerns) or if it is just essentially a pointer to the submatrix of values in the original matrix A (ideal). Basically, the idea is to use a part of the matrix that we have without allocating unnecessary extra memory. 2. Is there a way to use multiple local to global mappings for a single matrix. I have a problem when I try to use the same local to global mapping from model 1 in model 2. I understand why this is and can fix it but being able to reset the local to global mapping without destroying the matrix would be an ideal fix. 3. Any other input on the best way to approach solving a problem like this using PETSc would be appreciated. I'm somewhat of a novice when it comes to PETSc so I don't necessarily know all the tools which are available to me. It seems clear that I've reached a point where the manual and man pages aren't quite as helpful as they were for more basic operations. I hope my explanation of the problem and my questions was clear. If not, let me know and I can try to provide more details. Thanks! Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Sep 3 11:49:23 2021 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 Sep 2021 12:49:23 -0400 Subject: [petsc-users] Solving subsystem using larger matrix In-Reply-To: References: Message-ID: On Fri, Sep 3, 2021 at 12:34 PM Samuel Estes wrote: > Hi, > > I have a model in which we alternatively solve two submodels by the finite > element method both on the same unstructured mesh. The first model (call it > model 1) has three degrees of freedom per node while the second model > (model 2) is scalar (one degree of freedom). 
We are trying several > different implementations but one option that we would like to try is to > use the same matrix for both models. In other words, assuming we have n > nodes then we allocate a square matrix A to have 3*n rows, solve model 1, > and then reuse this matrix to solve model 2. So for model 2 any non-zeros > in the matrix A will be confined to the upper left ninth and the remaining > 8/9ths of the matrix are irrelevant to the problem. I have several > questions about how best to implement something like this which I will list > below: > I can tell you how to do the operations below. However, I think we should go over why you want to do this first. What benefit do you hope to have with this scheme over just using two matrices. Thanks, Matt > 1. The solver requires that the matrix at least have some non-zeros in > each row, otherwise the solution ends up being all NANs. This makes sense > as it is solving the subsystem 0*x=0 which is clearly ill-defined. Is there > any way that I can communicate to PETSc, either through the solver or > matrix classes or some other way, that all I care about is a subsystem and > that I would like to use only a portion of the matrix A, the right hand > side and the solution vector? There are some routines involving submatrices > in the man pages but I'm not sure if they are appropriate for this problem > or not. In particular, the MatGetLocalSubMatrix routine might be what I > need but its not clear to me whether or not this actually copies the array > of values in the submatrix (not desirable due to memory concerns) or if it > is just essentially a pointer to the submatrix of values in the original > matrix A (ideal). Basically, the idea is to use a part of the matrix that > we have without allocating unnecessary extra memory. > 2. Is there a way to use multiple local to global mappings for a single > matrix. I have a problem when I try to use the same local to global mapping > from model 1 in model 2. I understand why this is and can fix it but being > able to reset the local to global mapping without destroying the matrix > would be an ideal fix. > 3. Any other input on the best way to approach solving a problem like this > using PETSc would be appreciated. I'm somewhat of a novice when it comes to > PETSc so I don't necessarily know all the tools which are available to me. > It seems clear that I've reached a point where the manual and man pages > aren't quite as helpful as they were for more basic operations. > > I hope my explanation of the problem and my questions was clear. If not, > let me know and I can try to provide more details. Thanks! > > Sam > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From paulsank at msu.edu Fri Sep 3 12:53:05 2021 From: paulsank at msu.edu (Paul, Sanku) Date: Fri, 3 Sep 2021 17:53:05 +0000 Subject: [petsc-users] Matrix exponential Message-ID: Dear Sir/Ma'am, I am trying to use SLEPc to calculate matrix exponential in my python code but I am not getting the correct result. I have attached the code. Could you let me know what I am doing wrong. This is my first time using SLEPc. So, I would like to ask you if you could send me a tutorial on matrix exponential using SLEPc in python code. 
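For concreteness, a minimal slepc4py sketch of this kind of computation, pulling together the suggestions that follow in this thread: an MFN solver whose FN is set to the exponential (as in the demo ex6.py Jose points to), PETSc.Mat().createAIJWithArrays() to convert a SciPy CSR matrix, and FN setScale(-1j*t) for exp(-itH), which requires a complex-scalar build of PETSc/SLEPc. The Hamiltonian, time step and starting vector below are placeholders, and MFN computes the action y = exp(-itH)b on a vector rather than the dense matrix exponential.

# Sketch only: y = exp(-1j*t*H) b with slepc4py's MFN.
# Assumes PETSc/SLEPc built with complex scalars (PETSc.ScalarType == complex128).
import numpy as np
import scipy.sparse as sp
from petsc4py import PETSc
from slepc4py import SLEPc

t = 0.1
n = 64
# Placeholder Hamiltonian: a tridiagonal test matrix in SciPy CSR format.
H_csr = sp.diags([np.full(n - 1, -1.0), np.full(n, 2.0), np.full(n - 1, -1.0)],
                 [-1, 0, 1], format='csr')

# SciPy CSR -> PETSc AIJ (sequential case); cast arrays to PETSc's index/scalar types.
H = PETSc.Mat().createAIJWithArrays(
    size=(n, n),
    csr=(H_csr.indptr.astype(PETSc.IntType),
         H_csr.indices.astype(PETSc.IntType),
         H_csr.data.astype(PETSc.ScalarType)))

b = H.createVecRight()
y = H.createVecLeft()
b.set(1.0)                        # placeholder starting vector

M = SLEPc.MFN().create()
M.setOperator(H)
F = M.getFN()                     # work with the FN owned by the solver
F.setType(SLEPc.FN.Type.EXP)
F.setScale(-1j * t)               # f(H) = exp(-1j*t*H); needs complex scalars
M.setTolerances(1e-8)
M.setFromOptions()
M.solve(b, y)                     # y = exp(-1j*t*H) b

The FN object can equally be created on its own and attached with setFN(), as in Jose's first reply below.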
Best, Sanku -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex2.py Type: text/x-python Size: 1329 bytes Desc: ex2.py URL: From jroman at dsic.upv.es Fri Sep 3 13:13:18 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 3 Sep 2021 20:13:18 +0200 Subject: [petsc-users] Matrix exponential In-Reply-To: References: Message-ID: <699FFF16-6190-48D1-AFAE-18638531C277@dsic.upv.es> You should either create the FN object and then E.setFN(F) or extract the FN object and assign to a variable F = E.getFN() You can see an example in $SLEPC_DIR/src/binding/slepc4py/demo/ex6.py Jose > El 3 sept 2021, a las 19:53, Paul, Sanku escribió: > > Dear Sir/Ma'am, > > I am trying to use SLEPc to calculate the matrix exponential in my python code but I am not getting the correct result. I have attached the code. Could you let me know what I am doing wrong? This is my first time using SLEPc. So, I would like to ask you if you could send me a tutorial on matrix exponential using SLEPc in python code. > > Best, > Sanku > From jroman at dsic.upv.es Fri Sep 3 13:53:41 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 3 Sep 2021 20:53:41 +0200 Subject: [petsc-users] Matrix exponential In-Reply-To: References: <699FFF16-6190-48D1-AFAE-18638531C277@dsic.upv.es> Message-ID: <36DDF934-D614-405F-93CC-081AED407ECF@dsic.upv.es> Please always reply to the list (Reply-All), not to myself. You should be able to convert from a scipy sparse matrix to a PETSc matrix via PETSc.Mat().createAIJWithArrays(). I don't know if there is any example in the petsc4py documentation. Jose > El 3 sept 2021, a las 20:26, Paul, Sanku escribió: > > Dear Jose, > > Thank you very much for your help. I have another question: can we just simply pass a sparse.csr.matrix to A? For instance, if B is the sparse.csr.matrix, can we do A=B.copy()? Or do I have to do it in a different way? > > Best, > Sanku > From: Jose E. Roman > Sent: Friday, September 3, 2021 2:13 PM > To: Paul, Sanku > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Matrix exponential > > You should either create the FN object and then > > E.setFN(F) > > or extract the FN object and assign to a variable > > F = E.getFN() > > You can see an example in $SLEPC_DIR/src/binding/slepc4py/demo/ex6.py > > > Jose > > > > El 3 sept 2021, a las 19:53, Paul, Sanku escribió: > > > > Dear Sir/Ma'am, > > > > I am trying to use SLEPc to calculate the matrix exponential in my python code but I am not getting the correct result. I have attached the code. Could you let me know what I am doing wrong? This is my first time using SLEPc. So, I would like to ask you if you could send me a tutorial on matrix exponential using SLEPc in python code. > > > > Best, > > Sanku > > From paulsank at msu.edu Fri Sep 3 18:39:19 2021 From: paulsank at msu.edu (Paul, Sanku) Date: Fri, 3 Sep 2021 23:39:19 +0000 Subject: [petsc-users] Matrix exponential In-Reply-To: <36DDF934-D614-405F-93CC-081AED407ECF@dsic.upv.es> References: <699FFF16-6190-48D1-AFAE-18638531C277@dsic.upv.es> <36DDF934-D614-405F-93CC-081AED407ECF@dsic.upv.es> Message-ID: Hi Jose, I can now compute the matrix exponential but I am facing a problem with a complex matrix. In particular, I want to do \exp(-itH), where H is a Hamiltonian. How to implement this? Thanks, Sanku ________________________________ From: Jose E.
Roman Sent: Friday, September 3, 2021 2:53 PM To: Paul, Sanku Cc: PETSc Subject: Re: [petsc-users] Matrix exponential Please always reply to the list (Reply-All), not to myself. You should be able to convert from a scipy sparse matrix to a PETSc matrix via PETSc.Mat().createAIJWithArrays(). Don't know how if there is any example in the petsc4py documentation. Jose > El 3 sept 2021, a las 20:26, Paul, Sanku escribi?: > > Dear Jose, > > Thank you very much for your help. I have another question can we just simply pass a sparse.csr.matrix to A. For instance, if B is the sparse.csr.matrix can we do A=B.copy(). Or do I have to do it in a different way? > > Best, > Sanku > From: Jose E. Roman > Sent: Friday, September 3, 2021 2:13 PM > To: Paul, Sanku > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Matrix exponential > > You should either create the FN object and then > > E.setFN(F) > > or extract the FN object and assign to a variable > > F = E.getFN() > > You can see an example in $SLEPC_DIR/src/binding/slepc4py/demo/ex6.py > > > Jose > > > > El 3 sept 2021, a las 19:53, Paul, Sanku escribi?: > > > > Dear Sir/Ma'am, > > > > I am trying to use SLEPc to calculate matrix exponential in my python code but I am not getting the correct result. I have attached the code. Could you let me know what I am doing wrong. This is my first time using SLEPc. So, I would like to ask you if you could send me a tutorial on matrix exponential using SLEPc in python code. > > > > Best, > > Sanku > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From paulsank at msu.edu Fri Sep 3 19:11:00 2021 From: paulsank at msu.edu (Paul, Sanku) Date: Sat, 4 Sep 2021 00:11:00 +0000 Subject: [petsc-users] Matrix exponential In-Reply-To: References: <699FFF16-6190-48D1-AFAE-18638531C277@dsic.upv.es> <36DDF934-D614-405F-93CC-081AED407ECF@dsic.upv.es> Message-ID: Hi Jose, I tried to install ml -* foss/2019b Python SciPy-bundle/2019.10-Python-3.7.4 virtualenv slepc4py cd slepc4py source bin/activate pip install numpy mpi4py cython export PETSC_CONFIGURE_OPTIONS="--with-scalar-type=complex" export PETSC_DIR=/path/to/petsc PETSC_ARCH=your-arch-name pip install petsc petsc4py pip install slepc slepc4py But still facing problem with complex matrices. Please help me to fix this. Thanks, Sanku ________________________________ From: Paul, Sanku Sent: Friday, September 3, 2021 7:39 PM To: Jose E. Roman Cc: PETSc Subject: Re: [petsc-users] Matrix exponential Hi Jose, I could now do matrix exponential but facing a problem with a complex matrix. In particular, I want to do \exp(-itH), where H is a Hamiltonian. How to implement this? Thanks, Sanku ________________________________ From: Jose E. Roman Sent: Friday, September 3, 2021 2:53 PM To: Paul, Sanku Cc: PETSc Subject: Re: [petsc-users] Matrix exponential Please always reply to the list (Reply-All), not to myself. You should be able to convert from a scipy sparse matrix to a PETSc matrix via PETSc.Mat().createAIJWithArrays(). Don't know how if there is any example in the petsc4py documentation. Jose > El 3 sept 2021, a las 20:26, Paul, Sanku escribi?: > > Dear Jose, > > Thank you very much for your help. I have another question can we just simply pass a sparse.csr.matrix to A. For instance, if B is the sparse.csr.matrix can we do A=B.copy(). Or do I have to do it in a different way? > > Best, > Sanku > From: Jose E. 
Roman > Sent: Friday, September 3, 2021 2:13 PM > To: Paul, Sanku > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Matrix exponential > > You should either create the FN object and then > > E.setFN(F) > > or extract the FN object and assign to a variable > > F = E.getFN() > > You can see an example in $SLEPC_DIR/src/binding/slepc4py/demo/ex6.py > > > Jose > > > > El 3 sept 2021, a las 19:53, Paul, Sanku escribi?: > > > > Dear Sir/Ma'am, > > > > I am trying to use SLEPc to calculate matrix exponential in my python code but I am not getting the correct result. I have attached the code. Could you let me know what I am doing wrong. This is my first time using SLEPc. So, I would like to ask you if you could send me a tutorial on matrix exponential using SLEPc in python code. > > > > Best, > > Sanku > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Sat Sep 4 04:33:43 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Sat, 4 Sep 2021 11:33:43 +0200 Subject: [petsc-users] Matrix exponential In-Reply-To: References: <699FFF16-6190-48D1-AFAE-18638531C277@dsic.upv.es> <36DDF934-D614-405F-93CC-081AED407ECF@dsic.upv.es> Message-ID: <6D34FC62-FB0F-412F-8339-BC100400B93F@dsic.upv.es> No problem, do F.setScale(-1j*t) What do you mean "still facing problem with complex matrices"? Did you get an error during installation? Did it install for real scalars? Try setting PETSC_CONFIGURE_OPTIONS only, and unset PETSC_DIR PETSC_ARCH. Otherwise pip may not take into account --with-scalar-type=complex Jose > El 4 sept 2021, a las 2:11, Paul, Sanku escribi?: > > Hi Jose, > > I tried to install > ml -* foss/2019b Python SciPy-bundle/2019.10-Python-3.7.4 > virtualenv slepc4py > cd slepc4py > source bin/activate > pip install numpy mpi4py cython > export PETSC_CONFIGURE_OPTIONS="--with-scalar-type=complex" > export PETSC_DIR=/path/to/petsc PETSC_ARCH=your-arch-name > pip install petsc petsc4py > pip install slepc slepc4py > > But still facing problem with complex matrices. Please help me to fix this. > > Thanks, > Sanku > > > From: Paul, Sanku > Sent: Friday, September 3, 2021 7:39 PM > To: Jose E. Roman > Cc: PETSc > Subject: Re: [petsc-users] Matrix exponential > > Hi Jose, > > I could now do matrix exponential but facing a problem with a complex matrix. In particular, I want to do \exp(-itH), where H is a Hamiltonian. How to implement this? > > Thanks, > Sanku > From: Jose E. Roman > Sent: Friday, September 3, 2021 2:53 PM > To: Paul, Sanku > Cc: PETSc > Subject: Re: [petsc-users] Matrix exponential > > Please always reply to the list (Reply-All), not to myself. > > You should be able to convert from a scipy sparse matrix to a PETSc matrix via PETSc.Mat().createAIJWithArrays(). Don't know how if there is any example in the petsc4py documentation. > > Jose > > > > El 3 sept 2021, a las 20:26, Paul, Sanku escribi?: > > > > Dear Jose, > > > > Thank you very much for your help. I have another question can we just simply pass a sparse.csr.matrix to A. For instance, if B is the sparse.csr.matrix can we do A=B.copy(). Or do I have to do it in a different way? > > > > Best, > > Sanku > > From: Jose E. 
Roman > > Sent: Friday, September 3, 2021 2:13 PM > > To: Paul, Sanku > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix exponential > > > > You should either create the FN object and then > > > > E.setFN(F) > > > > or extract the FN object and assign to a variable > > > > F = E.getFN() > > > > You can see an example in $SLEPC_DIR/src/binding/slepc4py/demo/ex6.py > > > > > > Jose > > > > > > > El 3 sept 2021, a las 19:53, Paul, Sanku escribi?: > > > > > > Dear Sir/Ma'am, > > > > > > I am trying to use SLEPc to calculate matrix exponential in my python code but I am not getting the correct result. I have attached the code. Could you let me know what I am doing wrong. This is my first time using SLEPc. So, I would like to ask you if you could send me a tutorial on matrix exponential using SLEPc in python code. > > > > > > Best, > > > Sanku > > > From paulsank at msu.edu Sun Sep 5 19:30:15 2021 From: paulsank at msu.edu (Paul, Sanku) Date: Mon, 6 Sep 2021 00:30:15 +0000 Subject: [petsc-users] Matrix exponential In-Reply-To: <6D34FC62-FB0F-412F-8339-BC100400B93F@dsic.upv.es> References: <699FFF16-6190-48D1-AFAE-18638531C277@dsic.upv.es> <36DDF934-D614-405F-93CC-081AED407ECF@dsic.upv.es> <6D34FC62-FB0F-412F-8339-BC100400B93F@dsic.upv.es> Message-ID: Hi Jose, While I am running from petsc4py import PETSc >>> print(PETSc.ScalarType) only float64 I am getting not complex128. Do I have to uninstall and then install it? Otherwise doing F.setScale(-1j*t) I am getting an error File "SLEPc/FN.pyx", line 204, in slepc4py.SLEPc.FN.setScale File "SLEPc/SLEPc.pyx", line 115, in slepc4py.SLEPc.asScalar TypeError: can't convert complex to float Sanku ________________________________ From: Jose E. Roman Sent: Saturday, September 4, 2021 5:33 AM To: Paul, Sanku Cc: PETSc Subject: Re: [petsc-users] Matrix exponential No problem, do F.setScale(-1j*t) What do you mean "still facing problem with complex matrices"? Did you get an error during installation? Did it install for real scalars? Try setting PETSC_CONFIGURE_OPTIONS only, and unset PETSC_DIR PETSC_ARCH. Otherwise pip may not take into account --with-scalar-type=complex Jose > El 4 sept 2021, a las 2:11, Paul, Sanku escribi?: > > Hi Jose, > > I tried to install > ml -* foss/2019b Python SciPy-bundle/2019.10-Python-3.7.4 > virtualenv slepc4py > cd slepc4py > source bin/activate > pip install numpy mpi4py cython > export PETSC_CONFIGURE_OPTIONS="--with-scalar-type=complex" > export PETSC_DIR=/path/to/petsc PETSC_ARCH=your-arch-name > pip install petsc petsc4py > pip install slepc slepc4py > > But still facing problem with complex matrices. Please help me to fix this. > > Thanks, > Sanku > > > From: Paul, Sanku > Sent: Friday, September 3, 2021 7:39 PM > To: Jose E. Roman > Cc: PETSc > Subject: Re: [petsc-users] Matrix exponential > > Hi Jose, > > I could now do matrix exponential but facing a problem with a complex matrix. In particular, I want to do \exp(-itH), where H is a Hamiltonian. How to implement this? > > Thanks, > Sanku > From: Jose E. Roman > Sent: Friday, September 3, 2021 2:53 PM > To: Paul, Sanku > Cc: PETSc > Subject: Re: [petsc-users] Matrix exponential > > Please always reply to the list (Reply-All), not to myself. > > You should be able to convert from a scipy sparse matrix to a PETSc matrix via PETSc.Mat().createAIJWithArrays(). Don't know how if there is any example in the petsc4py documentation. 
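A rough sketch (not the slepc4py demo) pulling the two suggestions together: a scipy CSR matrix converted to a PETSc Mat, then exp(-i*t*H)*b evaluated with SLEPc's MFN solver. The operator, size and time step below are made up, and a complex-scalar build of petsc4py/slepc4py is assumed.

import numpy as np
import scipy.sparse as sp
from petsc4py import PETSc
from slepc4py import SLEPc

n, t = 100, 0.1                                  # made-up size and time step
H = sp.random(n, n, density=0.05, format='csr')
H = ((H + H.T) * 0.5).tocsr()                    # stand-in Hermitian operator

# scipy CSR -> PETSc AIJ from the (indptr, indices, data) arrays;
# createAIJWithArrays (mentioned above) works from the same triple
A = PETSc.Mat().createAIJ(size=H.shape,
                          csr=(H.indptr, H.indices,
                               H.data.astype(PETSc.ScalarType)))
A.assemble()

# y = f(A) b with f(x) = exp(-i*t*x), via the matrix-function solver MFN
mfn = SLEPc.MFN().create()
mfn.setOperator(A)
f = mfn.getFN()
f.setType(SLEPc.FN.Type.EXP)
f.setScale(-1j * t)                              # only meaningful with complex scalars
mfn.setFromOptions()

b, y = A.createVecs()
b.set(1.0)
mfn.solve(b, y)                                  # y = exp(-i*t*A) b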
> > Jose > > > > El 3 sept 2021, a las 20:26, Paul, Sanku escribi?: > > > > Dear Jose, > > > > Thank you very much for your help. I have another question can we just simply pass a sparse.csr.matrix to A. For instance, if B is the sparse.csr.matrix can we do A=B.copy(). Or do I have to do it in a different way? > > > > Best, > > Sanku > > From: Jose E. Roman > > Sent: Friday, September 3, 2021 2:13 PM > > To: Paul, Sanku > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix exponential > > > > You should either create the FN object and then > > > > E.setFN(F) > > > > or extract the FN object and assign to a variable > > > > F = E.getFN() > > > > You can see an example in $SLEPC_DIR/src/binding/slepc4py/demo/ex6.py > > > > > > Jose > > > > > > > El 3 sept 2021, a las 19:53, Paul, Sanku escribi?: > > > > > > Dear Sir/Ma'am, > > > > > > I am trying to use SLEPc to calculate matrix exponential in my python code but I am not getting the correct result. I have attached the code. Could you let me know what I am doing wrong. This is my first time using SLEPc. So, I would like to ask you if you could send me a tutorial on matrix exponential using SLEPc in python code. > > > > > > Best, > > > Sanku > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From paulsank at msu.edu Sun Sep 5 20:30:49 2021 From: paulsank at msu.edu (Paul, Sanku) Date: Mon, 6 Sep 2021 01:30:49 +0000 Subject: [petsc-users] Matrix exponential In-Reply-To: References: <699FFF16-6190-48D1-AFAE-18638531C277@dsic.upv.es> <36DDF934-D614-405F-93CC-081AED407ECF@dsic.upv.es> <6D34FC62-FB0F-412F-8339-BC100400B93F@dsic.upv.es> Message-ID: Hi Jose, Thanks for your help. I successfully reinstalled the packages and now it is running. Thank you very much for your help. Best, Sanku ________________________________ From: Paul, Sanku Sent: Sunday, September 5, 2021 8:30 PM To: Jose E. Roman Cc: PETSc Subject: Re: [petsc-users] Matrix exponential Hi Jose, While I am running from petsc4py import PETSc >>> print(PETSc.ScalarType) only float64 I am getting not complex128. Do I have to uninstall and then install it? Otherwise doing F.setScale(-1j*t) I am getting an error File "SLEPc/FN.pyx", line 204, in slepc4py.SLEPc.FN.setScale File "SLEPc/SLEPc.pyx", line 115, in slepc4py.SLEPc.asScalar TypeError: can't convert complex to float Sanku ________________________________ From: Jose E. Roman Sent: Saturday, September 4, 2021 5:33 AM To: Paul, Sanku Cc: PETSc Subject: Re: [petsc-users] Matrix exponential No problem, do F.setScale(-1j*t) What do you mean "still facing problem with complex matrices"? Did you get an error during installation? Did it install for real scalars? Try setting PETSC_CONFIGURE_OPTIONS only, and unset PETSC_DIR PETSC_ARCH. Otherwise pip may not take into account --with-scalar-type=complex Jose > El 4 sept 2021, a las 2:11, Paul, Sanku escribi?: > > Hi Jose, > > I tried to install > ml -* foss/2019b Python SciPy-bundle/2019.10-Python-3.7.4 > virtualenv slepc4py > cd slepc4py > source bin/activate > pip install numpy mpi4py cython > export PETSC_CONFIGURE_OPTIONS="--with-scalar-type=complex" > export PETSC_DIR=/path/to/petsc PETSC_ARCH=your-arch-name > pip install petsc petsc4py > pip install slepc slepc4py > > But still facing problem with complex matrices. Please help me to fix this. > > Thanks, > Sanku > > > From: Paul, Sanku > Sent: Friday, September 3, 2021 7:39 PM > To: Jose E. 
Roman > Cc: PETSc > Subject: Re: [petsc-users] Matrix exponential > > Hi Jose, > > I could now do matrix exponential but facing a problem with a complex matrix. In particular, I want to do \exp(-itH), where H is a Hamiltonian. How to implement this? > > Thanks, > Sanku > From: Jose E. Roman > Sent: Friday, September 3, 2021 2:53 PM > To: Paul, Sanku > Cc: PETSc > Subject: Re: [petsc-users] Matrix exponential > > Please always reply to the list (Reply-All), not to myself. > > You should be able to convert from a scipy sparse matrix to a PETSc matrix via PETSc.Mat().createAIJWithArrays(). Don't know how if there is any example in the petsc4py documentation. > > Jose > > > > El 3 sept 2021, a las 20:26, Paul, Sanku escribi?: > > > > Dear Jose, > > > > Thank you very much for your help. I have another question can we just simply pass a sparse.csr.matrix to A. For instance, if B is the sparse.csr.matrix can we do A=B.copy(). Or do I have to do it in a different way? > > > > Best, > > Sanku > > From: Jose E. Roman > > Sent: Friday, September 3, 2021 2:13 PM > > To: Paul, Sanku > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Matrix exponential > > > > You should either create the FN object and then > > > > E.setFN(F) > > > > or extract the FN object and assign to a variable > > > > F = E.getFN() > > > > You can see an example in $SLEPC_DIR/src/binding/slepc4py/demo/ex6.py > > > > > > Jose > > > > > > > El 3 sept 2021, a las 19:53, Paul, Sanku escribi?: > > > > > > Dear Sir/Ma'am, > > > > > > I am trying to use SLEPc to calculate matrix exponential in my python code but I am not getting the correct result. I have attached the code. Could you let me know what I am doing wrong. This is my first time using SLEPc. So, I would like to ask you if you could send me a tutorial on matrix exponential using SLEPc in python code. > > > > > > Best, > > > Sanku > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo.semplice at uninsubria.it Mon Sep 6 11:22:29 2021 From: matteo.semplice at uninsubria.it (Matteo Semplice) Date: Mon, 6 Sep 2021 18:22:29 +0200 Subject: [petsc-users] Mat preallocation for SNES jacobian [WAS Re: Mat preallocation in case of variable stencils] In-Reply-To: <87k0k1zl2y.fsf@jedbrown.org> References: <87k0k1zl2y.fsf@jedbrown.org> Message-ID: <9f505269-c8d6-0847-cf05-019b36ae1aee@uninsubria.it> Il 31/08/21 17:32, Jed Brown ha scritto: > Matteo Semplice writes: > >> Hi. >> >> We are writing a code for a FD scheme on an irregular domain and thus >> the local stencil is quite variable: we have inner nodes, boundary nodes >> and inactive nodes, each with their own stencil type and offset with >> respect to the grid node. We currently create a matrix with >> DMCreateMatrix on a DMDA and for now have set the option >> MAT_NEW_NONZERO_LOCATIONS to PETSC_TRUE, but its time to render the code >> memory-efficient. The layout created automatically is correct for inner >> nodes, wrong for boundary ones (off-centered stencils) and redundant for >> outer nodes. >> >> After the preprocessing stage (including stencil creation) we'd be in >> position to set the nonzero pattern properly. >> >> Do we need to start from a Mat created by CreateMatrix? Or is it ok to >> call DMCreateMatrix (so that the splitting among CPUs and the block size >> are set by PETSc) and then call a MatSetPreallocation routine? > You can call MatXAIJSetPreallocation after. 
It'll handle all matrix types so you don't have to shepherd data for all the specific preallocations.

Hi.

Actually I am still struggling with this... Let me explain.

My code relies on a SNES and the matrix I need to preallocate is the Jacobian. So I do, in the main file:

ierr = DMCreateMatrix(ctx.daAll,&ctx.J);CHKERRQ(ierr);
ierr = setJacobianPattern(ctx,ctx.J);CHKERRQ(ierr); // calling MatXAIJSetPreallocation on the second argument
ierr = MatSetOption(ctx.J,MAT_NEW_NONZERO_LOCATIONS,*******); CHKERRQ(ierr); // allow new nonzeros?
ierr = SNESSetFunction(snes,ctx.F,FormFunction,(void *) &ctx); CHKERRQ(ierr);
ierr = SNESSetJacobian(snes,ctx.J,ctx.J,FormJacobian,(void *) &ctx); CHKERRQ(ierr);
ierr = FormSulfationRHS(ctx, ctx.U0, ctx.RHS);CHKERRQ(ierr);
ierr = SNESSolve(snes,ctx.RHS,ctx.U); CHKERRQ(ierr);

and PetscErrorCode FormJacobian(SNES snes,Vec U,Mat J, Mat P,void *_ctx) does (this is a 2-dof finite difference problem, so logically 2x2 blocks in J):

ierr = setJac00(....,P) // calls to MatSetValues in the 00 block
ierr = setJac01(....,P) // calls to MatSetValues in the 01 block
ierr = setJac1X(....,P) // calls to MatSetValues in the 10 and 11 blocks
ierr = MatAssemblyBegin(P,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(P,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

If I run with MAT_NEW_NONZERO_LOCATIONS=TRUE, all runs fine and using the -info option I see that no mallocs are performed during assembly.

Computing F
  0 SNES Function norm 7.672682917097e+02
Computing J
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 71874 X 71874; storage space: 17661 unneeded,191714 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 71874) < 0.6. Do not use CompressedRow routines.

If I omit the call to setJacobianPattern, -info reports a nonzero number of mallocs, so the setJacobianPattern routine seems to be doing its job correctly. However, if I run with MAT_NEW_NONZERO_LOCATIONS=FALSE, the Jacobian is entirely zero and no error messages appear until the KSP tries to do its job:

Computing F
  0 SNES Function norm 7.672682917097e+02
Computing J
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 71874 X 71874; storage space: 209375 unneeded,0 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 71874)/(num_localrows 71874) > 0.6. Use CompressedRow routines.

... and then KSP complains! I have tried adding MAT_FLUSH_ASSEMBLY calls inside the subroutines, but nothing changes. So I have 2 questions:

1. If, as a temporary solution, I leave MAT_NEW_NONZERO_LOCATIONS=TRUE, am I going to incur performance penalties even if no new nonzeros are created by my assembly routine?
2. Can you guess what is causing the problem?

Thanks
Matteo From knepley at gmail.com Mon Sep 6 18:34:57 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 6 Sep 2021 19:34:57 -0400 Subject: [petsc-users] Mat preallocation for SNES jacobian [WAS Re: Mat preallocation in case of variable stencils] In-Reply-To: <9f505269-c8d6-0847-cf05-019b36ae1aee@uninsubria.it> References: <87k0k1zl2y.fsf@jedbrown.org> <9f505269-c8d6-0847-cf05-019b36ae1aee@uninsubria.it> Message-ID: On Mon, Sep 6, 2021 at 12:22 PM Matteo Semplice < matteo.semplice at uninsubria.it> wrote: > > Il 31/08/21 17:32, Jed Brown ha scritto: > > Matteo Semplice writes: > > > >> Hi. > >> > >> We are writing a code for a FD scheme on an irregular domain and thus > >> the local stencil is quite variable: we have inner nodes, boundary nodes > >> and inactive nodes, each with their own stencil type and offset with > >> respect to the grid node. We currently create a matrix with > >> DMCreateMatrix on a DMDA and for now have set the option > >> MAT_NEW_NONZERO_LOCATIONS to PETSC_TRUE, but its time to render the code > >> memory-efficient. The layout created automatically is correct for inner > >> nodes, wrong for boundary ones (off-centered stencils) and redundant for > >> outer nodes. > >> > >> After the preprocessing stage (including stencil creation) we'd be in > >> position to set the nonzero pattern properly. > >> > >> Do we need to start from a Mat created by CreateMatrix? Or is it ok to > >> call DMCreateMatrix (so that the splitting among CPUs and the block size > >> are set by PETSc) and then call a MatSetPreallocation routine? > > You can call MatXAIJSetPreallocation after. It'll handle all matrix > types so you don't have to shepherd data for all the specific > preallocations. > > Hi. > > Actually I am still struggling with this... Let me explain. > > My code relies on a SNES and the matrix I need to preallocate is the > Jacobian. So I do: > > in the main file > ierr = DMCreateMatrix(ctx.daAll,&ctx.J);CHKERRQ(ierr); > ierr = setJacobianPattern(ctx,ctx.J);CHKERRQ(ierr); //calling > MatXAIJSetPreallocation on the second argument > I do not understand this. DMCreateMatrix() has already preallocated _and_ filled the matrix with zeros. Additional preallocation statements will not do anything (I think). > ierr = MatSetOption(ctx.J,MAT_NEW_NONZERO_LOCATIONS,*******); > CHKERRQ(ierr);//allow new nonzeros? > ierr = SNESSetFunction(snes,ctx.F ,FormFunction,(void *) &ctx); > CHKERRQ(ierr); > ierr = SNESSetJacobian(snes,ctx.J,ctx.J,FormJacobian,(void *) &ctx); > CHKERRQ(ierr); > > ierr = FormSulfationRHS(ctx, ctx.U0, ctx.RHS);CHKERRQ(ierr); > ierr = SNESSolve(snes,ctx.RHS,ctx.U); CHKERRQ(ierr); > > and > > PetscErrorCode FormJacobian(SNES snes,Vec U,Mat J, Mat P,void *_ctx) > > does (this is a 2 dof finite difference problem, so logically 2x2 blocks > in J) > > ierr = setJac00(....,P) //calls to MatSetValues in the 00 block > > ierr = setJac01(....,P) //calls to MatSetValues in the 01 block > > ierr = setJac1X(....,P) //calls to MatSetValues in the 10 ans 11 block > > ierr = MatAssemblyBegin(P,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > ierr = MatAssemblyEnd(P,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > > If I run with MAT_NEW_NONZERO_LOCATIONS=TRUE, all runs fine and using > the -info option I see that no mallocs are performed during Assembly. 
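A rough petsc4py sketch of the sequence under discussion (the thread's code is C, and the tridiagonal stencil and sizes below are invented): preallocate, insert explicit zeros so every intended location exists, assemble, and only then disallow new nonzero locations. The Mat.Option names are the petsc4py spellings of MAT_NEW_NONZERO_ALLOCATION_ERR and MAT_NEW_NONZERO_LOCATIONS and are worth double-checking against your petsc4py version.

from petsc4py import PETSc

n = 8                                            # invented size
A = PETSc.Mat().createAIJ([n, n])
A.setPreallocationNNZ(3)                         # per-row estimate for the intended stencil
A.setOption(PETSc.Mat.Option.NEW_NONZERO_ALLOCATION_ERR, True)

# insert explicit zeros at every location of the stencil, so the slots exist ...
for i in range(n):
    cols = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
    A.setValues([i], cols, [0.0] * len(cols))
A.assemble()

# ... then later insertions (e.g. inside the Jacobian callback) only hit existing slots
A.setOption(PETSc.Mat.Option.NEW_NONZERO_LOCATIONS, False)
A.setValue(0, 1, 2.5)
A.assemble()

If anything outside the pre-set pattern is ever touched, the ALLOCATION_ERR option turns a silent malloc into an error, which is usually the easier failure to debug.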
> > Computing F > 0 SNES Function norm 7.672682917097e+02 > Computing J > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 71874 X 71874; storage space: > 17661 unneeded,191714 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 71874) < 0.6. Do not use CompressedRow routines. > > If I omit the call to setJacobianPattern, info reports a nonsero number > of mallocs, so somehow the setJacobianPattern routine should be doing > its job correctly. > Hmm, this might mean that the second preallocation call is wiping out the info in the first. Okay, I will go back and look at the code. > However, if I run with MAT_NEW_NONZERO_LOCATIONS=FALSE, the Jacobian is > entirely zero and no error messages appear until the KSP tries to do its > job: > This sounds like your setJacobianPattern() is not filling the matrix with zeros, so that the insertions make new nonzeros. It is hard to make sense of this string of events without the code. > Computing F > 0 SNES Function norm 7.672682917097e+02 > Computing J > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 71874 X 71874; storage space: > 209375 unneeded,0 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 71874)/(num_localrows 71874) > 0.6. Use CompressedRow routines. > ... and then KSP complains! > > I have tried adding MAT_FLUSH_ASSEMBLY calls inside the subroutines, but > nothing changes. > > So I have 2 questions: > > 1. If, as a temporary solution, I leave MAT_NEW_NONZERO_LOCATIONS=TRUE, > am I going to incur in performance penalties even if no new nonzeros are > created by my assembly routine? > If you are worried about performance, I think the option you want is MAT_NEW_NONZERO_ALLOCATION_ERR since allocations, not new nonzeros, are what causes performance problems. Thanks, Matt > 2. Can you guess what is causing the problem? > > Thanks > > Matteo > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Sep 6 20:33:59 2021 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 6 Sep 2021 21:33:59 -0400 Subject: [petsc-users] Slow convergence while parallel computations. In-Reply-To: References: Message-ID: <159AE122-1B25-49DC-93C5-06F5CF198DAD@petsc.dev> You can use -mat_null_space_test to check it the null space you provide is within the null space of the operator. There is no practical way to test if the null space you provide is exactly the full null space of the operator but at least the check ensures that you are not providing something that is not in the null space. Barry Also if you run with GMRES and look at the norms of the residuals -ksp_monitor at each restart iteration (the default restart is 30), if they jump wildly at each restart this can indicate a problem with the nullspace. > On Sep 3, 2021, at 10:48 AM, Viktor Nazdrachev wrote: > > Hello Mark and Matthew! > > I attached log files for serial and parallel cases and corresponding information about GAMG preconditioner (using grep). 
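For comparison with the Fortran shown further down in this message, a rough petsc4py sketch of the near-null-space setup being discussed; the mesh fragment is invented, and it assumes the petsc4py bindings createRigidBody and setNearNullSpace behave like their C counterparts MatNullSpaceCreateRigidBody and MatSetNearNullSpace. The detail worth noting is the block size: the coordinate vector must carry it, and giving the matrix the same 3-dof-per-node block size is generally recommended for GAMG on elasticity.

import numpy as np
from petsc4py import PETSc

dim, nnds = 3, 4                                 # invented mesh fragment
xyz = np.random.rand(nnds, dim)

coords = PETSc.Vec().createWithArray(xyz.ravel())
coords.setBlockSize(dim)                         # (x, y, z) triples, one per node

rbm = PETSc.NullSpace().createRigidBody(coords)  # 6 rigid-body modes in 3D

A = PETSc.Mat().createAIJ([nnds * dim, nnds * dim], bsize=dim)  # 3 dofs per node
A.setUp()
# ... assemble the stiffness matrix here ...
A.assemble()
A.setNearNullSpace(rbm)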
> > I have to notice, that assembling of global stiffness matrix in code was performed by MatSetValues subrotuine (not MatSetValuesBlocked) > > !nnds ? number of nodes > !dmn=3 > call MatCreate(Petsc_Comm_World,Mat_K,ierr) > call MatSetFromOptions(Mat_K,ierr) > call MatSetSizes(Mat_K,Petsc_Decide,Petsc_Decide,n,n,ierr_m) > ? > call MatMPIAIJSetPreallocation(Mat_K,0,dbw,0,obw,ierr) > ? > call MatSetOption(Mat_K,Mat_New_Nonzero_Allocation_Err,Petsc_False,ierr) > ? > > do i=1,nels > call FormLocalK(i,k,indx,"Kp") ! find local stiffness matrix > indx=indxmap(indx,2) !find global indices for DOFs > call MatSetValues(Mat_K,ef_eldof,indx,ef_eldof,indx,k,Add_Values,ierr) > end do > > But nullspace vector was created using VecSetBlockSize subroutine. > > call VecCreate(Petsc_Comm_World,Vec_NullSpace,ierr) > call VecSetBlockSize(Vec_NullSpace,dmn,ierr) > call VecSetSizes(Vec_NullSpace,nnds*dmn,Petsc_Decide,ierr) > call VecSetUp(Vec_NullSpace,ierr) > call VecGetArrayF90(Vec_NullSpace,null_space,ierr) > ? > call VecRestoreArrayF90(Vec_NullSpace,null_space,ierr) > call MatNullSpaceCreateRigidBody(Vec_NullSpace,matnull,ierr) > call MatSetNearNullSpace(Mat_K,matnull,ierr) > > I suppose it can be one of the reasons of GAMG slow convergence. > So I attached log files for parallel run with ?pure? GAMG precondtioner. > > > Kind regards, > > Viktor Nazdrachev > > R&D senior researcher > > Geosteering Technologies LLC > > ??, 3 ????. 2021 ?. ? 15:11, Matthew Knepley >: > On Fri, Sep 3, 2021 at 8:02 AM Mark Adams > wrote: > > > On Fri, Sep 3, 2021 at 1:57 AM Viktor Nazdrachev > wrote: > Hello, Lawrence! > Thank you for your response! > I attached log files (txt files with convergence behavior and RAM usage log in separate txt files) and resulting table with convergence investigation data(xls). Data for main non-regular grid with 500K cells and heterogeneous properties are in 500K folder, whereas data for simple uniform 125K cells grid with constant properties are in 125K folder. > > >> On 1 Sep 2021, at 09:42, ????????? ?????? > wrote: > >> > >> I have a 3D elasticity problem with heterogeneous properties. > > > >What does your coefficient variation look like? How large is the contrast? > > Young modulus varies from 1 to 10 GPa, Poisson ratio varies from 0.3 to 0.44 and density ? from 1700 to 2600 kg/m^3. > > That is not too bad. Poorly shaped elements are the next thing to worry about. Try to keep the aspect ratio below 10 if possible. > > > > >> There is unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are imposed on side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M of DOFs). > >> > >> The best performance and memory usage for single MPI process was obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of number of iterations required to achieve the same tolerance is significantly increased. > > > >How many iterations do you have in serial (and then in parallel)? > > Serial run is required 112 iterations to reach convergence (log_hpddm(bfbcg)_bjacobian_icc_1_mpi.txt), parallel run with 4 MPI ? 680 iterations. > > I attached log files for all simulations (txt files with convergence behavior and RAM usage log in separate txt files) and resulting table with convergence/memory usage data(xls). 
Data for main non-regular grid with 500K cells and heterogeneous properties are in 500K folder, whereas data for simple uniform 125K cells grid with constant properties are in 125K folder. > > > >> I`ve also tried PCGAMG (agg) preconditioner with IC? (1) sub-precondtioner. For single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is peak memory usage with 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB. > > > >Does the number of iterates increase in parallel? Again, how many iterations do you have? > > For case with 4 MPI processes and attached nullspace it is required 177 iterations to reach convergence (you may see detailed log in log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, 90 iterations are required for sequential run(log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt). > > Again, do not use ICC. I am surprised to see such a large jump in iteration count, but get ICC off the table. > > You will see variability in the iteration count with processor count with GAMG. As much as 10% +-. Maybe more (random) variability , but usually less. > > You can decrease the memory a little, and the setup time a lot, by aggressively coarsening, at the expense of higher iteration counts. It's a balancing act. > > You can run with the defaults, add '-info', grep on GAMG and send the ~30 lines of output if you want advice on parameters. > > Can you send the output of > > -ksp_view -ksp_monitor_true_residual -ksp_converged_reason > > Thanks, > > Matt > > Thanks, > Mark > > > > > >> Are there ways to avoid decreasing of the convergence rate for bjacobi precondtioner in parallel mode? Does it make sense to use hierarchical or nested krylov methods with a local gmres solver (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type bjacobi)? > > > >bjacobi is only a one-level method, so you would not expect process-independent convergence rate for this kind of problem. If the coefficient variation is not too extreme, then I would expect GAMG (or some other smoothed aggregation package, perhaps -pc_type ml (you need --download-ml)) would work well with some tuning. > > Thanks for idea, but, unfortunately, ML cannot be compiled with 64bit integers (It is extremely necessary to perform computation on mesh with more than 10M cells). > > > >If you have extremely high contrast coefficients you might need something with stronger coarse grids. If you can assemble so-called Neumann matrices (https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS ) then you could try the geneo scheme offered by PCHPDDM. > > > I found strange convergence behavior for HPDDM preconditioner. For 1 MPI process BFBCG solver did not converged (log_hpddm(bfbcg)_pchpddm_1_mpi.txt), while for 4 MPI processes computation was successful (1018 to reach convergence, log_hpddm(bfbcg)_pchpddm_4_mpi.txt). > But it should be mentioned that stiffness matrix was created in AIJ format (our default matrix format in program). > Matrix conversion to MATIS format via MatConvert subroutine resulted in losing of convergence for both serial and parallel run. > > >> Is this peak memory usage expected for gamg preconditioner? is there any way to reduce it? 
> > > >I think that peak memory usage comes from building the coarse grids. Can you run with `-info` and grep for GAMG, this will provide some output that more expert GAMG users can interpret. > > Thanks, I`ll try to use a strong threshold only for coarse grids. > > Kind regards, > > Viktor Nazdrachev > > R&D senior researcher > > Geosteering Technologies LLC > > > > > > ??, 1 ????. 2021 ?. ? 12:02, Lawrence Mitchell >: > > > > On 1 Sep 2021, at 09:42, ????????? ?????? > wrote: > > > > I have a 3D elasticity problem with heterogeneous properties. > > What does your coefficient variation look like? How large is the contrast? > > > There is unstructured grid with aspect ratio varied from 4 to 25. Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann (traction) BCs are imposed on side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (which is approximately 1.6M of DOFs). > > > > The best performance and memory usage for single MPI process was obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains as preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of number of iterations required to achieve the same tolerance is significantly increased. > > How many iterations do you have in serial (and then in parallel)? > > > I`ve also tried PCGAMG (agg) preconditioner with IC? (1) sub-precondtioner. For single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM. Also, there is peak memory usage with 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB. > > Does the number of iterates increase in parallel? Again, how many iterations do you have? > > > Are there ways to avoid decreasing of the convergence rate for bjacobi precondtioner in parallel mode? Does it make sense to use hierarchical or nested krylov methods with a local gmres solver (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type bjacobi)? > > bjacobi is only a one-level method, so you would not expect process-independent convergence rate for this kind of problem. If the coefficient variation is not too extreme, then I would expect GAMG (or some other smoothed aggregation package, perhaps -pc_type ml (you need --download-ml)) would work well with some tuning. > > If you have extremely high contrast coefficients you might need something with stronger coarse grids. If you can assemble so-called Neumann matrices (https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS ) then you could try the geneo scheme offered by PCHPDDM. > > > Is this peak memory usage expected for gamg preconditioner? is there any way to reduce it? > > I think that peak memory usage comes from building the coarse grids. Can you run with `-info` and grep for GAMG, this will provide some output that more expert GAMG users can interpret. > > Lawrence > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Sep 6 20:48:25 2021 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 6 Sep 2021 21:48:25 -0400 Subject: [petsc-users] Mat preallocation for SNES jacobian [WAS Re: Mat preallocation in case of variable stencils] In-Reply-To: References: <87k0k1zl2y.fsf@jedbrown.org> <9f505269-c8d6-0847-cf05-019b36ae1aee@uninsubria.it> Message-ID: <9EE66D0B-604F-4683-A42A-D487223DEE0B@petsc.dev> Matteo, I think it might be best if you simply took our "standard" DMCreateMatrix for DMDA routine and modified exactly for your needs. You can find the source code in src/dm/impls/da/fdda.c There are a variety of routines such as DMCreateMatrix_DA_2d_MPIAIJ_Fill() DMCreateMatrix_DA_2d_MPIAIJ() etc. Pick the one that matches your needs, copy it and modify to do the exact preallocation and then filling with zeros for interior points, boundary points etc. I don't think "fixing" the incorrect default behavior after the fact will work. Barry > On Sep 6, 2021, at 7:34 PM, Matthew Knepley wrote: > > On Mon, Sep 6, 2021 at 12:22 PM Matteo Semplice > wrote: > > Il 31/08/21 17:32, Jed Brown ha scritto: > > Matteo Semplice > writes: > > > >> Hi. > >> > >> We are writing a code for a FD scheme on an irregular domain and thus > >> the local stencil is quite variable: we have inner nodes, boundary nodes > >> and inactive nodes, each with their own stencil type and offset with > >> respect to the grid node. We currently create a matrix with > >> DMCreateMatrix on a DMDA and for now have set the option > >> MAT_NEW_NONZERO_LOCATIONS to PETSC_TRUE, but its time to render the code > >> memory-efficient. The layout created automatically is correct for inner > >> nodes, wrong for boundary ones (off-centered stencils) and redundant for > >> outer nodes. > >> > >> After the preprocessing stage (including stencil creation) we'd be in > >> position to set the nonzero pattern properly. > >> > >> Do we need to start from a Mat created by CreateMatrix? Or is it ok to > >> call DMCreateMatrix (so that the splitting among CPUs and the block size > >> are set by PETSc) and then call a MatSetPreallocation routine? > > You can call MatXAIJSetPreallocation after. It'll handle all matrix types so you don't have to shepherd data for all the specific preallocations. > > Hi. > > Actually I am still struggling with this... Let me explain. > > My code relies on a SNES and the matrix I need to preallocate is the > Jacobian. So I do: > > in the main file > ierr = DMCreateMatrix(ctx.daAll,&ctx.J);CHKERRQ(ierr); > ierr = setJacobianPattern(ctx,ctx.J);CHKERRQ(ierr); //calling > MatXAIJSetPreallocation on the second argument > > I do not understand this. DMCreateMatrix() has already preallocated _and_ filled the matrix > with zeros. Additional preallocation statements will not do anything (I think). > > ierr = MatSetOption(ctx.J,MAT_NEW_NONZERO_LOCATIONS,*******); > CHKERRQ(ierr);//allow new nonzeros? 
> > ierr = SNESSetFunction(snes,ctx.F ,FormFunction,(void *) &ctx); > CHKERRQ(ierr); > ierr = SNESSetJacobian(snes,ctx.J,ctx.J,FormJacobian,(void *) &ctx); > CHKERRQ(ierr); > > ierr = FormSulfationRHS(ctx, ctx.U0, ctx.RHS);CHKERRQ(ierr); > ierr = SNESSolve(snes,ctx.RHS,ctx.U); CHKERRQ(ierr); > > and > > PetscErrorCode FormJacobian(SNES snes,Vec U,Mat J, Mat P,void *_ctx) > > does (this is a 2 dof finite difference problem, so logically 2x2 blocks > in J) > > ierr = setJac00(....,P) //calls to MatSetValues in the 00 block > > ierr = setJac01(....,P) //calls to MatSetValues in the 01 block > > ierr = setJac1X(....,P) //calls to MatSetValues in the 10 ans 11 block > > ierr = MatAssemblyBegin(P,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > ierr = MatAssemblyEnd(P,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); > > If I run with MAT_NEW_NONZERO_LOCATIONS=TRUE, all runs fine and using > the -info option I see that no mallocs are performed during Assembly. > > Computing F > 0 SNES Function norm 7.672682917097e+02 > Computing J > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 71874 X 71874; storage space: > 17661 unneeded,191714 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 71874) < 0.6. Do not use CompressedRow routines. > > If I omit the call to setJacobianPattern, info reports a nonsero number > of mallocs, so somehow the setJacobianPattern routine should be doing > its job correctly. > > Hmm, this might mean that the second preallocation call is wiping out the info in the first. Okay, > I will go back and look at the code. > > However, if I run with MAT_NEW_NONZERO_LOCATIONS=FALSE, the Jacobian is > entirely zero and no error messages appear until the KSP tries to do its > job: > > This sounds like your setJacobianPattern() is not filling the matrix with zeros, so that the insertions > make new nonzeros. It is hard to make sense of this string of events without the code. > > Computing F > 0 SNES Function norm 7.672682917097e+02 > Computing J > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 71874 X 71874; storage space: > 209375 unneeded,0 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 71874)/(num_localrows 71874) > 0.6. Use CompressedRow routines. > ... and then KSP complains! > > I have tried adding MAT_FLUSH_ASSEMBLY calls inside the subroutines, but > nothing changes. > > So I have 2 questions: > > 1. If, as a temporary solution, I leave MAT_NEW_NONZERO_LOCATIONS=TRUE, > am I going to incur in performance penalties even if no new nonzeros are > created by my assembly routine? > > If you are worried about performance, I think the option you want is > > MAT_NEW_NONZERO_ALLOCATION_ERR > > since allocations, not new nonzeros, are what causes performance problems. > > Thanks, > > Matt > > 2. Can you guess what is causing the problem? > > Thanks > > Matteo > > > > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matteo.semplice at uninsubria.it Tue Sep 7 03:01:39 2021 From: matteo.semplice at uninsubria.it (Matteo Semplice) Date: Tue, 7 Sep 2021 10:01:39 +0200 Subject: [petsc-users] Mat preallocation for SNES jacobian [WAS Re: Mat preallocation in case of variable stencils] In-Reply-To: <9EE66D0B-604F-4683-A42A-D487223DEE0B@petsc.dev> References: <87k0k1zl2y.fsf@jedbrown.org> <9f505269-c8d6-0847-cf05-019b36ae1aee@uninsubria.it> <9EE66D0B-604F-4683-A42A-D487223DEE0B@petsc.dev> Message-ID: Thanks to both of you! @Matthew: ??? Indeed my setJacobianPattern() makes the calls to MatXAIJSetPreallocation, but does not insert zeros in the matrix. ??? I had missed that actual insertions of zeros was needed before calling SNESSolve. @Barry: ??? Good idea: I'll study your DMCreateMatrix routines. Thanks ??? Matteo Il 07/09/21 03:48, Barry Smith ha scritto: > ? Matteo, > > ? ?I think it might be best if you simply took our "standard" > DMCreateMatrix for DMDA routine and modified exactly for your needs. > You can find the source code in src/dm/impls/da/fdda.c There are a > variety of routines such > as?DMCreateMatrix_DA_2d_MPIAIJ_Fill()?DMCreateMatrix_DA_2d_MPIAIJ() etc. > > ? ? Pick the one that matches your needs, copy it and modify to do the > exact preallocation and then filling with zeros for interior points, > boundary points etc. I don't think "fixing" the incorrect default > behavior after the fact will work. > > ? Barry > > >> On Sep 6, 2021, at 7:34 PM, Matthew Knepley > > wrote: >> >> On Mon, Sep 6, 2021 at 12:22 PM Matteo Semplice >> > > wrote: >> >> >> Il 31/08/21 17:32, Jed Brown ha scritto: >> > Matteo Semplice > > writes: >> > >> >> Hi. >> >> >> >> We are writing a code for a FD scheme on an irregular domain >> and thus >> >> the local stencil is quite variable: we have inner nodes, >> boundary nodes >> >> and inactive nodes, each with their own stencil type and >> offset with >> >> respect to the grid node. We currently create a matrix with >> >> DMCreateMatrix on a DMDA and for now have set the option >> >> MAT_NEW_NONZERO_LOCATIONS to PETSC_TRUE, but its time to >> render the code >> >> memory-efficient. The layout created automatically is correct >> for inner >> >> nodes, wrong for boundary ones (off-centered stencils) and >> redundant for >> >> outer nodes. >> >> >> >> After the preprocessing stage (including stencil creation) >> we'd be in >> >> position to set the nonzero pattern properly. >> >> >> >> Do we need to start from a Mat created by CreateMatrix? Or is >> it ok to >> >> call DMCreateMatrix (so that the splitting among CPUs and the >> block size >> >> are set by PETSc) and then call a MatSetPreallocation routine? >> > You can call MatXAIJSetPreallocation after. It'll handle all >> matrix types so you don't have to shepherd data for all the >> specific preallocations. >> >> Hi. >> >> Actually I am still struggling with this... Let me explain. >> >> My code relies on a SNES and the matrix I need to preallocate is the >> Jacobian. So I do: >> >> in the main file >> ?? ierr = DMCreateMatrix(ctx.daAll,&ctx.J);CHKERRQ(ierr); >> ?? ierr = setJacobianPattern(ctx,ctx.J);CHKERRQ(ierr); //calling >> MatXAIJSetPreallocation on the second argument >> >> >> I do not understand this. DMCreateMatrix() has already preallocated >> _and_ filled the matrix >> with zeros. Additional preallocation statements will not do anything >> (I think). >> >> ?? ierr = MatSetOption(ctx.J,MAT_NEW_NONZERO_LOCATIONS,*******); >> CHKERRQ(ierr);//allow new nonzeros? >> >> >> ?? 
ierr = SNESSetFunction(snes,ctx.F ,FormFunction,(void *) &ctx); >> CHKERRQ(ierr); >> ?? ierr = SNESSetJacobian(snes,ctx.J,ctx.J,FormJacobian,(void *) >> &ctx); >> CHKERRQ(ierr); >> >> ?? ierr = FormSulfationRHS(ctx, ctx.U0, ctx.RHS);CHKERRQ(ierr); >> ?? ierr = SNESSolve(snes,ctx.RHS,ctx.U); CHKERRQ(ierr); >> >> and >> >> PetscErrorCode FormJacobian(SNES snes,Vec U,Mat J, Mat P,void *_ctx) >> >> does (this is a 2 dof finite difference problem, so logically 2x2 >> blocks >> in J) >> >> ???? ierr = setJac00(....,P) //calls to MatSetValues in the 00 block >> >> ???? ierr = setJac01(....,P) //calls to MatSetValues in the 01 block >> >> ???? ierr = setJac1X(....,P) //calls to MatSetValues in the 10 >> ans 11 block >> >> ???? ierr = MatAssemblyBegin(P,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >> ?? ? ierr = MatAssemblyEnd(P,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >> >> If I run with MAT_NEW_NONZERO_LOCATIONS=TRUE, all runs fine and >> using >> the -info option I see that no mallocs are performed during Assembly. >> >> Computing F >> ?? 0 SNES Function norm 7.672682917097e+02 >> Computing J >> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 71874 X 71874; storage >> space: >> 17661 unneeded,191714 used >> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during >> MatSetValues() is 0 >> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 27 >> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >> 0)/(num_localrows 71874) < 0.6. Do not use CompressedRow routines. >> >> If I omit the call to setJacobianPattern, info reports a nonsero >> number >> of mallocs, so somehow the setJacobianPattern routine should be >> doing >> its job correctly. >> >> >> Hmm, this might mean?that the second preallocation call is wiping out >> the info in the first. Okay, >> I will go back and look at the code. >> >> However, if I run with MAT_NEW_NONZERO_LOCATIONS=FALSE, the >> Jacobian is >> entirely zero and no error messages appear until the KSP tries to >> do its >> job: >> >> >> This sounds like your setJacobianPattern() is not filling the matrix >> with zeros, so that the insertions >> make new nonzeros. It is hard to make sense of this string of events >> without the code. >> >> Computing F >> ?? 0 SNES Function norm 7.672682917097e+02 >> Computing J >> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 71874 X 71874; storage >> space: >> 209375 unneeded,0 used >> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during >> MatSetValues() is 0 >> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 0 >> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >> 71874)/(num_localrows 71874) > 0.6. Use CompressedRow routines. >> ... and then KSP complains! >> >> I have tried adding MAT_FLUSH_ASSEMBLY calls inside the >> subroutines, but >> nothing changes. >> >> So I have 2 questions: >> >> 1. If, as a temporary solution, I leave >> MAT_NEW_NONZERO_LOCATIONS=TRUE, >> am I going to incur in performance penalties even if no new >> nonzeros are >> created by my assembly routine? >> >> >> If you are worried about performance, I think the option you want is >> >> ??MAT_NEW_NONZERO_ALLOCATION_ERR >> >> since allocations, not new nonzeros, are what causes performance >> problems. >> >> ? Thanks, >> >> ? ? ?Matt >> >> 2. Can you guess what is causing the problem? >> >> Thanks >> >> ???? Matteo >> >> >> >> >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> > -- Prof. Matteo Semplice Universit? degli Studi dell?Insubria Dipartimento di Scienza e Alta Tecnologia ? DiSAT Professore Associato Via Valleggio, 11 ? 22100 Como (CO) ? Italia tel.: +39 031 2386316 -------------- next part -------------- An HTML attachment was scrubbed... URL: From gsharma4189 at gmail.com Wed Sep 8 05:27:57 2021 From: gsharma4189 at gmail.com (govind sharma) Date: Wed, 8 Sep 2021 15:57:57 +0530 Subject: [petsc-users] petsc_with_c++_python Message-ID: Hi, I need a 2d poisson solver such that I can use petsc objects between C++ and python interactively. I explain this way: -->1 Let's 2D poisson solver written in C++ with petsc -->2 With python notebook interact with solve in parallel with petscpy --> IPython parallel I have seen an example of doing this type by Aron group with C but I find it incomprehensible to understand. Can I get some clarification on this? How do I proceed? Regards, Govind -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Sep 8 08:49:44 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 8 Sep 2021 09:49:44 -0400 Subject: [petsc-users] petsc_with_c++_python In-Reply-To: References: Message-ID: <7A0C8D8E-E0DE-4640-A8AB-25633FA42AA1@petsc.dev> This should be possible, not really different than using C (instead of C++). Please send the example by Aron group and we may have suggestions. Barry > On Sep 8, 2021, at 6:27 AM, govind sharma wrote: > > Hi, > > I need a 2d poisson solver such that I can use petsc objects between C++ and python interactively. I explain this way: > > -->1 Let's 2D poisson solver written in C++ with petsc > -->2 With python notebook interact with solve in parallel with petscpy > --> IPython parallel > > I have seen an example of doing this type by Aron group with C but I find it incomprehensible to understand. > > Can I get some clarification on this? How do I proceed? > > > Regards, > Govind From facklerpw at ornl.gov Wed Sep 8 09:59:20 2021 From: facklerpw at ornl.gov (Fackler, Philip) Date: Wed, 8 Sep 2021 14:59:20 +0000 Subject: [petsc-users] Redirecting petsc output Message-ID: Is there a way to customize how petsc writes information? Instead of writing to stdout (for example: 0 TS dt 0.1 time 0.), what if we want to log that message to a file other output from Xolotl? I'm assuming there are multiple ways of getting this result. What's common practice with petsc folks? Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From gsharma4189 at gmail.com Wed Sep 8 10:12:02 2021 From: gsharma4189 at gmail.com (govind sharma) Date: Wed, 8 Sep 2021 20:42:02 +0530 Subject: [petsc-users] petsc_with_c++_python In-Reply-To: <7A0C8D8E-E0DE-4640-A8AB-25633FA42AA1@petsc.dev> References: <7A0C8D8E-E0DE-4640-A8AB-25633FA42AA1@petsc.dev> Message-ID: Thanks Smith, https://github.com/pyHPC/pyhpc-tutorial/blob/master/examples/scale/2D%20Cavity%20Flow%20using%20petsc4py.ipynb This is actually a repository. Yes, We can work. Govind On Wed, 8 Sep 2021, 7:19 pm Barry Smith, wrote: > > This should be possible, not really different than using C (instead of > C++). Please send the example by Aron group and we may have suggestions. 
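Since the original request was a 2D Poisson solver driven from Python, a much smaller starting point than the cavity-flow notebook might look like the sketch below: a 5-point Laplacian with homogeneous Dirichlet conditions folded into the stencil, a constant right-hand side, and CG with GAMG. Grid size and source term are invented; run it under mpiexec for the parallel case.

from petsc4py import PETSc

n = 32                                           # invented grid size
h2 = 1.0 / (n + 1) ** 2
A = PETSc.Mat().createAIJ([n * n, n * n])
A.setPreallocationNNZ(5)

rstart, rend = A.getOwnershipRange()
for row in range(rstart, rend):                  # each rank fills only its own rows
    i, j = divmod(row, n)
    A.setValue(row, row, 4.0 / h2)
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ii, jj = i + di, j + dj
        if 0 <= ii < n and 0 <= jj < n:
            A.setValue(row, ii * n + jj, -1.0 / h2)
A.assemble()

b = A.createVecRight()
b.set(1.0)                                       # constant source term
x = b.duplicate()

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType('cg')
ksp.getPC().setType('gamg')
ksp.setFromOptions()                             # -ksp_type, -pc_type etc. still override
ksp.solve(b, x)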
> > Barry > > > > On Sep 8, 2021, at 6:27 AM, govind sharma wrote: > > > > Hi, > > > > I need a 2d poisson solver such that I can use petsc objects between C++ > and python interactively. I explain this way: > > > > -->1 Let's 2D poisson solver written in C++ with petsc > > -->2 With python notebook interact with solve in parallel with petscpy > > --> IPython parallel > > > > I have seen an example of doing this type by Aron group with C but I > find it incomprehensible to understand. > > > > Can I get some clarification on this? How do I proceed? > > > > > > Regards, > > Govind > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Sep 8 10:16:27 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 8 Sep 2021 11:16:27 -0400 Subject: [petsc-users] Redirecting petsc output In-Reply-To: References: Message-ID: <782F63EE-1821-4607-8D80-77C543E0ACF4@petsc.dev> Philip, There a variety of techniques. Some of the command line options take an optional viewer name where the output can be redirected. For example -ts_monitor ascii:filename or -ts_view ascii:filename see https://petsc.org/release/docs/manualpages/Viewer/PetscOptionsGetViewer.html for more details It is also possible to change all stdout from PETSc to a different file by setting PETSC_STDOUT = fopen(...) Barry > On Sep 8, 2021, at 10:59 AM, Fackler, Philip via petsc-users wrote: > > Is there a way to customize how petsc writes information? Instead of writing to stdout (for example: 0 TS dt 0.1 time 0.), what if we want to log that message to a file other output from Xolotl? I'm assuming there are multiple ways of getting this result. What's common practice with petsc folks? > > Thanks, > > Philip Fackler > Research Software Engineer, Application Engineering Group > Advanced Computing Systems Research Section > Computer Science and Mathematics Division > Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From facklerpw at ornl.gov Wed Sep 8 10:24:27 2021 From: facklerpw at ornl.gov (Fackler, Philip) Date: Wed, 8 Sep 2021 15:24:27 +0000 Subject: [petsc-users] [EXTERNAL] Re: Redirecting petsc output In-Reply-To: <782F63EE-1821-4607-8D80-77C543E0ACF4@petsc.dev> References: <782F63EE-1821-4607-8D80-77C543E0ACF4@petsc.dev> Message-ID: Barry, Thanks for the quick reply! I'll try those out. Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Barry Smith Sent: Wednesday, September 8, 2021 11:16 To: Fackler, Philip Cc: xolotl-psi-development at lists.sourceforge.net ; petsc-users at mcs.anl.gov Subject: [EXTERNAL] Re: [petsc-users] Redirecting petsc output Philip, There a variety of techniques. Some of the command line options take an optional viewer name where the output can be redirected. For example -ts_monitor ascii:filename or -ts_view ascii:filename see https://petsc.org/release/docs/manualpages/Viewer/PetscOptionsGetViewer.html for more details It is also possible to change all stdout from PETSc to a different file by setting PETSC_STDOUT = fopen(...) Barry On Sep 8, 2021, at 10:59 AM, Fackler, Philip via petsc-users > wrote: Is there a way to customize how petsc writes information? Instead of writing to stdout (for example: 0 TS dt 0.1 time 0.), what if we want to log that message to a file other output from Xolotl? 
I'm assuming there are multiple ways of getting this result. What's common practice with petsc folks? Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From gsharma4189 at gmail.com Wed Sep 8 13:38:02 2021 From: gsharma4189 at gmail.com (govind sharma) Date: Thu, 9 Sep 2021 00:08:02 +0530 Subject: [petsc-users] petsc_with_c++_python In-Reply-To: References: <7A0C8D8E-E0DE-4640-A8AB-25633FA42AA1@petsc.dev> Message-ID: Hi Smith, This would be quite okay if we take poisson 2d equation as a starting point and then we can move forward. Govind On Wed, 8 Sep 2021, 8:42 pm govind sharma, wrote: > Thanks Smith, > > > https://github.com/pyHPC/pyhpc-tutorial/blob/master/examples/scale/2D%20Cavity%20Flow%20using%20petsc4py.ipynb > > > This is actually a repository. Yes, We can work. > > > Govind > > On Wed, 8 Sep 2021, 7:19 pm Barry Smith, wrote: > >> >> This should be possible, not really different than using C (instead >> of C++). Please send the example by Aron group and we may have suggestions. >> >> Barry >> >> >> > On Sep 8, 2021, at 6:27 AM, govind sharma >> wrote: >> > >> > Hi, >> > >> > I need a 2d poisson solver such that I can use petsc objects between >> C++ and python interactively. I explain this way: >> > >> > -->1 Let's 2D poisson solver written in C++ with petsc >> > -->2 With python notebook interact with solve in parallel with petscpy >> > --> IPython parallel >> > >> > I have seen an example of doing this type by Aron group with C but I >> find it incomprehensible to understand. >> > >> > Can I get some clarification on this? How do I proceed? >> > >> > >> > Regards, >> > Govind >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Sep 10 05:40:13 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 10 Sep 2021 06:40:13 -0400 Subject: [petsc-users] this is what mangers are for ... Message-ID: Hi Sherry, Here is a request that you do not see every day. Can you please verify that the back of your badge says this for me? I need a badge for access to Fugaku and my badge is old and unreadable. They apparently grabbed this text from some other LBNL badge and want to verify. Thanks, Mark "EMERGENCY STATUS ANNOUNCEMENT IF FOUND,DROP IN ANY MAILBOX,POSTMASTER,POSTAGE GUARANTEED. RETURN TO: LAWRENCE BERKELEY NATIONAL LABORATORY ONE CYCLOTRON ROAD BERKELEY, CALIFORNIA 94720 THIS CREDENTIAL IS THE PROPERTY OF THE U.S. GOVERNMENT. THE COUNTERFEIT, ALTERATION, OR MISUSE IS A VIOLATION OF SECTION 499 AND 701, TITLE 18,UNITED STATES CODE. OPERATED UNDER CONTRACT NO. DE-AC03-76SF00098 WITH THE U.S.DEPARTMENT OF ENERGY" -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Sep 10 07:23:47 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 10 Sep 2021 08:23:47 -0400 Subject: [petsc-users] this is what mangers are for ... In-Reply-To: References: Message-ID: Whoops sorry, wrong email. On Fri, Sep 10, 2021 at 6:40 AM Mark Adams wrote: > Hi Sherry, > Here is a request that you do not see every day. > Can you please verify that the back of your badge says this for me? > I need a badge for access to Fugaku and my badge is old and unreadable. > They apparently grabbed this text from some other LBNL badge and want to > verify. 
> Thanks, > Mark > > "EMERGENCY STATUS ANNOUNCEMENT > IF FOUND,DROP IN ANY MAILBOX,POSTMASTER,POSTAGE GUARANTEED. RETURN TO: > LAWRENCE BERKELEY NATIONAL LABORATORY ONE CYCLOTRON ROAD BERKELEY, > CALIFORNIA 94720 > THIS CREDENTIAL IS THE PROPERTY OF THE U.S. GOVERNMENT. THE COUNTERFEIT, > ALTERATION, OR MISUSE IS A VIOLATION OF SECTION 499 AND 701, TITLE > 18,UNITED STATES CODE. > OPERATED UNDER CONTRACT NO. DE-AC03-76SF00098 WITH THE U.S.DEPARTMENT OF > ENERGY" > -------------- next part -------------- An HTML attachment was scrubbed... URL: From badi.hamid at gmail.com Fri Sep 10 10:55:32 2021 From: badi.hamid at gmail.com (Hamid) Date: Fri, 10 Sep 2021 17:55:32 +0200 Subject: [petsc-users] Petsc with Visual Studio Message-ID: <9844C996-71D9-4B45-A30B-0013BE78F2B5@hxcore.ol> An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Sep 10 11:02:22 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 10 Sep 2021 11:02:22 -0500 (CDT) Subject: [petsc-users] Petsc with Visual Studio In-Reply-To: <9844C996-71D9-4B45-A30B-0013BE78F2B5@hxcore.ol> References: <9844C996-71D9-4B45-A30B-0013BE78F2B5@hxcore.ol> Message-ID: At some point we had some un-maintainable cmake code [that didn't work well on windows anyway and also relied on cygwin build env] - and it was removed. build of PETSc with either Intel or MS compiler is the same process [using cygwin env] Satish On Fri, 10 Sep 2021, Hamid wrote: > > Hi everybody, > > ? > > I already compiled Petsc (without MUMPS) with Intel compilers under cygwin env, it was a real pain. > > I recently tried to compile petsc with Visual using the native compiler. > > First of all, i compiled?METIS, MUMPS (with Metis only cause Scotch compilation is tricky for me) and OpenBLAS. > > When it ?comes to Petsc, the compilation process is quite hard, the configuration stage does a lot of things? Do you possible to cmake the project?? > > Is there someone who already did stuff in that way?? > > ? > > ? > > Best regards. > > ? > > Envoy? ? partir de Courrier pour Windows > > ? > > > [icon-envelope-tick-round-orange-animated-no-repeat-v1.gif] > Garanti sans virus. www.avast.com > > From bsmith at petsc.dev Fri Sep 10 11:50:03 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 10 Sep 2021 12:50:03 -0400 Subject: [petsc-users] Ainsworth formula to solve saddle point problems / preconditioner for shell matrices In-Reply-To: References: <61b8dbda-c2c4-d834-9ef9-e12c5254fb31@cea.fr> <87mu15u6kx.fsf@jedbrown.org> <5504dd4c-1846-7652-a0d2-3dc955ab20df@cea.fr> <886ADC82-ED26-448E-8B3B-5EE483AEC58F@petsc.dev> <358AC9C4-8D8E-40EE-845D-0B124D03060D@petsc.dev> <7b2d0bd6-b31b-42ff-f9fc-fb359a59549f@cea.fr> <87tuv48osv.fsf@jedbrown.org> <2B8B302F-D823-4160-B674-B3DAE78E6363@petsc.dev> <218E7696-2A50-42A3-8CF2-D58FCC17B855@petsc.dev> <15e6d4f0-8678-d43a-22d9-9818c51072e3@cea.fr> <12d9be6f-1aab-5773-f73d-a00f9106be45@cea.fr> <50af8386-6ccb-b6d4-ea2d-6d0ee68bc3f7@cea.fr> <122027EE-C4EC-4DCC-832A-ED0FBD092B30@petsc.dev> Message-ID: <6E293530-C42C-4788-876D-FE88B973514C@petsc.dev> Olivier, I believe the code is ready for your use. MatInvertVariableBlockEnvelope() will take an MPIAIJ matrix (this would be your C*C' matrix, determine the block diagonal envelope automatically and construct a new MPIAIJ matrix that is the inverse of the block diagonal envelope (by inverting each of the blocks). For small blocks, as is your case, the inverse should inexpensive. 
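Not the routine in the branch, just the structure it exploits, shown on a tiny dense stand-in with invented block sizes: when C*C^T is block diagonal with small blocks, its inverse is obtained by inverting each block independently, so inv(C*C^T) keeps the same sparsity.

import numpy as np

blocks = [np.array([[2.0]]),                     # invented constraint couplings
          np.array([[3.0, 1.0], [1.0, 2.0]])]
offsets = np.cumsum([0] + [b.shape[0] for b in blocks])

D = np.zeros((offsets[-1], offsets[-1]))         # stand-in for C*C^T
iD = np.zeros_like(D)
for b, lo, hi in zip(blocks, offsets[:-1], offsets[1:]):
    D[lo:hi, lo:hi] = b
    iD[lo:hi, lo:hi] = np.linalg.inv(b)          # cheap: blocks are 1x1, 2x2, ...

assert np.allclose(iD @ D, np.eye(D.shape[0]))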
I have a test code src/mat/tests/ex178.c that demonstrates its use and tests if for some random blocking across multiple processes. Please let me know if you have any difficulties. The branch is the same name as before barry/2020-10-08/invert-block-diagonal-aij but it has been rebased so you will need to delete your local copy of the branch and then recreate it after git fetch. Barry > On Sep 8, 2021, at 3:09 PM, Olivier Jamond wrote: > > Ok thanks a lot! I look forward to hearing from you! > > Best regards, > Olivier > > On 08/09/2021 20:56, Barry Smith wrote: >> >> Olivier, >> >> I'll refresh my memory on this and see if I can finish it up. >> >> Barry >> >> >>> On Sep 2, 2021, at 12:38 PM, Olivier Jamond > wrote: >>> >>> Dear Barry, >>> >>> First I hope that you and your relatives are doing well in these troubled times... >>> >>> I allow myself to come back to you about the subject of being able to compute something like '(C*Ct)^(-1)*C', where 'C' is a 'MPC' matrix that is used to impose some boundary conditions for a structural finite element problem: >>> >>> [K C^t][U]=[F] >>> [C 0 ][L] [D] >>> >>> as we discussed some time ago, I would like to solve such a problem using the Ainsworth method, which involves this '(C*Ct)^(-1)*C'. >>> >>> You kindly started some developments to help me on that, which worked as a 'proof of concept' in sequential, but not yet in parallel, and also kindly suggested that you could extend it to the parallel case (MR: https://gitlab.com/petsc/petsc/-/merge_requests/3544 ). Can this be still 'scheduled' on your side? >>> >>> Sorry again to "harass" you about that... >>> >>> Best regards, >>> Olivier Jamond >>> >>> On 03/02/2021 08:45, Olivier Jamond wrote: >>>> Dear Barry, >>>> >>>> I come back to you about this topic. As I wrote last time, this is not a "highly urgent" subject (whereas we will have to deal with it in the next months), but it is an important one (especially since the code I am working on just raised significantly its ambitions). So I just would like to check with you that this is still 'scheduled' on your side. >>>> >>>> I am sorry, I feel a little embarrassed about asking you again about your work schedule, but I need some kind of 'visibility' about this topic which will become critical for our application. >>>> >>>> Many thanks for helping me on that! >>>> Olivier >>>> >>>> On 02/12/2020 21:34, Barry Smith wrote: >>>>> >>>>> Sorry I have not gotten back to you quicker, give me a few days to see how viable it is. >>>>> >>>>> Barry >>>>> >>>>> >>>>>> On Nov 25, 2020, at 11:57 AM, Olivier Jamond > wrote: >>>>>> >>>>>> Hi Barry, >>>>>> >>>>>> I come back to you about the feature to unlock the Ainsworth method for saddle point problems in parallel. If I may ask (sorry for that...), is it still on your schedule (I just checked the branch, and it seems 'stuck')? >>>>>> >>>>>> This is not "highly urgent" on my side, but the ability to handle efficiently saddle point problems with iterative solvers will be a critical point for the software I am working on... >>>>>> >>>>>> Many thanks (and sorry again for asking about your work schedule...)! >>>>>> Olivier >>>>>> >>>>>> On 12/10/2020 16:49, Barry Smith wrote: >>>>>>> >>>>>>> >>>>>>>> On Oct 12, 2020, at 6:10 AM, Olivier Jamond > wrote: >>>>>>>> >>>>>>>> Hi Barry, >>>>>>>> >>>>>>>> Thanks for this work! I tried this branch with my code and sequential matrices on a small case: it does work! >>>>>>>> >>>>>>>> >>>>>>> Excellant. I will extend it to the parallel case and get it into our master release. 
>>>>>>> >>>>>>> We'd be interested in hearing about your convergence and timing experiences when you run largish jobs (even sequentially) since this type of problem comes up relatively frequently and we do need a variety of solvers that can handle it while currently we do not have great tools for it. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>>> Thanks a lot, >>>>>>>> Olivier >>>>>>>> >>>>>>>> On 09/10/2020 03:50, Barry Smith wrote: >>>>>>>>> >>>>>>>>> Olivier, >>>>>>>>> >>>>>>>>> The branch barry/2020-10-08/invert-block-diagonal-aij contains an example src/mat/tests/ex178.c that shows how to compute inv(CC'). It works for SeqAIJ matrices. >>>>>>>>> >>>>>>>>> Please let us know if it works for you and then I will implement the parallel version. >>>>>>>>> >>>>>>>>> Barry >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Oct 8, 2020, at 3:59 PM, Barry Smith > wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Olivier >>>>>>>>>> >>>>>>>>>> I am working on extending the routines now and hopefully push a branch you can try fairly soon. >>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Oct 8, 2020, at 3:07 PM, Jed Brown > wrote: >>>>>>>>>>> >>>>>>>>>>> Olivier Jamond > writes: >>>>>>>>>>> >>>>>>>>>>>>> Given the structure of C it seems you should just explicitly construct Sp and use GAMG (or other preconditioners, even a direct solver) directly on Sp. Trying to avoid explicitly forming Sp will give you a much slower performing solving for what benefit? If C was just some generic monster than forming Sp might be unrealistic but in your case CCt is is block diagonal with tiny blocks which means (C*Ct)^(-1) is block diagonal with tiny blocks (the blocks are the inverses of the blocks of (C*Ct)). >>>>>>>>>>>>> >>>>>>>>>>>>> Sp = Ct*C + Qt * S * Q = Ct*C + [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] >>>>>>>>>>>>> >>>>>>>>>>>>> [Ct * (C*Ct)^(-1)*C] will again be block diagonal with slightly larger blocks. >>>>>>>>>>>>> >>>>>>>>>>>>> You can do D = (C*Ct) with MatMatMult() then write custom code that zips through the diagonal blocks of D inverting all of them to get iD then use MatPtAP applied to C and iD to get Ct * (C*Ct)^(-1)*C then MatShift() to include the I then MatPtAP or MatRAR to get [I - Ct * (C*Ct)^(-1)*C] S [I - Ct * (C*Ct)^(-1)*C] then finally MatAXPY() to get Sp. The complexity of each of the Mat operations is very low because of the absurdly simple structure of C and its descendants. You might even be able to just use MUMPS to give you the explicit inv(C*Ct) without writing custom code to get iD. >>>>>>>>>>>> >>>>>>>>>>>> At this time, I didn't manage to compute iD=inv(C*Ct) without using >>>>>>>>>>>> dense matrices, what may be a shame because all matrices are sparse . Is >>>>>>>>>>>> it possible? >>>>>>>>>>>> >>>>>>>>>>>> And I get no idea of how to write code to manually zip through the >>>>>>>>>>>> diagonal blocks of D to invert them... >>>>>>>>>>> >>>>>>>>>>> You could use MatInvertVariableBlockDiagonal(), which should perhaps return a Mat instead of a raw array. >>>>>>>>>>> >>>>>>>>>>> If you have constant block sizes, MatInvertBlockDiagonalMat will return a Mat. >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aduarteg at utexas.edu Fri Sep 10 11:51:57 2021 From: aduarteg at utexas.edu (Alfredo J Duarte Gomez) Date: Fri, 10 Sep 2021 11:51:57 -0500 Subject: [petsc-users] DMDA matrices with one sided stencils Message-ID: Good afternoon, I have developed and validated some matrix operators using petsc with a structured dmda. Some of these operators use one-sided stencils at the boundaries, which, given the way the dmda uses the stencil width value, require me to increase the stencil width to accommodate more entries at the boundary if I want to avoid errors with the default options. This is very wasteful and affects my performance, since there are a lot of extra zeros corresponding to the inner points. What is the best way to improve this? I have read in some public threads the possibility of using a MatOption to allow us to put more entries into the matrix, but that does not allow me to use MatSetStencil? Alternatively, is there any way to use a larger stencil width and then trim the zero entries that were entered automatically? If there are any other solutions for this problem, please let me know. Thank you, -Alfredo Duarte -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Sep 10 12:52:29 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 10 Sep 2021 13:52:29 -0400 Subject: [petsc-users] Petsc with Visual Studio In-Reply-To: <9844C996-71D9-4B45-A30B-0013BE78F2B5@hxcore.ol> References: <9844C996-71D9-4B45-A30B-0013BE78F2B5@hxcore.ol> Message-ID: Please let us know of any difficulties that arose, we may be able to improve the process or the documentation to make the process less painful. Barry > On Sep 10, 2021, at 11:55 AM, Hamid wrote: > > Hi everybody, > > I already compiled PETSc (without MUMPS) with Intel compilers under the cygwin env; it was a real pain. > I recently tried to compile PETSc with Visual Studio using the native compiler. > First of all, I compiled METIS, MUMPS (with METIS only, because Scotch compilation is tricky for me) and OpenBLAS. > When it comes to PETSc, the compilation process is quite hard; the configuration stage does a lot of things. Is it possible to use CMake for the project? > Is there someone who has already done it that way? > > > Best regards. > > Sent from Mail for Windows -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Sep 10 13:04:17 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 10 Sep 2021 14:04:17 -0400 Subject: [petsc-users] DMDA matrices with one sided stencils In-Reply-To: References: Message-ID: I think the following should work for you. Create a "wide" DMDA and then call DMSetMatrixPreallocateOnly() on it, use this DMDA to create your matrix, this will ensure that only the entries you enter into the matrix are stored (so the extra "layers" of zeros will not appear in the matrix). The matrix vector products will then not use those extra entries and will be faster. Destroy the no longer needed wide DMDA. You can use MatSetValuesStencil() with this matrix. Now create your regular DMDA and use that to create your vectors and for needed DMGlobalToLocal etc. Barry > On Sep 10, 2021, at 12:51 PM, Alfredo J Duarte Gomez wrote: > > Good afternoon, > > I have developed and validated some matrix operators using petsc with a structured dmda.
> > Some of these operators use one-sided stencils at the boundaries, which following the way the dmda uses the stencil width value, requires me to increase the stencil width to accommodate more entries at the boundary only if I want to avoid errors with default options. > > This is very wasteful and affects my performance, since there are a lot of extra zeros corresponding to the inner points. > > What is the best way to improve this? > > I have read in some public threads the possibility of using MatOption to allow us to put more entries into the matrix, but that does not allow me to use MatSetStencil? > > Alternatively, is there any way to use a larger stencil width and then trim the zero entries that were entered automatically? > > If there are any other solutions for this problem, please let me know. > > Thank you, > > -Alfredo Duarte > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From gerhard.ungersback at uis.no Fri Sep 10 13:23:51 2021 From: gerhard.ungersback at uis.no (=?iso-8859-1?Q?Gerhard_Ungersb=E4ck?=) Date: Fri, 10 Sep 2021 18:23:51 +0000 Subject: [petsc-users] Combine DM and FFTW Message-ID: Hello, I want to solve a time dependent differential equation in 3D (Scalar field theory in Hamilton formulation) . The crucial part is that at some time steps I need to FFT the 3D grid. I have written a sequential code without petsc and now I would like to use petsc to get a parallel version. I worked through the examples and now I understand DMDACreate and also the FFTW examples. What is missing though is how I combine both! DMDACreate takes care of ghost points and periodic boundary conditions. As far as I know FFTW requires each process to have a slab of the grid to work. I know how to create this grid with DMDACreate. Normally I would proceed by creating a global vector by DMCreateGlobalVector. But this vector needs then to be linked with fftw arrays. How does this work? Or should I first allocate local memory for fftw and then somehow stitch them together to form a global petsc vector? Thanks best, gerhard -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Sep 11 13:49:37 2021 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 11 Sep 2021 14:49:37 -0400 Subject: [petsc-users] Combine DM and FFTW In-Reply-To: References: Message-ID: On Fri, Sep 10, 2021 at 2:26 PM Gerhard Ungersb?ck via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hello, > > > I want to solve a time dependent differential equation in 3D (Scalar field > theory in Hamilton formulation) . > > The crucial part is that at some time steps I need to FFT the 3D grid. > > > I have written a sequential code without petsc and now I would like to use > petsc to get a parallel version. > > I worked through the examples and now I understand DMDACreate and also the > FFTW examples. > > > What is missing though is how I combine both! > > DMDACreate takes care of ghost points and periodic boundary conditions. As > far as I know FFTW requires each process to have a slab of the grid to > work. I know how to create this grid with DMDACreate. Normally I would > proceed by creating a global vector by DMCreateGlobalVector. But this > vector needs then to be linked with fftw arrays. How does this work? > > Or should I first allocate local memory for fftw and then somehow stitch > them together to form a global petsc vector? 
> It would be great if we had worked through an example with this, but I can only find a serial example: https://gitlab.com/petsc/petsc/-/blob/main/src/dm/tests/ex27.c I think you can use https://petsc.org/main/docs/manualpages/Mat/VecScatterFFTWToPetsc.html to go between FFTW and Petsc vectors, but I have not tried in parallel. Is the example in the right direction? Thanks, Matt > Thanks > > best, gerhard > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Sep 11 13:55:49 2021 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 11 Sep 2021 14:55:49 -0400 Subject: [petsc-users] DMDA matrices with one sided stencils In-Reply-To: References: Message-ID: On Fri, Sep 10, 2021 at 2:04 PM Barry Smith wrote: > > I think the following should work for you. > > Create a "wide" DMDA and then call DMSetMatrixPreallocateOnly() > Or use -dm_preallocate_only Thanks, Matt > on it, use this DMDA to create your matrix, this will ensure that only the > entries you enter into the matrix are stored (so the extra "layers" of > zeros will not appear in the matrix). The matrix vector products will then > not use those extra entries and will be faster. Destroy the no longer > needed wide DMDA. You can use MatSetValuesStencil() with this matrix. > > Now create your regular DMDA and use that to create your vectors and > for needed DMGlobalToLocal etc. > > Barry > > > On Sep 10, 2021, at 12:51 PM, Alfredo J Duarte Gomez > wrote: > > Good afternoon, > > I have developed and validated some matrix operators using petsc with a > structured dmda. > > Some of these operators use one-sided stencils at the boundaries, which > following the way the dmda uses the stencil width value, requires me to > increase the stencil width to accommodate more entries at the boundary only > if I want to avoid errors with default options. > > This is very wasteful and affects my performance, since there are a lot of > extra zeros corresponding to the inner points. > > What is the best way to improve this? > > I have read in some public threads the possibility of using MatOption to > allow us to put more entries into the matrix, but that does not allow me to > use MatSetStencil? > > Alternatively, is there any way to use a larger stencil width and then > trim the zero entries that were entered automatically? > > If there are any other solutions for this problem, please let me know. > > Thank you, > > -Alfredo Duarte > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sun Sep 12 14:51:11 2021 From: mfadams at lbl.gov (Mark Adams) Date: Sun, 12 Sep 2021 15:51:11 -0400 Subject: [petsc-users] p4est error at NERSC Message-ID: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: application/octet-stream Size: 1351643 bytes Desc: not available URL: From bsmith at petsc.dev Sun Sep 12 19:27:56 2021 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 12 Sep 2021 20:27:56 -0400 Subject: [petsc-users] p4est error at NERSC In-Reply-To: References: Message-ID: <39EE145C-FA06-49B4-9AEE-3E2358961516@petsc.dev> You seem to have too many inconsistent modules loaded at the same time. It is picking up /usr/common/software/sles15_cgpu/hpcsdk/20.11/Linux_x86_64/20.11/compilers/lib/libgomp.so when it started with /usr/common/software/sles15_cgpu/gcc/8.3.0/lib/../lib64/libgomp.so Since you are using --with-mpi-dir=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc you should not load the hpcsdk/20.11 module which contains the PGI compilers. Barry > On Sep 12, 2021, at 3:51 PM, Mark Adams wrote: > > > From stefano.zampini at gmail.com Mon Sep 13 03:02:40 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Mon, 13 Sep 2021 11:02:40 +0300 Subject: [petsc-users] PETSC installation on Cray In-Reply-To: References: <4fd4fc04-9c46-51f5-2720-c682d0011224@dkrz.de> Message-ID: Enrico I have accidentally stepped on the same issue. You may want to check if it works with this branch https://gitlab.com/petsc/petsc/-/tree/stefanozampini/cray-arm Il giorno mar 2 mar 2021 alle ore 23:03 Barry Smith ha scritto: > > Please try the following. Make four files as below then compile each > with cc -c -o test.o test1.c again for test2.c etc > > Send all the output. > > > > test1.c > #include > > test2.c > #define _BSD_SOURCE > #include > > test3.c > #define _DEFAULT_SOURCE > #include > > test4.c > #define _GNU_SOURCE > #include > > > On Mar 2, 2021, at 7:33 AM, Enrico wrote: > > > > Hi, > > > > attached is the configuration and make log files. > > > > Enrico > > > > On 02/03/2021 14:13, Matthew Knepley wrote: > >> On Tue, Mar 2, 2021 at 7:49 AM Enrico degregori at dkrz.de>> wrote: > >> Hi, > >> I'm having some problems installing PETSC with Cray compiler. > >> I use this configuration: > >> ./configure --with-cc=cc --with-cxx=CC --with-fc=0 --with-debugging=1 > >> --with-shared-libraries=1 COPTFLAGS=-O0 CXXOPTFLAGS=-O0 > >> and when I do > >> make all > >> I get the following error because of cmathcalls.h: > >> CC-1043 craycc: ERROR File = /usr/include/bits/cmathcalls.h, Line = > 55 > >> _Complex can only be used with floating-point types. > >> __MATHCALL (cacos, (_Mdouble_complex_ __z)); > >> ^ > >> Am I doing something wrong? > >> This was expended from somewhere. Can you show the entire err log? > >> Thanks, > >> Matt > >> Regards, > >> Enrico Degregori > >> -- > >> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > >> -- Norbert Wiener > >> https://www.cse.buffalo.edu/~knepley/ < > http://www.cse.buffalo.edu/~knepley/> > > > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From badi.hamid at gmail.com Mon Sep 13 06:18:05 2021 From: badi.hamid at gmail.com (Hamid) Date: Mon, 13 Sep 2021 13:18:05 +0200 Subject: [petsc-users] Configure slowness Message-ID: An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Mon Sep 13 07:12:12 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 13 Sep 2021 08:12:12 -0400 Subject: [petsc-users] Configure slowness In-Reply-To: References: Message-ID: On Mon, Sep 13, 2021 at 7:18 AM Hamid wrote: > Hi, > > > > Do you know why configure is very very slow using cygwin/python ? It takes > more that 10min to finalize. > We believe this is due to a very slow filesystem. Almost all the time in configure is filesystem operations. NTFS is just very slow, and this probably accounts for the difference with the Linux runtime. One option is to build using WSL Thanks, Matt > > > Best regards. > > > > Envoy? ? partir de Courrier > pour Windows > > > > > Garanti > sans virus. www.avast.com > > <#m_7906049641795168604_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Sep 13 08:20:10 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 13 Sep 2021 08:20:10 -0500 (CDT) Subject: [petsc-users] Configure slowness In-Reply-To: References: Message-ID: <50fd3646-715-e395-9813-cd405ea74786@mcs.anl.gov> cygwin is slow. And configure does 100s of compiles and other sequential steps [and we use a win32fe compile wrapper that adds extra I/O,run-time cost for each compile] Satish On Mon, 13 Sep 2021, Matthew Knepley wrote: > On Mon, Sep 13, 2021 at 7:18 AM Hamid wrote: > > > Hi, > > > > > > > > Do you know why configure is very very slow using cygwin/python ? It takes > > more that 10min to finalize. > > > > We believe this is due to a very slow filesystem. Almost all the time in > configure is filesystem operations. > NTFS is just very slow, and this probably accounts for the difference with > the Linux runtime. > > One option is to build using WSL > > Thanks, > > Matt > > > > > > > > Best regards. > > > > > > > > Envoy? ? partir de Courrier > > pour Windows > > > > > > > > > > Garanti > > sans virus. www.avast.com > > > > <#m_7906049641795168604_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > > > > > From aph at email.arizona.edu Mon Sep 13 14:34:40 2021 From: aph at email.arizona.edu (Anthony Paul Haas) Date: Mon, 13 Sep 2021 12:34:40 -0700 Subject: [petsc-users] MatZeroRows and full assembly Message-ID: Hello, Is it allowed after a MatZeroRows to insert more values in the row that was just zeroed with MatSetValues and then perform another full assembly of the matrix? Thanks, Anthony -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Sep 13 15:01:55 2021 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 13 Sep 2021 16:01:55 -0400 Subject: [petsc-users] MatZeroRows and full assembly In-Reply-To: References: Message-ID: > On Sep 13, 2021, at 3:34 PM, Anthony Paul Haas wrote: > > Hello, > > Is it allowed after a MatZeroRows to insert more values in the row that was just zeroed with MatSetValues and then perform another full assembly of the matrix? Yes, if you are replacing previously zeroed values it will simply fill them in efficiently. If you are introducing new nonzero locations it will be inefficient in general because it has to allocate new space for the new locations. Barry > > Thanks, > > Anthony -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From junchao.zhang at gmail.com Mon Sep 13 15:04:04 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 13 Sep 2021 15:04:04 -0500 Subject: [petsc-users] MatZeroRows and full assembly In-Reply-To: References: Message-ID: >From https://petsc.org/release/docs/manualpages/Mat/MatSetOption.html, MAT_KEEP_NONZERO_PATTERN indicates when MatZeroRows () is called the zeroed entries are kept in the nonzero structure So, if you have this option true and you set to a previous location, then it is fine, otherwise you also need MAT_NEW_NONZERO_ALLOCATION_ERR to be false to do so. --Junchao Zhang On Mon, Sep 13, 2021 at 2:34 PM Anthony Paul Haas wrote: > Hello, > > Is it allowed after a MatZeroRows to insert more values in the row that > was just zeroed with MatSetValues and then perform another full assembly of > the matrix? > > Thanks, > > Anthony > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Sep 13 15:25:33 2021 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 13 Sep 2021 16:25:33 -0400 Subject: [petsc-users] MatZeroRows and full assembly In-Reply-To: References: Message-ID: Sorry, my mistake. It is MatZeroRowsColumns() that ignores the MAT_KEEP_NONZERO_PATTERN option. Barry > On Sep 13, 2021, at 4:04 PM, Junchao Zhang wrote: > > From https://petsc.org/release/docs/manualpages/Mat/MatSetOption.html , > > MAT_KEEP_NONZERO_PATTERN indicates when MatZeroRows () is called the zeroed entries are kept in the nonzero structure > > So, if you have this option true and you set to a previous location, then it is fine, otherwise you also need MAT_NEW_NONZERO_ALLOCATION_ERR to be false to do so. > > --Junchao Zhang > > > On Mon, Sep 13, 2021 at 2:34 PM Anthony Paul Haas > wrote: > Hello, > > Is it allowed after a MatZeroRows to insert more values in the row that was just zeroed with MatSetValues and then perform another full assembly of the matrix? > > Thanks, > > Anthony -------------- next part -------------- An HTML attachment was scrubbed... URL: From berend.vanwachem at ovgu.de Tue Sep 14 04:14:48 2021 From: berend.vanwachem at ovgu.de (Berend van Wachem) Date: Tue, 14 Sep 2021 11:14:48 +0200 Subject: [petsc-users] DMView and DMLoad Message-ID: <56ce2135-9757-4292-e33b-c7eea8fb7b2e@ovgu.de> Dear PETSc-team, We are trying to save and load distributed DMPlex and its associated physical fields (created with DMCreateGlobalVector) (Uvelocity, VVelocity, ...) in HDF5_XDMF format. To achieve this, we do the following: 1) save in the same xdmf.h5 file: DMView( DM , H5_XDMF_Viewer ); VecView( UVelocity, H5_XDMF_Viewer ); 2) load the dm: DMPlexCreateFromfile(PETSC_COMM_WORLD, Filename, PETSC_TRUE, DM); 3) load the physical field: VecLoad( UVelocity, H5_XDMF_Viewer ); There are no errors in the execution, but the loaded DM is distributed differently to the original one, which results in the incorrect placement of the values of the physical fields (UVelocity etc.) in the domain. This approach is used to restart the simulation with the last saved DM. Is there something we are missing, or there exists alternative routes to this goal? Can we somehow get the IS of the redistribution, so we can re-distribute the vector data as well? Many thanks, best regards, Berend. 
From knepley at gmail.com Tue Sep 14 06:23:49 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 14 Sep 2021 07:23:49 -0400 Subject: [petsc-users] DMView and DMLoad In-Reply-To: <56ce2135-9757-4292-e33b-c7eea8fb7b2e@ovgu.de> References: <56ce2135-9757-4292-e33b-c7eea8fb7b2e@ovgu.de> Message-ID: On Tue, Sep 14, 2021 at 5:15 AM Berend van Wachem wrote: > Dear PETSc-team, > > We are trying to save and load distributed DMPlex and its associated > physical fields (created with DMCreateGlobalVector) (Uvelocity, > VVelocity, ...) in HDF5_XDMF format. To achieve this, we do the following: > > 1) save in the same xdmf.h5 file: > DMView( DM , H5_XDMF_Viewer ); > VecView( UVelocity, H5_XDMF_Viewer ); > > 2) load the dm: > DMPlexCreateFromfile(PETSC_COMM_WORLD, Filename, PETSC_TRUE, DM); > > 3) load the physical field: > VecLoad( UVelocity, H5_XDMF_Viewer ); > > There are no errors in the execution, but the loaded DM is distributed > differently to the original one, which results in the incorrect > placement of the values of the physical fields (UVelocity etc.) in the > domain. > > This approach is used to restart the simulation with the last saved DM. > Is there something we are missing, or there exists alternative routes to > this goal? Can we somehow get the IS of the redistribution, so we can > re-distribute the vector data as well? > > Many thanks, best regards, > Hi Berend, We are in the midst of rewriting this. We want to support saving multiple meshes, with fields attached to each, and preserving the discretization (section) information, and allowing us to load up on a different number of processes. We plan to be done by October. Vaclav and I are doing this in collaboration with Koki Sagiyama, David Ham, and Lawrence Mitchell from the Firedrake team. For this problem, we need to give hints for the distribution when you load the DM, as is done now with Vec. We have replaced the DMPlexCreateFromFile() with DMLoad() to better match the interface in the rest of PETSc. Hopefully the wait is not too big an inconvenience. Thanks, Matt > Berend. > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From facklerpw at ornl.gov Wed Sep 15 11:52:26 2021 From: facklerpw at ornl.gov (Fackler, Philip) Date: Wed, 15 Sep 2021 16:52:26 +0000 Subject: [petsc-users] [EXTERNAL] Re: Redirecting petsc output In-Reply-To: References: <782F63EE-1821-4607-8D80-77C543E0ACF4@petsc.dev> Message-ID: Just to follow up here, I figured out a different way to get what I really wanted, and that is described on this page: https://petsc.org/release/docs/manualpages/Sys/PetscVFPrintf.html Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: petsc-users on behalf of Fackler, Philip via petsc-users Sent: Wednesday, September 8, 2021 11:24 To: Barry Smith Cc: petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Subject: Re: [petsc-users] [EXTERNAL] Re: Redirecting petsc output Barry, Thanks for the quick reply! I'll try those out. 
Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory ________________________________ From: Barry Smith Sent: Wednesday, September 8, 2021 11:16 To: Fackler, Philip Cc: xolotl-psi-development at lists.sourceforge.net ; petsc-users at mcs.anl.gov Subject: [EXTERNAL] Re: [petsc-users] Redirecting petsc output Philip, There a variety of techniques. Some of the command line options take an optional viewer name where the output can be redirected. For example -ts_monitor ascii:filename or -ts_view ascii:filename see https://petsc.org/release/docs/manualpages/Viewer/PetscOptionsGetViewer.html for more details It is also possible to change all stdout from PETSc to a different file by setting PETSC_STDOUT = fopen(...) Barry On Sep 8, 2021, at 10:59 AM, Fackler, Philip via petsc-users > wrote: Is there a way to customize how petsc writes information? Instead of writing to stdout (for example: 0 TS dt 0.1 time 0.), what if we want to log that message to a file other output from Xolotl? I'm assuming there are multiple ways of getting this result. What's common practice with petsc folks? Thanks, Philip Fackler Research Software Engineer, Application Engineering Group Advanced Computing Systems Research Section Computer Science and Mathematics Division Oak Ridge National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From wence at gmx.li Fri Sep 17 08:46:58 2021 From: wence at gmx.li (Lawrence Mitchell) Date: Fri, 17 Sep 2021 14:46:58 +0100 Subject: [petsc-users] DMView and DMLoad In-Reply-To: References: <56ce2135-9757-4292-e33b-c7eea8fb7b2e@ovgu.de> Message-ID: Hi Berend, > On 14 Sep 2021, at 12:23, Matthew Knepley wrote: > > On Tue, Sep 14, 2021 at 5:15 AM Berend van Wachem wrote: > Dear PETSc-team, > > We are trying to save and load distributed DMPlex and its associated > physical fields (created with DMCreateGlobalVector) (Uvelocity, > VVelocity, ...) in HDF5_XDMF format. To achieve this, we do the following: > > 1) save in the same xdmf.h5 file: > DMView( DM , H5_XDMF_Viewer ); > VecView( UVelocity, H5_XDMF_Viewer ); > > 2) load the dm: > DMPlexCreateFromfile(PETSC_COMM_WORLD, Filename, PETSC_TRUE, DM); > > 3) load the physical field: > VecLoad( UVelocity, H5_XDMF_Viewer ); > > There are no errors in the execution, but the loaded DM is distributed > differently to the original one, which results in the incorrect > placement of the values of the physical fields (UVelocity etc.) in the > domain. > > This approach is used to restart the simulation with the last saved DM. > Is there something we are missing, or there exists alternative routes to > this goal? Can we somehow get the IS of the redistribution, so we can > re-distribute the vector data as well? > > Many thanks, best regards, > > Hi Berend, > > We are in the midst of rewriting this. We want to support saving multiple meshes, with fields attached to each, > and preserving the discretization (section) information, and allowing us to load up on a different number of > processes. We plan to be done by October. Vaclav and I are doing this in collaboration with Koki Sagiyama, > David Ham, and Lawrence Mitchell from the Firedrake team. The core load/save cycle functionality is now in PETSc main. So if you're using main rather than a release, you can get access to it now. 
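In outline, the basic cycle looks roughly like the following. This is a simplified sketch from memory, with error checking and the section/distribution handling omitted, so treat the manual section linked below as the authoritative reference; the file name is just illustrative and UVelocity stands for one of your field vectors.

  /* saving */
  PetscViewer viewer;
  PetscViewerHDF5Open(PETSC_COMM_WORLD, "dmplex.h5", FILE_MODE_WRITE, &viewer);
  DMView(dm, viewer);                                      /* topology and coordinates */
  PetscObjectSetName((PetscObject)UVelocity, "UVelocity");
  VecView(UVelocity, viewer);                              /* field data */
  PetscViewerDestroy(&viewer);

  /* loading, possibly on a different number of processes */
  DM  newdm;
  Vec newvec;
  PetscViewerHDF5Open(PETSC_COMM_WORLD, "dmplex.h5", FILE_MODE_READ, &viewer);
  DMCreate(PETSC_COMM_WORLD, &newdm);
  DMSetType(newdm, DMPLEX);
  DMLoad(newdm, viewer);                                   /* replaces DMPlexCreateFromFile() for this purpose */
  /* attach your discretisation (PetscSection/PetscFE) to newdm here before creating vectors */
  DMCreateGlobalVector(newdm, &newvec);
  PetscObjectSetName((PetscObject)newvec, "UVelocity");    /* name must match what was saved */
  VecLoad(newvec, viewer);
  PetscViewerDestroy(&viewer);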
This section of the manual shows an example of how to do things https://petsc.org/main/docs/manual/dmplex/#saving-and-loading-data-with-hdf5 Let us know if things aren't clear! Thanks, Lawrence From samuelestes91 at gmail.com Fri Sep 17 11:21:35 2021 From: samuelestes91 at gmail.com (Samuel Estes) Date: Fri, 17 Sep 2021 11:21:35 -0500 Subject: [petsc-users] Solving two successive linear systems Message-ID: Hi, I have two related questions about the best way to use the KSP solver: First, I have an adaptive FEM code which solves the same linear system at each iteration until the grid is refined at which point, obviously, the size of the linear system changes. Currently, I just call: KSPSetOperators(ksp,A,A); KSPSetFromOptions(); KSPSolve(ksp,b,x); In a separate part of the code, I re-create the matrix and vectors and call KSPReset(ksp); whenever the grid is refined. Is this an optimal way to do things? In particular, does KSPSetFromOptions need to be called before each solve or can I just call it once somewhere else and then be done with it. Does it need to be called after each call to KSPReset? There is a section in the PETSc Manual about solving successive linear systems but it is rather terse so I'm just trying to get a sense of how to optimally code this. Second, one model in this code actually successively solves two linear systems of different sizes (one system is n x n and the other is 3*n x 3*n). I solve this by creating two matrices, two right hand sides, and two solution vectors for each system. I currently just use one KSP object which I reset after each use since the linear system changes size each time. Would it be more efficient to simply allocate a second ksp solver object so that I don't have to call KSPReset every time? I'm not sure how much memory a ksp object requires or how much computation I would save by using a second solver. Any ideas here? This part of the code is also adaptive. Thanks in advance for the help. I hope my questions are clear. If not, I'm happy to clarify. Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Sep 17 11:37:04 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 17 Sep 2021 12:37:04 -0400 Subject: [petsc-users] Solving two successive linear systems In-Reply-To: References: Message-ID: <421D7542-769B-4763-9F6D-ABEC5C23C1B9@petsc.dev> > On Sep 17, 2021, at 12:21 PM, Samuel Estes wrote: > > Hi, > > I have two related questions about the best way to use the KSP solver: > > First, I have an adaptive FEM code which solves the same linear system at each iteration until the grid is refined at which point, obviously, the size of the linear system changes. Currently, I just call: > KSPSetOperators(ksp,A,A); > KSPSetFromOptions(); > KSPSolve(ksp,b,x); > In a separate part of the code, I re-create the matrix and vectors and call KSPReset(ksp); whenever the grid is refined. > Is this an optimal way to do things? In particular, does KSPSetFromOptions need to be called before each solve or can I just call it once somewhere else and then be done with it. Does it need to be called after each call to KSPReset? Yes, it is best to call the KSPSetFromOptions after each reset. You do not need to call it for each solve. > There is a section in the PETSc Manual about solving successive linear systems but it is rather terse so I'm just trying to get a sense of how to optimally code this. 
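In outline, the pattern I have in mind is the following (a sketch only, with error checking omitted, using your A, b, x):

   KSP ksp;
   KSPCreate(PETSC_COMM_WORLD, &ksp);
   /* after each grid refinement: rebuild A, b, x, then */
   KSPReset(ksp);
   KSPSetOperators(ksp, A, A);
   KSPSetFromOptions(ksp);
   /* every solve on the current grid reuses the same KSP (and its preconditioner setup) */
   KSPSolve(ksp, b, x);
   /* ... further KSPSolve() calls with updated right hand sides ... */
   KSPDestroy(&ksp);   /* once, at the very end */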
> > Second, one model in this code actually successively solves two linear systems of different sizes (one system is n x n and the other is 3*n x 3*n). I solve this by creating two matrices, two right hand sides, and two solution vectors for each system. I currently just use one KSP object which I reset after each use since the linear system changes size each time. Would it be more efficient to simply allocate a second ksp solver object so that I don't have to call KSPReset every time? I'm not sure how much memory a ksp object requires or how much computation I would save by using a second solver. Any ideas here? This part of the code is also adaptive. It is best to use two KSP. There is no advantage in reusing one. > > Thanks in advance for the help. I hope my questions are clear. If not, I'm happy to clarify. > > Sam > > From samuelestes91 at gmail.com Fri Sep 17 11:46:24 2021 From: samuelestes91 at gmail.com (Samuel Estes) Date: Fri, 17 Sep 2021 11:46:24 -0500 Subject: [petsc-users] Solving two successive linear systems In-Reply-To: <421D7542-769B-4763-9F6D-ABEC5C23C1B9@petsc.dev> References: <421D7542-769B-4763-9F6D-ABEC5C23C1B9@petsc.dev> Message-ID: Thanks for the help! On Fri, Sep 17, 2021 at 11:37 AM Barry Smith wrote: > > > > On Sep 17, 2021, at 12:21 PM, Samuel Estes > wrote: > > > > Hi, > > > > I have two related questions about the best way to use the KSP solver: > > > > First, I have an adaptive FEM code which solves the same linear system > at each iteration until the grid is refined at which point, obviously, the > size of the linear system changes. Currently, I just call: > > KSPSetOperators(ksp,A,A); > > KSPSetFromOptions(); > > KSPSolve(ksp,b,x); > > In a separate part of the code, I re-create the matrix and vectors and > call KSPReset(ksp); whenever the grid is refined. > > Is this an optimal way to do things? In particular, does > KSPSetFromOptions need to be called before each solve or can I just call it > once somewhere else and then be done with it. Does it need to be called > after each call to KSPReset? > > Yes, it is best to call the KSPSetFromOptions after each reset. You do > not need to call it for each solve. > > > There is a section in the PETSc Manual about solving successive linear > systems but it is rather terse so I'm just trying to get a sense of how to > optimally code this. > > > > Second, one model in this code actually successively solves two linear > systems of different sizes (one system is n x n and the other is 3*n x > 3*n). I solve this by creating two matrices, two right hand sides, and two > solution vectors for each system. I currently just use one KSP object which > I reset after each use since the linear system changes size each time. > Would it be more efficient to simply allocate a second ksp solver object so > that I don't have to call KSPReset every time? I'm not sure how much memory > a ksp object requires or how much computation I would save by using a > second solver. Any ideas here? This part of the code is also adaptive. > > It is best to use two KSP. There is no advantage in reusing one. > > > > Thanks in advance for the help. I hope my questions are clear. If not, > I'm happy to clarify. > > > > Sam > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sun Sep 19 08:29:18 2021 From: mfadams at lbl.gov (Mark Adams) Date: Sun, 19 Sep 2021 09:29:18 -0400 Subject: [petsc-users] Spock link error Message-ID: I am getting to see this error. 
It seems to be suggesting that I turn --no-allow-shlib-undefined off. Any ideas? Thanks, Mark 09:09 main= /gpfs/alpine/csc314/scratch/adams/petsc$ make PETSC_DIR=/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new PETSC_ARCH="" check Running check examples to verify correct installation Using PETSC_DIR=/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new and PETSC_ARCH= gmake[3]: [/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/petsc/conf/rules:301: ex19.PETSc] Error 2 (ignored) *******************Error detected during compile or link!******************* See http://www.mcs.anl.gov/petsc/documentation/faq.html /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials ex19 ********************************************************************************* cc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O2 -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O2 -I/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/include -I/opt/rocm-4.2.0/include ex19.c -Wl,-rpath,/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib -L/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib -Wl,-rpath,/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib -L/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib -Wl,-rpath,/opt/rocm-4.2.0/lib -L/opt/rocm-4.2.0/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/ 21.06.1.1/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/ 21.06.1.1/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.7/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.7/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/mpich/default/gtl/lib -L/opt/cray/pe/mpich/default/gtl/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.1.5/dsmml/lib -L/opt/cray/pe/dsmml/0.1.5/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.12/lib -L/opt/cray/pe/pmi/6.0.12/lib -Wl,-rpath,/opt/cray/pe/cce/12.0.1/cce/x86_64/lib -L/opt/cray/pe/cce/12.0.1/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 -L/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/12.0.1/cce-clang/x86_64/lib/clang/12.0.0/lib/linux -L/opt/cray/pe/cce/12.0.1/cce-clang/x86_64/lib/clang/12.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-unknown-linux-gnu/lib -Wl,-rpath,/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib -lpetsc -lparmetis -lmetis -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -lstdc++ -ldl -lmpifort_cray -lmpi_cray -lmpi_gtl_hsa -ldsmml -lpmi -lxpmem -lpgas-shmem -lquadmath -lcrayacc_amdgpu -lopenacc -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -lstdc++ -ldl -o ex19 *ld.lld: error: /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/libpetsc.so: undefined reference to .omp_offloading.img_start.cray_amdgcn-amd-amdhsa 
[--no-allow-shlib-undefined]ld.lld: error: /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/libpetsc.so: undefined reference to .omp_offloading.img_size.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined]ld.lld: error: /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/libpetsc.so: undefined reference to .omp_offloading.img_cache.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined]* clang-12: error: linker command failed with exit code 1 (use -v to see invocation) gmake[4]: *** [: ex19] Error 1 *******************Error detected during compile or link!******************* See http://www.mcs.anl.gov/petsc/documentation/faq.html /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials ex5f ********************************************************* ftn -fPIC -g -O2 -fPIC -g -O2 -I/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/include -I/opt/rocm-4.2.0/include ex5f.F90 -Wl,-rpath,/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib -L/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib -Wl,-rpath,/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib -L/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib -Wl,-rpath,/opt/rocm-4.2.0/lib -L/opt/rocm-4.2.0/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/ 21.06.1.1/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/ 21.06.1.1/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.7/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.7/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/mpich/default/gtl/lib -L/opt/cray/pe/mpich/default/gtl/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.1.5/dsmml/lib -L/opt/cray/pe/dsmml/0.1.5/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.12/lib -L/opt/cray/pe/pmi/6.0.12/lib -Wl,-rpath,/opt/cray/pe/cce/12.0.1/cce/x86_64/lib -L/opt/cray/pe/cce/12.0.1/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 -L/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/12.0.1/cce-clang/x86_64/lib/clang/12.0.0/lib/linux -L/opt/cray/pe/cce/12.0.1/cce-clang/x86_64/lib/clang/12.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-unknown-linux-gnu/lib -Wl,-rpath,/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib -lpetsc -lparmetis -lmetis -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -lstdc++ -ldl -lmpifort_cray -lmpi_cray -lmpi_gtl_hsa -ldsmml -lpmi -lxpmem -lpgas-shmem -lquadmath -lcrayacc_amdgpu -lopenacc -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -lstdc++ -ldl -o ex5f /opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld: warning: alignment 128 of symbol `$host_init$$runtime_init_for_iso_c_binding$iso_c_binding_' in /opt/cray/pe/cce/12.0.1/cce/x86_64/lib/libmodules.so is smaller than 256 in /tmp/pe_39617/ex5f_1.o /opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld: warning: alignment 64 of symbol `$data_init$iso_c_binding_' in /opt/cray/pe/cce/12.0.1/cce/x86_64/lib/libmodules.so is smaller than 
256 in /tmp/pe_39617/ex5f_1.o Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process Completed test examples 09:12 main= /gpfs/alpine/csc314/scratch/adams/petsc$ module list Currently Loaded Modules: 1) craype-x86-rome 4) perftools-base/21.05.0 7) cray-pmi-lib/6.0.12 10) cray-dsmml/0.1.5 13) PrgEnv-cray/8.1.0 16) rocm/4.2.0 19) autoconf/2.69 2) libfabric/1.11.0.4.75 5) xpmem/2.2.40-2.1_2.44__g3cf3325.shasta 8) cce/12.0.1 11) cray-mpich/8.1.7 14) DefApps/default 17) emacs/27.2 20) automake/1.16.3 3) craype-network-ofi 6) cray-pmi/6.0.12 9) craype/2.7.8 12) cray-libsci/21.06.1.1 15) craype-accel-amd-gfx908 18) zlib/1.2.11 21) libtool/2.4.6 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: application/octet-stream Size: 113451 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 1952129 bytes Desc: not available URL: From stefano.zampini at gmail.com Sun Sep 19 08:43:59 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Sun, 19 Sep 2021 16:43:59 +0300 Subject: [petsc-users] Spock link error In-Reply-To: References: Message-ID: Are you following the user advices here https://docs.olcf.ornl.gov/systems/spock_quick_start_guide.html#compiling-with-the-cray-compiler-wrappers-cc-or-cc ? Il giorno dom 19 set 2021 alle ore 16:30 Mark Adams ha scritto: > I am getting to see this error. It seems to be suggesting that I turn > --no-allow-shlib-undefined off. > Any ideas? > Thanks, > Mark > > 09:09 main= /gpfs/alpine/csc314/scratch/adams/petsc$ make > PETSC_DIR=/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new > PETSC_ARCH="" check > Running check examples to verify correct installation > Using > PETSC_DIR=/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new > and PETSC_ARCH= > gmake[3]: > [/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/petsc/conf/rules:301: > ex19.PETSc] Error 2 (ignored) > *******************Error detected during compile or > link!******************* > See http://www.mcs.anl.gov/petsc/documentation/faq.html > /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials ex19 > > ********************************************************************************* > cc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas > -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O2 -fPIC > -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas > -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O2 > -I/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/include > -I/opt/rocm-4.2.0/include ex19.c > -Wl,-rpath,/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib > -L/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib > -Wl,-rpath,/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib > -L/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib > -Wl,-rpath,/opt/rocm-4.2.0/lib -L/opt/rocm-4.2.0/lib > -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 > -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/ > 21.06.1.1/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/ > 21.06.1.1/CRAY/9.0/x86_64/lib > -Wl,-rpath,/opt/cray/pe/mpich/8.1.7/ofi/cray/10.0/lib > -L/opt/cray/pe/mpich/8.1.7/ofi/cray/10.0/lib > 
-Wl,-rpath,/opt/cray/pe/mpich/default/gtl/lib > -L/opt/cray/pe/mpich/default/gtl/lib > -Wl,-rpath,/opt/cray/pe/dsmml/0.1.5/dsmml/lib > -L/opt/cray/pe/dsmml/0.1.5/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.12/lib > -L/opt/cray/pe/pmi/6.0.12/lib > -Wl,-rpath,/opt/cray/pe/cce/12.0.1/cce/x86_64/lib > -L/opt/cray/pe/cce/12.0.1/cce/x86_64/lib > -Wl,-rpath,/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 > -L/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 > -Wl,-rpath,/opt/cray/pe/cce/12.0.1/cce-clang/x86_64/lib/clang/12.0.0/lib/linux > -L/opt/cray/pe/cce/12.0.1/cce-clang/x86_64/lib/clang/12.0.0/lib/linux > -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 > -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 > -Wl,-rpath,/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-unknown-linux-gnu/lib > -L/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-unknown-linux-gnu/lib > -Wl,-rpath,/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib > -L/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib > -lpetsc -lparmetis -lmetis -lhipsparse -lhipblas -lrocsparse -lrocsolver > -lrocblas -lrocrand -lamdhip64 -lstdc++ -ldl -lmpifort_cray -lmpi_cray > -lmpi_gtl_hsa -ldsmml -lpmi -lxpmem -lpgas-shmem -lquadmath > -lcrayacc_amdgpu -lopenacc -lmodules -lfi -lcraymath -lf -lu -lcsup > -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 > -lclang_rt.builtins-x86_64 -lquadmath -lstdc++ -ldl -o ex19 > > > *ld.lld: error: > /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/libpetsc.so: > undefined reference to .omp_offloading.img_start.cray_amdgcn-amd-amdhsa > [--no-allow-shlib-undefined]ld.lld: error: > /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/libpetsc.so: > undefined reference to .omp_offloading.img_size.cray_amdgcn-amd-amdhsa > [--no-allow-shlib-undefined]ld.lld: error: > /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/libpetsc.so: > undefined reference to .omp_offloading.img_cache.cray_amdgcn-amd-amdhsa > [--no-allow-shlib-undefined]* > clang-12: error: linker command failed with exit code 1 (use -v to see > invocation) > gmake[4]: *** [: ex19] Error 1 > *******************Error detected during compile or > link!******************* > See http://www.mcs.anl.gov/petsc/documentation/faq.html > /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials ex5f > ********************************************************* > ftn -fPIC -g -O2 -fPIC -g -O2 > -I/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/include > -I/opt/rocm-4.2.0/include ex5f.F90 > -Wl,-rpath,/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib > -L/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib > -Wl,-rpath,/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib > -L/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib > -Wl,-rpath,/opt/rocm-4.2.0/lib -L/opt/rocm-4.2.0/lib > -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 > -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/ > 21.06.1.1/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/ > 21.06.1.1/CRAY/9.0/x86_64/lib > -Wl,-rpath,/opt/cray/pe/mpich/8.1.7/ofi/cray/10.0/lib > -L/opt/cray/pe/mpich/8.1.7/ofi/cray/10.0/lib > -Wl,-rpath,/opt/cray/pe/mpich/default/gtl/lib > -L/opt/cray/pe/mpich/default/gtl/lib > -Wl,-rpath,/opt/cray/pe/dsmml/0.1.5/dsmml/lib > -L/opt/cray/pe/dsmml/0.1.5/dsmml/lib 
-Wl,-rpath,/opt/cray/pe/pmi/6.0.12/lib > -L/opt/cray/pe/pmi/6.0.12/lib > -Wl,-rpath,/opt/cray/pe/cce/12.0.1/cce/x86_64/lib > -L/opt/cray/pe/cce/12.0.1/cce/x86_64/lib > -Wl,-rpath,/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 > -L/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 > -Wl,-rpath,/opt/cray/pe/cce/12.0.1/cce-clang/x86_64/lib/clang/12.0.0/lib/linux > -L/opt/cray/pe/cce/12.0.1/cce-clang/x86_64/lib/clang/12.0.0/lib/linux > -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 > -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 > -Wl,-rpath,/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-unknown-linux-gnu/lib > -L/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-unknown-linux-gnu/lib > -Wl,-rpath,/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib > -L/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib > -lpetsc -lparmetis -lmetis -lhipsparse -lhipblas -lrocsparse -lrocsolver > -lrocblas -lrocrand -lamdhip64 -lstdc++ -ldl -lmpifort_cray -lmpi_cray > -lmpi_gtl_hsa -ldsmml -lpmi -lxpmem -lpgas-shmem -lquadmath > -lcrayacc_amdgpu -lopenacc -lmodules -lfi -lcraymath -lf -lu -lcsup > -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 > -lclang_rt.builtins-x86_64 -lquadmath -lstdc++ -ldl -o ex5f > /opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld: > warning: alignment 128 of symbol > `$host_init$$runtime_init_for_iso_c_binding$iso_c_binding_' in > /opt/cray/pe/cce/12.0.1/cce/x86_64/lib/libmodules.so is smaller than 256 in > /tmp/pe_39617/ex5f_1.o > /opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld: > warning: alignment 64 of symbol `$data_init$iso_c_binding_' in > /opt/cray/pe/cce/12.0.1/cce/x86_64/lib/libmodules.so is smaller than 256 in > /tmp/pe_39617/ex5f_1.o > Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process > Completed test examples > 09:12 main= /gpfs/alpine/csc314/scratch/adams/petsc$ module list > > Currently Loaded Modules: > 1) craype-x86-rome 4) perftools-base/21.05.0 > 7) cray-pmi-lib/6.0.12 10) cray-dsmml/0.1.5 13) PrgEnv-cray/8.1.0 > 16) rocm/4.2.0 19) autoconf/2.69 > 2) libfabric/1.11.0.4.75 5) xpmem/2.2.40-2.1_2.44__g3cf3325.shasta > 8) cce/12.0.1 11) cray-mpich/8.1.7 14) DefApps/default > 17) emacs/27.2 20) automake/1.16.3 > 3) craype-network-ofi 6) cray-pmi/6.0.12 > 9) craype/2.7.8 12) cray-libsci/21.06.1.1 15) > craype-accel-amd-gfx908 18) zlib/1.2.11 21) libtool/2.4.6 > > > -- Stefano -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sun Sep 19 11:58:19 2021 From: mfadams at lbl.gov (Mark Adams) Date: Sun, 19 Sep 2021 12:58:19 -0400 Subject: [petsc-users] Spock link error In-Reply-To: References: Message-ID: Yes, I had the hsa lib commented out but that did not help (appended). I now see that I had this problem in July and Junchao was helping. I was able to fix it with PrgEnv-gnu. THe fortran test actually worked. Oh well, the application does their own linking so maybe that will fix it up. (They do use OMP). 
Thanks, Mark gmake[3]: [/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/petsc/conf/rules:301: ex19.PETSc] Error 2 (ignored) *******************Error detected during compile or link!******************* See http://www.mcs.anl.gov/petsc/documentation/faq.html /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials ex19 ********************************************************************************* cc *-L/opt/rocm-4.2.0/lib -lhsa-runtime64* -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O2 -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O2 -I/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/include -I/opt/rocm-4.2.0/include ex19.c -Wl,-rpath,/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib -L/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib -Wl,-rpath,/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib -L/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib -Wl,-rpath,/opt/rocm-4.2.0/lib -L/opt/rocm-4.2.0/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/ 21.06.1.1/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/ 21.06.1.1/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.7/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.7/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/mpich/default/gtl/lib -L/opt/cray/pe/mpich/default/gtl/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.1.5/dsmml/lib -L/opt/cray/pe/dsmml/0.1.5/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.12/lib -L/opt/cray/pe/pmi/6.0.12/lib -Wl,-rpath,/opt/cray/pe/cce/12.0.1/cce/x86_64/lib -L/opt/cray/pe/cce/12.0.1/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 -L/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/12.0.1/cce-clang/x86_64/lib/clang/12.0.0/lib/linux -L/opt/cray/pe/cce/12.0.1/cce-clang/x86_64/lib/clang/12.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-unknown-linux-gnu/lib -Wl,-rpath,/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib -lpetsc -lparmetis -lmetis -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64* -lhsa-runtime64 *-lstdc++ -ldl -lmpifort_cray -lmpi_cray -lmpi_gtl_hsa -ldsmml -lpmi -lxpmem -lpgas-shmem -lquadmath -lcrayacc_amdgpu -lopenacc -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -lstdc++ -ldl -o ex19 ld.lld: error: /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/libpetsc.so: undefined reference to .omp_offloading.img_start.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined] ld.lld: error: /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/libpetsc.so: undefined reference to .omp_offloading.img_size.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined] ld.lld: error: /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/libpetsc.so: undefined reference to 
.omp_offloading.img_cache.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined] clang-12: error: linker command failed with exit code 1 (use -v to see invocation) gmake[4]: *** [: ex19] Error 1 On Sun, Sep 19, 2021 at 9:44 AM Stefano Zampini wrote: > Are you following the user advices here > https://docs.olcf.ornl.gov/systems/spock_quick_start_guide.html#compiling-with-the-cray-compiler-wrappers-cc-or-cc > ? > > Il giorno dom 19 set 2021 alle ore 16:30 Mark Adams ha > scritto: > >> I am getting to see this error. It seems to be suggesting that I turn >> --no-allow-shlib-undefined off. >> Any ideas? >> Thanks, >> Mark >> >> 09:09 main= /gpfs/alpine/csc314/scratch/adams/petsc$ make >> PETSC_DIR=/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new >> PETSC_ARCH="" check >> Running check examples to verify correct installation >> Using >> PETSC_DIR=/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new >> and PETSC_ARCH= >> gmake[3]: >> [/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/petsc/conf/rules:301: >> ex19.PETSc] Error 2 (ignored) >> *******************Error detected during compile or >> link!******************* >> See http://www.mcs.anl.gov/petsc/documentation/faq.html >> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials ex19 >> >> ********************************************************************************* >> cc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas >> -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O2 -fPIC >> -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas >> -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O2 >> -I/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/include >> -I/opt/rocm-4.2.0/include ex19.c >> -Wl,-rpath,/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib >> -L/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib >> -Wl,-rpath,/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib >> -L/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib >> -Wl,-rpath,/opt/rocm-4.2.0/lib -L/opt/rocm-4.2.0/lib >> -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 >> -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/ >> 21.06.1.1/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/ >> 21.06.1.1/CRAY/9.0/x86_64/lib >> -Wl,-rpath,/opt/cray/pe/mpich/8.1.7/ofi/cray/10.0/lib >> -L/opt/cray/pe/mpich/8.1.7/ofi/cray/10.0/lib >> -Wl,-rpath,/opt/cray/pe/mpich/default/gtl/lib >> -L/opt/cray/pe/mpich/default/gtl/lib >> -Wl,-rpath,/opt/cray/pe/dsmml/0.1.5/dsmml/lib >> -L/opt/cray/pe/dsmml/0.1.5/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.12/lib >> -L/opt/cray/pe/pmi/6.0.12/lib >> -Wl,-rpath,/opt/cray/pe/cce/12.0.1/cce/x86_64/lib >> -L/opt/cray/pe/cce/12.0.1/cce/x86_64/lib >> -Wl,-rpath,/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 >> -L/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 >> -Wl,-rpath,/opt/cray/pe/cce/12.0.1/cce-clang/x86_64/lib/clang/12.0.0/lib/linux >> -L/opt/cray/pe/cce/12.0.1/cce-clang/x86_64/lib/clang/12.0.0/lib/linux >> -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 >> -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 >> -Wl,-rpath,/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-unknown-linux-gnu/lib >> -L/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-unknown-linux-gnu/lib >> -Wl,-rpath,/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib >> 
-L/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib >> -lpetsc -lparmetis -lmetis -lhipsparse -lhipblas -lrocsparse -lrocsolver >> -lrocblas -lrocrand -lamdhip64 -lstdc++ -ldl -lmpifort_cray -lmpi_cray >> -lmpi_gtl_hsa -ldsmml -lpmi -lxpmem -lpgas-shmem -lquadmath >> -lcrayacc_amdgpu -lopenacc -lmodules -lfi -lcraymath -lf -lu -lcsup >> -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 >> -lclang_rt.builtins-x86_64 -lquadmath -lstdc++ -ldl -o ex19 >> >> >> *ld.lld: error: >> /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/libpetsc.so: >> undefined reference to .omp_offloading.img_start.cray_amdgcn-amd-amdhsa >> [--no-allow-shlib-undefined]ld.lld: error: >> /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/libpetsc.so: >> undefined reference to .omp_offloading.img_size.cray_amdgcn-amd-amdhsa >> [--no-allow-shlib-undefined]ld.lld: error: >> /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/libpetsc.so: >> undefined reference to .omp_offloading.img_cache.cray_amdgcn-amd-amdhsa >> [--no-allow-shlib-undefined]* >> clang-12: error: linker command failed with exit code 1 (use -v to see >> invocation) >> gmake[4]: *** [: ex19] Error 1 >> *******************Error detected during compile or >> link!******************* >> See http://www.mcs.anl.gov/petsc/documentation/faq.html >> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials ex5f >> ********************************************************* >> ftn -fPIC -g -O2 -fPIC -g -O2 >> -I/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/include >> -I/opt/rocm-4.2.0/include ex5f.F90 >> -Wl,-rpath,/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib >> -L/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib >> -Wl,-rpath,/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib >> -L/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib >> -Wl,-rpath,/opt/rocm-4.2.0/lib -L/opt/rocm-4.2.0/lib >> -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 >> -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/ >> 21.06.1.1/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/ >> 21.06.1.1/CRAY/9.0/x86_64/lib >> -Wl,-rpath,/opt/cray/pe/mpich/8.1.7/ofi/cray/10.0/lib >> -L/opt/cray/pe/mpich/8.1.7/ofi/cray/10.0/lib >> -Wl,-rpath,/opt/cray/pe/mpich/default/gtl/lib >> -L/opt/cray/pe/mpich/default/gtl/lib >> -Wl,-rpath,/opt/cray/pe/dsmml/0.1.5/dsmml/lib >> -L/opt/cray/pe/dsmml/0.1.5/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.12/lib >> -L/opt/cray/pe/pmi/6.0.12/lib >> -Wl,-rpath,/opt/cray/pe/cce/12.0.1/cce/x86_64/lib >> -L/opt/cray/pe/cce/12.0.1/cce/x86_64/lib >> -Wl,-rpath,/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 >> -L/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 >> -Wl,-rpath,/opt/cray/pe/cce/12.0.1/cce-clang/x86_64/lib/clang/12.0.0/lib/linux >> -L/opt/cray/pe/cce/12.0.1/cce-clang/x86_64/lib/clang/12.0.0/lib/linux >> -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 >> -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 >> -Wl,-rpath,/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-unknown-linux-gnu/lib >> -L/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-unknown-linux-gnu/lib >> -Wl,-rpath,/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib >> -L/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib >> -lpetsc -lparmetis 
-lmetis -lhipsparse -lhipblas -lrocsparse -lrocsolver >> -lrocblas -lrocrand -lamdhip64 -lstdc++ -ldl -lmpifort_cray -lmpi_cray >> -lmpi_gtl_hsa -ldsmml -lpmi -lxpmem -lpgas-shmem -lquadmath >> -lcrayacc_amdgpu -lopenacc -lmodules -lfi -lcraymath -lf -lu -lcsup >> -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 >> -lclang_rt.builtins-x86_64 -lquadmath -lstdc++ -ldl -o ex5f >> /opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld: >> warning: alignment 128 of symbol >> `$host_init$$runtime_init_for_iso_c_binding$iso_c_binding_' in >> /opt/cray/pe/cce/12.0.1/cce/x86_64/lib/libmodules.so is smaller than 256 in >> /tmp/pe_39617/ex5f_1.o >> /opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld: >> warning: alignment 64 of symbol `$data_init$iso_c_binding_' in >> /opt/cray/pe/cce/12.0.1/cce/x86_64/lib/libmodules.so is smaller than 256 in >> /tmp/pe_39617/ex5f_1.o >> Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI >> process >> Completed test examples >> 09:12 main= /gpfs/alpine/csc314/scratch/adams/petsc$ module list >> >> Currently Loaded Modules: >> 1) craype-x86-rome 4) perftools-base/21.05.0 >> 7) cray-pmi-lib/6.0.12 10) cray-dsmml/0.1.5 13) PrgEnv-cray/8.1.0 >> 16) rocm/4.2.0 19) autoconf/2.69 >> 2) libfabric/1.11.0.4.75 5) xpmem/2.2.40-2.1_2.44__g3cf3325.shasta >> 8) cce/12.0.1 11) cray-mpich/8.1.7 14) DefApps/default >> 17) emacs/27.2 20) automake/1.16.3 >> 3) craype-network-ofi 6) cray-pmi/6.0.12 >> 9) craype/2.7.8 12) cray-libsci/21.06.1.1 15) >> craype-accel-amd-gfx908 18) zlib/1.2.11 21) libtool/2.4.6 >> >> >> > > -- > Stefano > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlos.v.hd1 at gmail.com Sun Sep 19 14:20:57 2021 From: carlos.v.hd1 at gmail.com (Carlos Velazquez) Date: Sun, 19 Sep 2021 14:20:57 -0500 Subject: [petsc-users] How to get specific ordering Message-ID: Hi there. I have a question whether I can get the DAG values in DMPlex in a different order. I am working on a code that uses the same node ordering to form the elements found in the mesh file. Example: If I use the file "doublet-tet.msh" that is inside the DMPlex mesh files folder, this file tells us that the elements are formed as follows: 1 - 2 4 3 1 2 - 2 3 4 5 Element 1 by nodes 2, 4, 3, 1 Element 2 by nodes 2, 3, 4, 5 So what I'm doing to get this through DMPlex is getting the transitive closure with DMPlexGetTransitiveClosure and getting the points from level 0, which is the node level, but the ordering is different. Example: With DMPlexGetTransitiveClosure I obtain that the elements are formed as follows: 0 - 5 3 4 2 1 - 4 3 5 6 Element 1 by nodes 5, 3, 4, 2 Element 2 by nodes 4, 3, 5, 6 But comparing this ordering with the previous one in the coordinate matrix I can see that the order is not equivalent. I would like to know if there is a way to modify the ordering of the graph to obtain the same ordering that is in the mesh file for the nodes that make up the elements or even if there is some way to configure it for a specific desired order. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wence at gmx.li Mon Sep 20 08:38:29 2021 From: wence at gmx.li (Lawrence Mitchell) Date: Mon, 20 Sep 2021 14:38:29 +0100 Subject: [petsc-users] DMView and DMLoad In-Reply-To: <20210920130810.Horde.MErFrSPts47GDBNOjJQvNHt@webmailer.ovgu.de> References: <20210920130810.Horde.MErFrSPts47GDBNOjJQvNHt@webmailer.ovgu.de> Message-ID: <4BEA1166-E3AD-45E8-A758-D76767669725@gmx.li> Dear Sergio, (Added petsc-users back to cc), > On 20 Sep 2021, at 14:08, sergio.bengoechea at ovgu.de wrote: > > Dear Lawrence, > > thanks for the HDF5 saving and loading example. > > In the documentation you sent (https://petsc.org/main/docs/manual/dmplex/#saving-and-loading-data-with-hdf5) the link to a more comprehensive example (DMPlex Tutorial ex12) is not working. Hmm, I'm not in charge of how the documentation gets built. @Patrick, if I look here: https://petsc.org/main/src/dm/impls/plex/tutorials/, I don't see ex12.c (even though it is here https://gitlab.com/petsc/petsc/-/tree/main/src/dm/impls/plex/tutorials) The relevant link in the docs goes to https://petsc.org/main/docs/src/dm/impls/plex/tutorials/ex12.c.html (which is 404), I suppose it should be main/src/... (not main/docs/src...) > I am afraid I would need to see the whole example to make our case work. If I could have access the completed source code of that example would be of a great help. You can find the relevant example in the PETSc source tree https://gitlab.com/petsc/petsc/-/blob/main/src/dm/impls/plex/tutorials/ex12.c Lawrence From knepley at gmail.com Mon Sep 20 08:43:40 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 Sep 2021 09:43:40 -0400 Subject: [petsc-users] How to get specific ordering In-Reply-To: References: Message-ID: On Sun, Sep 19, 2021 at 3:27 PM Carlos Velazquez wrote: > Hi there. I have a question whether I can get the DAG values in DMPlex in > a different order. > > I am working on a code that uses the same node ordering to form the > elements found in the mesh file. Example: > > If I use the file "doublet-tet.msh" that is inside the DMPlex mesh files > folder, this file tells us that the elements are formed as follows: > > 1 - 2 4 3 1 > 2 - 2 3 4 5 > Element 1 by nodes 2, 4, 3, 1 > Element 2 by nodes 2, 3, 4, 5 > > So what I'm doing to get this through DMPlex is getting the transitive > closure with DMPlexGetTransitiveClosure and getting the points from level > 0, which is the node level, but the ordering is different. Example: > > With DMPlexGetTransitiveClosure I obtain that the elements are formed as > follows: > > 0 - 5 3 4 2 > 1 - 4 3 5 6 > Element 1 by nodes 5, 3, 4, 2 > Element 2 by nodes 4, 3, 5, 6 > > But comparing this ordering with the previous one in the coordinate matrix > I can see that the order is not equivalent. > > I would like to know if there is a way to modify the ordering of the graph > to obtain the same ordering that is in the mesh file for the nodes that > make up the elements or even if there is some way to configure it for a > specific desired order. > The problem is that GMsh orients tetrahedra differently than Plex. We like outward normals, whereas the GMsh convention has the normal for the first face pointing inward. Thus, when we read in a GMsh tet, we flip the first two vertices. So 0 - 5 3 4 2 but if we number from 1 instead of 0, and number cells and vertices separately, 1 - 4 2 3 1 which if you flip vertices 1 and 2 is 1 - 2 4 3 1 which is what you read in. 
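A rough sketch of the kind of loop being discussed (not code from either message; the helper name is made up, and dm is assumed to be an interpolated DMPlex read from the GMsh file, e.g. with DMPlexCreateFromFile()):

#include <petscdmplex.h>

/* List the depth-0 points (vertices) in the closure of each cell.  The raw
   point numbers printed here are the ones quoted above; subtracting vStart
   and adding 1 gives the per-vertex numbering used in the reply. */
static PetscErrorCode PrintCellClosureVertices(DM dm)
{
  PetscInt       cStart, cEnd, vStart, vEnd, c;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr); /* cells    */
  ierr = DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd);CHKERRQ(ierr);  /* vertices */
  for (c = cStart; c < cEnd; ++c) {
    PetscInt *closure = NULL, npoints, p;

    ierr = DMPlexGetTransitiveClosure(dm, c, PETSC_TRUE, &npoints, &closure);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_SELF, "cell %D -", c);CHKERRQ(ierr);
    for (p = 0; p < 2*npoints; p += 2) {               /* closure stores (point, orientation) pairs */
      if (closure[p] >= vStart && closure[p] < vEnd) { /* keep only the vertex points               */
        ierr = PetscPrintf(PETSC_COMM_SELF, " %D", closure[p]);CHKERRQ(ierr);
      }
    }
    ierr = PetscPrintf(PETSC_COMM_SELF, "\n");CHKERRQ(ierr);
    ierr = DMPlexRestoreTransitiveClosure(dm, c, PETSC_TRUE, &npoints, &closure);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}

For a tetrahedron read from GMsh, swapping the first two vertices printed by this loop recovers the ordering in the .msh file, per the convention described above.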
Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Mon Sep 20 08:43:49 2021 From: pierre at joliv.et (Pierre Jolivet) Date: Mon, 20 Sep 2021 15:43:49 +0200 Subject: [petsc-users] DMView and DMLoad In-Reply-To: <4BEA1166-E3AD-45E8-A758-D76767669725@gmx.li> References: <20210920130810.Horde.MErFrSPts47GDBNOjJQvNHt@webmailer.ovgu.de> <4BEA1166-E3AD-45E8-A758-D76767669725@gmx.li> Message-ID: > On 20 Sep 2021, at 3:38 PM, Lawrence Mitchell wrote: > > Dear Sergio, > > (Added petsc-users back to cc), > >> On 20 Sep 2021, at 14:08, sergio.bengoechea at ovgu.de wrote: >> >> Dear Lawrence, >> >> thanks for the HDF5 saving and loading example. >> >> In the documentation you sent (https://petsc.org/main/docs/manual/dmplex/#saving-and-loading-data-with-hdf5) the link to a more comprehensive example (DMPlex Tutorial ex12) is not working. > > Hmm, I'm not in charge of how the documentation gets built. > > @Patrick, if I look here: https://petsc.org/main/src/dm/impls/plex/tutorials/, I don't see ex12.c (even though it is here https://gitlab.com/petsc/petsc/-/tree/main/src/dm/impls/plex/tutorials) It needs to be added to EXAMPLESC in src/dm/impls/plex/tutorials/makefile By the way, none of the ?Actual source code: XYZ.c? are working anymore (not specific to DMPlex and/or tutorials), e.g., https://petsc.org/main/src/dm/impls/plex/tutorials/ex1.c.html Actual source code: ex1.c redirects to the same eye-candy/filtered .html instead of the raw/unfiltered .c Thanks, Pierre > The relevant link in the docs goes to https://petsc.org/main/docs/src/dm/impls/plex/tutorials/ex12.c.html (which is 404), I suppose it should be main/src/... (not main/docs/src...) > >> I am afraid I would need to see the whole example to make our case work. If I could have access the completed source code of that example would be of a great help. > > You can find the relevant example in the PETSc source tree https://gitlab.com/petsc/petsc/-/blob/main/src/dm/impls/plex/tutorials/ex12.c > > Lawrence -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrick.sanan at gmail.com Mon Sep 20 10:51:33 2021 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Mon, 20 Sep 2021 17:51:33 +0200 Subject: [petsc-users] DMView and DMLoad In-Reply-To: <4BEA1166-E3AD-45E8-A758-D76767669725@gmx.li> References: <20210920130810.Horde.MErFrSPts47GDBNOjJQvNHt@webmailer.ovgu.de> <4BEA1166-E3AD-45E8-A758-D76767669725@gmx.li> Message-ID: Thanks for reporting that! There are still some things in the "classic" docs build that we could make more robust but it's not clear if we should devote the energy to that or to continuing to replace those processes with new ones which are better integrated with the Sphinx build. MR that should hopefully at least partially fix this particular issue: https://gitlab.com/petsc/petsc/-/merge_requests/4331 General issue: https://gitlab.com/petsc/petsc/-/issues/279 Am Mo., 20. Sept. 2021 um 15:38 Uhr schrieb Lawrence Mitchell : > Dear Sergio, > > (Added petsc-users back to cc), > > > On 20 Sep 2021, at 14:08, sergio.bengoechea at ovgu.de wrote: > > > > Dear Lawrence, > > > > thanks for the HDF5 saving and loading example. 
> > > > In the documentation you sent ( > https://petsc.org/main/docs/manual/dmplex/#saving-and-loading-data-with-hdf5) > the link to a more comprehensive example (DMPlex Tutorial ex12) is not > working. > > Hmm, I'm not in charge of how the documentation gets built. > > @Patrick, if I look here: > https://petsc.org/main/src/dm/impls/plex/tutorials/, I don't see ex12.c > (even though it is here > https://gitlab.com/petsc/petsc/-/tree/main/src/dm/impls/plex/tutorials) > > The relevant link in the docs goes to > https://petsc.org/main/docs/src/dm/impls/plex/tutorials/ex12.c.html > (which is 404), I suppose it should be main/src/... (not main/docs/src...) > > > I am afraid I would need to see the whole example to make our case work. > If I could have access the completed source code of that example would be > of a great help. > > You can find the relevant example in the PETSc source tree > https://gitlab.com/petsc/petsc/-/blob/main/src/dm/impls/plex/tutorials/ex12.c > > Lawrence -------------- next part -------------- An HTML attachment was scrubbed... URL: From varunhiremath at gmail.com Mon Sep 20 17:23:09 2021 From: varunhiremath at gmail.com (Varun Hiremath) Date: Mon, 20 Sep 2021 15:23:09 -0700 Subject: [petsc-users] SLEPc: smallest eigenvalues In-Reply-To: References: <179BDB69-1EC0-4334-A964-ABE29E33EFF8@dsic.upv.es> <5B1750B3-E05F-45D7-929B-A5CF816B4A75@dsic.upv.es> <7031EC8B-A238-45AD-B4C2-FA8988022864@dsic.upv.es> Message-ID: Hi Jose, Sorry, it took me a while to test these settings in the new builds. I am getting good improvement in performance using the preconditioned solvers, so thanks for the suggestions! But I have some questions related to the usage. We are using SLEPc to solve the acoustic modal eigenvalue problem. Attached is a simple standalone program that computes acoustic modes in a simple rectangular box. This program illustrates the general setup I am using, though here the shell matrix and the preconditioner matrix are the same, while in my actual program the shell matrix computes A*x without explicitly forming A, and the preconditioner is a 0th order approximation of A. In the attached program I have tested both 1) the Krylov-Schur with inexact shift-and-invert (implemented under the option sinvert); 2) the JD solver with preconditioner (implemented under the option usejd) Both the solvers seem to work decently, compared to no preconditioning. This is how I run the two solvers (for a mesh size of 1600x400): $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 -eps_target 0 $ ./acoustic_box_test.o -nx 1600 -ny 400 -sinvert 1 -deflate 1 -eps_target 0 Both finish in about ~10 minutes on my system in serial. JD seems to be slightly faster and more accurate (for the imaginary part of eigenvalue). The program also runs in parallel using mpiexec. I use complex builds, as in my main program the matrix can be complex. Now here are my questions: 1) For this particular problem type, could you please check if these are the best settings that one could use? I have tried different combinations of KSP/PC types e.g. GMRES, GAMG, etc, but BCGSL + BJACOBI seems to work the best in serial and parallel. 2) When I tested these settings in my main program, for some reason the JD solver was not converging. After further testing, I found the issue was related to the setting of "-eps_target 0". I have included " EPSSetTarget(eps,0.0);" in the program and I assumed this is equivalent to passing "-eps_target 0" from the command line, but that doesn't seem to be the case. 
For instance, if I run the attached program without "-eps_target 0" in the command line then it doesn't converge. $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 -eps_target 0 the above finishes in about 10 minutes $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 the above doesn't converge even though "EPSSetTarget(eps,0.0);" is included in the code This only seems to affect the JD solver, not the Krylov shift-and-invert (-sinvert 1) option. So is there any difference between passing "-eps_target 0" from the command line vs using "EPSSetTarget(eps,0.0);" in the code? I cannot pass any command line arguments in my actual program, so need to set everything internally. 3) Also, another minor related issue. While using the inexact shift-and-invert option, I was running into the following error: "" Missing or incorrect user input Shift-and-invert requires a target 'which' (see EPSSetWhichEigenpairs), for instance -st_type sinvert -eps_target 0 -eps_target_magnitude "" I already have the below two lines in the code: EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); EPSSetTarget(eps,0.0); so shouldn't these be enough? If I comment out the first line "EPSSetWhichEigenpairs", then the code works fine. I have some more questions regarding setting the preconditioner for a quadratic eigenvalue problem, which I will ask in a follow-up email. Thanks for your help! -Varun On Thu, Jul 1, 2021 at 5:01 AM Varun Hiremath wrote: > Thank you very much for these suggestions! We are currently using version > 3.12, so I'll try to update to the latest version and try your suggestions. > Let me get back to you, thanks! > > On Thu, Jul 1, 2021, 4:45 AM Jose E. Roman wrote: > >> Then I would try Davidson methods https://doi.org/10.1145/2543696 >> You can also try Krylov-Schur with "inexact" shift-and-invert, for >> instance, with preconditioned BiCGStab or GMRES, see section 3.4.1 of the >> users manual. >> >> In both cases, you have to pass matrix A in the call to EPSSetOperators() >> and the preconditioner matrix via STSetPreconditionerMat() - note this >> function was introduced in version 3.15. >> >> Jose >> >> >> >> > El 1 jul 2021, a las 13:36, Varun Hiremath >> escribi?: >> > >> > Thanks. I actually do have a 1st order approximation of matrix A, that >> I can explicitly compute and also invert. Can I use that matrix as >> preconditioner to speed things up? Is there some example that explains how >> to setup and call SLEPc for this scenario? >> > >> > On Thu, Jul 1, 2021, 4:29 AM Jose E. Roman wrote: >> > For smallest real parts one could adapt ex34.c, but it is going to be >> costly >> https://slepc.upv.es/documentation/current/src/eps/tutorials/ex36.c.html >> > Also, if eigenvalues are clustered around the origin, convergence may >> still be very slow. >> > >> > It is a tough problem, unless you are able to compute a good >> preconditioner of A (no need to compute the exact inverse). >> > >> > Jose >> > >> > >> > > El 1 jul 2021, a las 13:23, Varun Hiremath >> escribi?: >> > > >> > > I'm solving for the smallest eigenvalues in magnitude. Though is it >> cheaper to solve smallest in real part, as that might also work in my case? >> Thanks for your help. >> > > >> > > On Thu, Jul 1, 2021, 4:08 AM Jose E. Roman >> wrote: >> > > Smallest eigenvalue in magnitude or real part? >> > > >> > > >> > > > El 1 jul 2021, a las 11:58, Varun Hiremath >> escribi?: >> > > > >> > > > Sorry, no both A and B are general sparse matrices (non-hermitian). >> So is there anything else I could try? 
>> > > > >> > > > On Thu, Jul 1, 2021 at 2:43 AM Jose E. Roman >> wrote: >> > > > Is the problem symmetric (GHEP)? In that case, you can try LOBPCG >> on the pair (A,B). But this will likely be slow as well, unless you can >> provide a good preconditioner. >> > > > >> > > > Jose >> > > > >> > > > >> > > > > El 1 jul 2021, a las 11:37, Varun Hiremath < >> varunhiremath at gmail.com> escribi?: >> > > > > >> > > > > Hi All, >> > > > > >> > > > > I am trying to compute the smallest eigenvalues of a generalized >> system A*x= lambda*B*x. I don't explicitly know the matrix A (so I am using >> a shell matrix with a custom matmult function) however, the matrix B is >> explicitly known so I compute inv(B)*A within the shell matrix and solve >> inv(B)*A*x = lambda*x. >> > > > > >> > > > > To compute the smallest eigenvalues it is recommended to solve >> the inverted system, but since matrix A is not explicitly known I can't >> invert the system. Moreover, the size of the system can be really big, and >> with the default Krylov solver, it is extremely slow. So is there a better >> way for me to compute the smallest eigenvalues of this system? >> > > > > >> > > > > Thanks, >> > > > > Varun >> > > > >> > > >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: acoustic_box_test.cpp Type: application/octet-stream Size: 7367 bytes Desc: not available URL: From lihaolin at stu.xjtu.edu.cn Mon Sep 20 21:49:07 2021 From: lihaolin at stu.xjtu.edu.cn (=?UTF-8?B?5p2O5piK6ZyW?=) Date: Tue, 21 Sep 2021 10:49:07 +0800 (GMT+08:00) Subject: [petsc-users] Uses of VecGetArrayF90() and VecGetArrayReadF90() in Recent versions of GNU Fortran. Message-ID: <51cf3661.2408a.17c0641cb69.Coremail.lihaolin@stu.xjtu.edu.cn> Dear all, I used PETSc in my full Fortran codes and it worked well when my codes were compiled by GNU Fortran (GCC) 4.8.4. But for some reasons, I had to update the GNU Fortran (GCC) to version 10.0.1. Then I reinstalled the MPICH and PETSc with the newer complier and compiled my codes successfully. However, I got the following error massage: Index '1' of dimension 1 of array 'xx' above upper bound of 0. where xx is the Fortran pointer obtained by calling VecGetArrayF90(vec,xx,ierr). The vector was built successfully, but it seemed that the Fortran pointer xx was not built. I got the same error massage when using VecGetArrayReadF90(). So, are the VecGetArrayF90() and VecGetArrayReadF90() not compatible with the recent versions of GNU Fortran? Or is there any other way to access the vectors? For solving a linear problem Ab=x, I use VecGetArrayF90() to get the Fortran pointer to update b and use VecGetArrayReadF90() to get the values of x. After some tests, I found that VecGetArrayF90() could be replaced by VecSetValues(), but VecGetValues() used to replace VecGetArrayReadF90() could not be run with multiple threads. I look forward to your reply and thank you for any suggestions. Best regards, Haolin Li -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Sep 20 21:27:51 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 20 Sep 2021 21:27:51 -0500 (CDT) Subject: [petsc-users] Uses of VecGetArrayF90() and VecGetArrayReadF90() in Recent versions of GNU Fortran. 
In-Reply-To: <51cf3661.2408a.17c0641cb69.Coremail.lihaolin@stu.xjtu.edu.cn> References: <51cf3661.2408a.17c0641cb69.Coremail.lihaolin@stu.xjtu.edu.cn> Message-ID: <6c60abda-b135-ca8b-1c13-88c03469b0f7@mcs.anl.gov> VecGetArrayF90 should work with newer gfortran versions. https://petsc.org/release/docs/manualpages/Vec/VecGetArrayF90.html Check the examples listed above to see if usage in your code is different. [run them with your build of petsc/compilers to verify] And make sure you are using the latest version of PETSc. If you still have issues - send us a reproducible example. Satish On Tue, 21 Sep 2021, ??? wrote: > Dear all, > > I used PETSc in my full Fortran codes and it worked well when my codes were compiled by GNU Fortran (GCC) 4.8.4. But for some reasons, I had to update the GNU Fortran (GCC) to version 10.0.1. Then I reinstalled the MPICH and PETSc with the newer complier and compiled my codes successfully. However, I got the following error massage: > > Index '1' of dimension 1 of array 'xx' above upper bound of 0. > > where xx is the Fortran pointer obtained by calling VecGetArrayF90(vec,xx,ierr). > > The vector was built successfully, but it seemed that the Fortran pointer xx was not built. I got the same error massage when using VecGetArrayReadF90(). So, are the VecGetArrayF90() and VecGetArrayReadF90() not compatible with the recent versions of GNU Fortran? Or is there any other way to access the vectors? For solving a linear problem Ab=x, I use VecGetArrayF90() to get the Fortran pointer to update b and use VecGetArrayReadF90() to get the values of x. After some tests, I found that VecGetArrayF90() could be replaced by VecSetValues(), but VecGetValues() used to replace VecGetArrayReadF90() could not be run with multiple threads. > > I look forward to your reply and thank you for any suggestions. > > Best regards, > > Haolin Li From daniel.stone at opengosim.com Tue Sep 21 05:19:37 2021 From: daniel.stone at opengosim.com (Daniel Stone) Date: Tue, 21 Sep 2021 11:19:37 +0100 Subject: [petsc-users] Is this a bug? MatMultAdd_SeqBAIJ_11 Message-ID: Hello, If we look at lines 2330-2331 in file baij2.c, it looks like there are some mistakes in assigning the `sum..` variables to the z array, causing the function MatMultAdd_SeqBAIJ_11() to not produce the correct answer. I don't have a good example program to demonstrate this yet - it's currently causing problems in a dev branch of pflotan_ogs that can produce blocksize 11 matrices. When in parallel, a standard matrix-vector multiplication calls MatMultAdd for the off-proc contributions, and the result is wrong when this is redirected to MatMultAdd_SeqBAIJ_11. Seems to be the root cause of several solvers failing such as fgmres. Can anyone confirm that these two lines seem incorrect? Thanks, Daniel -------------- next part -------------- An HTML attachment was scrubbed... URL: From wence at gmx.li Tue Sep 21 05:50:01 2021 From: wence at gmx.li (Lawrence Mitchell) Date: Tue, 21 Sep 2021 11:50:01 +0100 Subject: [petsc-users] Is this a bug? MatMultAdd_SeqBAIJ_11 In-Reply-To: References: Message-ID: Hi Daniel, > On 21 Sep 2021, at 11:19, Daniel Stone wrote: > > Hello, > > If we look at lines 2330-2331 in file baij2.c, it looks like there are some > mistakes in assigning the `sum..` variables to the z array, causing > the function MatMultAdd_SeqBAIJ_11() to not produce the correct > answer. 
> > I don't have a good example program to demonstrate this yet - it's > currently causing problems in a dev branch of pflotan_ogs that > can produce blocksize 11 matrices. When in parallel, a standard > matrix-vector multiplication calls MatMultAdd for the off-proc > contributions, and the result is wrong when this is redirected > to MatMultAdd_SeqBAIJ_11. Seems to be the root cause of > several solvers failing such as fgmres. > > Can anyone confirm that these two lines seem incorrect? Looks wrong to me, I guess this patch is correct? diff --git a/src/mat/impls/baij/seq/baij2.c b/src/mat/impls/baij/seq/baij2.c index 2849ef9051..65513c8989 100644 --- a/src/mat/impls/baij/seq/baij2.c +++ b/src/mat/impls/baij/seq/baij2.c @@ -2328,7 +2328,7 @@ PetscErrorCode MatMultAdd_SeqBAIJ_11(Mat A,Vec xx,Vec yy,Vec zz) v += 121; } z[0] = sum1; z[1] = sum2; z[2] = sum3; z[3] = sum4; z[4] = sum5; z[5] = sum6; z[6] = sum7; - z[7] = sum6; z[8] = sum7; z[9] = sum8; z[10] = sum9; z[11] = sum10; + z[7] = sum8; z[8] = sum9; z[9] = sum10; z[10] = sum11; if (!usecprow) { z += 11; y += 11; } Lawrence From daniel.stone at opengosim.com Tue Sep 21 08:51:47 2021 From: daniel.stone at opengosim.com (Daniel Stone) Date: Tue, 21 Sep 2021 14:51:47 +0100 Subject: [petsc-users] Is this a bug? MatMultAdd_SeqBAIJ_11 In-Reply-To: References: Message-ID: I seem to have confirmed that making the change suggested by Lawrence fixes things in our case. Some alternate runs by a colleague that result in a blocksize 12 matrix also work fine - I think in that case MAtMultAdd_SeqBAIJ_N must be being used as I can't find a blocksize 12 analogue. Is there by any chance a setting somewhere that can tell petsc to override the use of blocksize specific routines like this and always go to, e.g., MatMultAdd_SeqBAIJ_N etc? Might be useful as a short term fix. Thanks, Daniel On Tue, Sep 21, 2021 at 11:50 AM Lawrence Mitchell wrote: > Hi Daniel, > > > On 21 Sep 2021, at 11:19, Daniel Stone > wrote: > > > > Hello, > > > > If we look at lines 2330-2331 in file baij2.c, it looks like there are > some > > mistakes in assigning the `sum..` variables to the z array, causing > > the function MatMultAdd_SeqBAIJ_11() to not produce the correct > > answer. > > > > I don't have a good example program to demonstrate this yet - it's > > currently causing problems in a dev branch of pflotan_ogs that > > can produce blocksize 11 matrices. When in parallel, a standard > > matrix-vector multiplication calls MatMultAdd for the off-proc > > contributions, and the result is wrong when this is redirected > > to MatMultAdd_SeqBAIJ_11. Seems to be the root cause of > > several solvers failing such as fgmres. > > > > Can anyone confirm that these two lines seem incorrect? > > Looks wrong to me, I guess this patch is correct? > > diff --git a/src/mat/impls/baij/seq/baij2.c > b/src/mat/impls/baij/seq/baij2.c > index 2849ef9051..65513c8989 100644 > --- a/src/mat/impls/baij/seq/baij2.c > +++ b/src/mat/impls/baij/seq/baij2.c > @@ -2328,7 +2328,7 @@ PetscErrorCode MatMultAdd_SeqBAIJ_11(Mat A,Vec > xx,Vec yy,Vec zz) > v += 121; > } > z[0] = sum1; z[1] = sum2; z[2] = sum3; z[3] = sum4; z[4] = sum5; z[5] > = sum6; z[6] = sum7; > - z[7] = sum6; z[8] = sum7; z[9] = sum8; z[10] = sum9; z[11] = sum10; > + z[7] = sum8; z[8] = sum9; z[9] = sum10; z[10] = sum11; > if (!usecprow) { > z += 11; y += 11; > } > > > Lawrence -------------- next part -------------- An HTML attachment was scrubbed... 
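Since a standalone reproducer was mentioned as missing above, something along these lines (only a sketch, not the pflotan_ogs case; the fill pattern is arbitrary) should go through MatMultAdd_SeqBAIJ_11, because MatMultAdd on a SeqBAIJ matrix with block size 11 dispatches to the block-size-specific routine. With the patch above the reported difference should be at rounding level; without it, the trailing rows of each block come out wrong:

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, y, z, w;
  PetscInt       bs = 11, nb = 4, n, i;
  PetscReal      nrm;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  n    = bs*nb;                                           /* 4 block rows/columns of size 11 */
  ierr = MatCreateSeqBAIJ(PETSC_COMM_SELF, bs, n, n, nb, NULL, &A);CHKERRQ(ierr);
  for (i = 0; i < n; i++) {                               /* arbitrary nonsymmetric fill     */
    ierr = MatSetValue(A, i, i, 2.0 + i, INSERT_VALUES);CHKERRQ(ierr);
    ierr = MatSetValue(A, i, (i + 1) % n, 0.5, INSERT_VALUES);CHKERRQ(ierr);
    ierr = MatSetValue(A, i, (i + bs) % n, -1.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatCreateVecs(A, &x, &y);CHKERRQ(ierr);
  ierr = VecDuplicate(y, &z);CHKERRQ(ierr);
  ierr = VecDuplicate(y, &w);CHKERRQ(ierr);
  ierr = VecSetRandom(x, NULL);CHKERRQ(ierr);
  ierr = VecSetRandom(y, NULL);CHKERRQ(ierr);

  ierr = MatMultAdd(A, x, y, z);CHKERRQ(ierr);            /* z = A*x + y, routine under test */
  ierr = MatMult(A, x, w);CHKERRQ(ierr);                  /* w = A*x                         */
  ierr = VecAXPY(w, 1.0, y);CHKERRQ(ierr);                /* w = A*x + y, reference          */
  ierr = VecAXPY(w, -1.0, z);CHKERRQ(ierr);               /* w = reference - test            */
  ierr = VecNorm(w, NORM_INFINITY, &nrm);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_SELF, "bs = %D, max |MatMultAdd - (MatMult + VecAXPY)| = %g\n", bs, (double)nrm);CHKERRQ(ierr);

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&y);CHKERRQ(ierr);
  ierr = VecDestroy(&z);CHKERRQ(ierr);
  ierr = VecDestroy(&w);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}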
URL: From bsmith at petsc.dev Tue Sep 21 09:02:33 2021 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 21 Sep 2021 10:02:33 -0400 Subject: [petsc-users] Is this a bug? MatMultAdd_SeqBAIJ_11 In-Reply-To: References: Message-ID: Daniel, Thanks for the update. We can patch the release version of PETSc (normally we don't have a way to patch previous release versions) There is no direct option to always use the _N version. (For smaller blocks using _N is very slow). Barry > On Sep 21, 2021, at 9:51 AM, Daniel Stone wrote: > > I seem to have confirmed that making the change suggested by Lawrence fixes things in our > case. Some alternate runs by a colleague that result in a blocksize 12 matrix also work > fine - I think in that case MAtMultAdd_SeqBAIJ_N must be being used as I can't find > a blocksize 12 analogue. > > Is there by any chance a setting somewhere that can tell petsc to override the use of > blocksize specific routines like this and always go to, e.g., MatMultAdd_SeqBAIJ_N > etc? Might be useful as a short term fix. > > Thanks, > > Daniel > > On Tue, Sep 21, 2021 at 11:50 AM Lawrence Mitchell > wrote: > Hi Daniel, > > > On 21 Sep 2021, at 11:19, Daniel Stone > wrote: > > > > Hello, > > > > If we look at lines 2330-2331 in file baij2.c, it looks like there are some > > mistakes in assigning the `sum..` variables to the z array, causing > > the function MatMultAdd_SeqBAIJ_11() to not produce the correct > > answer. > > > > I don't have a good example program to demonstrate this yet - it's > > currently causing problems in a dev branch of pflotan_ogs that > > can produce blocksize 11 matrices. When in parallel, a standard > > matrix-vector multiplication calls MatMultAdd for the off-proc > > contributions, and the result is wrong when this is redirected > > to MatMultAdd_SeqBAIJ_11. Seems to be the root cause of > > several solvers failing such as fgmres. > > > > Can anyone confirm that these two lines seem incorrect? > > Looks wrong to me, I guess this patch is correct? > > diff --git a/src/mat/impls/baij/seq/baij2.c b/src/mat/impls/baij/seq/baij2.c > index 2849ef9051..65513c8989 100644 > --- a/src/mat/impls/baij/seq/baij2.c > +++ b/src/mat/impls/baij/seq/baij2.c > @@ -2328,7 +2328,7 @@ PetscErrorCode MatMultAdd_SeqBAIJ_11(Mat A,Vec xx,Vec yy,Vec zz) > v += 121; > } > z[0] = sum1; z[1] = sum2; z[2] = sum3; z[3] = sum4; z[4] = sum5; z[5] = sum6; z[6] = sum7; > - z[7] = sum6; z[8] = sum7; z[9] = sum8; z[10] = sum9; z[11] = sum10; > + z[7] = sum8; z[8] = sum9; z[9] = sum10; z[10] = sum11; > if (!usecprow) { > z += 11; y += 11; > } > > > Lawrence -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Sep 21 09:05:15 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 21 Sep 2021 10:05:15 -0400 Subject: [petsc-users] Is this a bug? MatMultAdd_SeqBAIJ_11 In-Reply-To: References: Message-ID: On Tue, Sep 21, 2021 at 10:04 AM Barry Smith wrote: > > Daniel, > > Thanks for the update. We can patch the release version of PETSc > (normally we don't have a way to patch previous release versions) > > There is no direct option to always use the _N version. (For smaller > blocks using _N is very slow). > It is in an MR: https://gitlab.com/petsc/petsc/-/merge_requests/4338 Thanks, Matt > Barry > > > On Sep 21, 2021, at 9:51 AM, Daniel Stone > wrote: > > I seem to have confirmed that making the change suggested by Lawrence > fixes things in our > case. 
Some alternate runs by a colleague that result in a blocksize 12 > matrix also work > fine - I think in that case MAtMultAdd_SeqBAIJ_N must be being used as I > can't find > a blocksize 12 analogue. > > Is there by any chance a setting somewhere that can tell petsc to override > the use of > blocksize specific routines like this and always go to, e.g., > MatMultAdd_SeqBAIJ_N > etc? Might be useful as a short term fix. > > Thanks, > > Daniel > > On Tue, Sep 21, 2021 at 11:50 AM Lawrence Mitchell wrote: > >> Hi Daniel, >> >> > On 21 Sep 2021, at 11:19, Daniel Stone >> wrote: >> > >> > Hello, >> > >> > If we look at lines 2330-2331 in file baij2.c, it looks like there are >> some >> > mistakes in assigning the `sum..` variables to the z array, causing >> > the function MatMultAdd_SeqBAIJ_11() to not produce the correct >> > answer. >> > >> > I don't have a good example program to demonstrate this yet - it's >> > currently causing problems in a dev branch of pflotan_ogs that >> > can produce blocksize 11 matrices. When in parallel, a standard >> > matrix-vector multiplication calls MatMultAdd for the off-proc >> > contributions, and the result is wrong when this is redirected >> > to MatMultAdd_SeqBAIJ_11. Seems to be the root cause of >> > several solvers failing such as fgmres. >> > >> > Can anyone confirm that these two lines seem incorrect? >> >> Looks wrong to me, I guess this patch is correct? >> >> diff --git a/src/mat/impls/baij/seq/baij2.c >> b/src/mat/impls/baij/seq/baij2.c >> index 2849ef9051..65513c8989 100644 >> --- a/src/mat/impls/baij/seq/baij2.c >> +++ b/src/mat/impls/baij/seq/baij2.c >> @@ -2328,7 +2328,7 @@ PetscErrorCode MatMultAdd_SeqBAIJ_11(Mat A,Vec >> xx,Vec yy,Vec zz) >> v += 121; >> } >> z[0] = sum1; z[1] = sum2; z[2] = sum3; z[3] = sum4; z[4] = sum5; >> z[5] = sum6; z[6] = sum7; >> - z[7] = sum6; z[8] = sum7; z[9] = sum8; z[10] = sum9; z[11] = sum10; >> + z[7] = sum8; z[8] = sum9; z[9] = sum10; z[10] = sum11; >> if (!usecprow) { >> z += 11; y += 11; >> } >> >> >> Lawrence > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From niko.karin at gmail.com Tue Sep 21 09:30:49 2021 From: niko.karin at gmail.com (Karin&NiKo) Date: Tue, 21 Sep 2021 16:30:49 +0200 Subject: [petsc-users] How to combine different element types into a single DMPlex? In-Reply-To: References: Message-ID: Dear Eric, dear Matthew, I share Eric's desire to be able to manipulate meshes composed of different types of elements in a PETSc's DMPlex. Since this discussion, is there anything new on this feature for the DMPlex object or am I missing something? Thanks, Nicolas Le mer. 21 juil. 2021 ? 04:25, Eric Chamberland < Eric.Chamberland at giref.ulaval.ca> a ?crit : > Hi, > On 2021-07-14 3:14 p.m., Matthew Knepley wrote: > > On Wed, Jul 14, 2021 at 1:25 PM Eric Chamberland < > Eric.Chamberland at giref.ulaval.ca> wrote: > >> Hi, >> >> while playing with DMPlexBuildFromCellListParallel, I noticed we have to >> specify "numCorners" which is a fixed value, then gives a fixed number >> of nodes for a series of elements. >> >> How can I then add, for example, triangles and quadrangles into a DMPlex? >> > > You can't with that function. It would be much mich more complicated if > you could, and I am not sure > it is worth it for that function. 
The reason is that you would need index > information to offset into the > connectivity list, and that would need to be replicated to some extent so > that all processes know what > the others are doing. Possible, but complicated. > > Maybe I can help suggest something for what you are trying to do? > > Yes: we are trying to partition our parallel mesh with PETSc functions. > The mesh has been read in parallel so each process owns a part of it, but > we have to manage mixed elements types. > > When we directly use ParMETIS_V3_PartMeshKway, we give two arrays to > describe the elements which allows mixed elements. > > So, how would I read my mixed mesh in parallel and give it to PETSc DMPlex > so I can use a PetscPartitioner with DMPlexDistribute ? > > A second goal we have is to use PETSc to compute the overlap, which is > something I can't find in PARMetis (and any other partitionning library?) > > Thanks, > > Eric > > > > Thanks, > > Matt > > > >> Thanks, >> >> Eric >> >> -- >> Eric Chamberland, ing., M. Ing >> Professionnel de recherche >> GIREF/Universit? Laval >> (418) 656-2131 poste 41 22 42 >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > Eric Chamberland, ing., M. Ing > Professionnel de recherche > GIREF/Universit? Laval > (418) 656-2131 poste 41 22 42 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Sep 21 13:14:57 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 21 Sep 2021 20:14:57 +0200 Subject: [petsc-users] SLEPc: smallest eigenvalues In-Reply-To: References: <179BDB69-1EC0-4334-A964-ABE29E33EFF8@dsic.upv.es> <5B1750B3-E05F-45D7-929B-A5CF816B4A75@dsic.upv.es> <7031EC8B-A238-45AD-B4C2-FA8988022864@dsic.upv.es> Message-ID: <6B968AE2-8325-4E20-B94A-16ECDD0FBA90@dsic.upv.es> I will have a look at your code when I have more time. Meanwhile, I am answering 3) below... > El 21 sept 2021, a las 0:23, Varun Hiremath escribi?: > > Hi Jose, > > Sorry, it took me a while to test these settings in the new builds. I am getting good improvement in performance using the preconditioned solvers, so thanks for the suggestions! But I have some questions related to the usage. > > We are using SLEPc to solve the acoustic modal eigenvalue problem. Attached is a simple standalone program that computes acoustic modes in a simple rectangular box. This program illustrates the general setup I am using, though here the shell matrix and the preconditioner matrix are the same, while in my actual program the shell matrix computes A*x without explicitly forming A, and the preconditioner is a 0th order approximation of A. > > In the attached program I have tested both > 1) the Krylov-Schur with inexact shift-and-invert (implemented under the option sinvert); > 2) the JD solver with preconditioner (implemented under the option usejd) > > Both the solvers seem to work decently, compared to no preconditioning. This is how I run the two solvers (for a mesh size of 1600x400): > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 -eps_target 0 > $ ./acoustic_box_test.o -nx 1600 -ny 400 -sinvert 1 -deflate 1 -eps_target 0 > Both finish in about ~10 minutes on my system in serial. JD seems to be slightly faster and more accurate (for the imaginary part of eigenvalue). > The program also runs in parallel using mpiexec. 
I use complex builds, as in my main program the matrix can be complex. > > Now here are my questions: > 1) For this particular problem type, could you please check if these are the best settings that one could use? I have tried different combinations of KSP/PC types e.g. GMRES, GAMG, etc, but BCGSL + BJACOBI seems to work the best in serial and parallel. > > 2) When I tested these settings in my main program, for some reason the JD solver was not converging. After further testing, I found the issue was related to the setting of "-eps_target 0". I have included "EPSSetTarget(eps,0.0);" in the program and I assumed this is equivalent to passing "-eps_target 0" from the command line, but that doesn't seem to be the case. For instance, if I run the attached program without "-eps_target 0" in the command line then it doesn't converge. > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 -eps_target 0 > the above finishes in about 10 minutes > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 > the above doesn't converge even though "EPSSetTarget(eps,0.0);" is included in the code > > This only seems to affect the JD solver, not the Krylov shift-and-invert (-sinvert 1) option. So is there any difference between passing "-eps_target 0" from the command line vs using "EPSSetTarget(eps,0.0);" in the code? I cannot pass any command line arguments in my actual program, so need to set everything internally. > > 3) Also, another minor related issue. While using the inexact shift-and-invert option, I was running into the following error: > > "" > Missing or incorrect user input > Shift-and-invert requires a target 'which' (see EPSSetWhichEigenpairs), for instance -st_type sinvert -eps_target 0 -eps_target_magnitude > "" > > I already have the below two lines in the code: > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); > EPSSetTarget(eps,0.0); > > so shouldn't these be enough? If I comment out the first line "EPSSetWhichEigenpairs", then the code works fine. You should either do EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); without shift-and-invert or EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE); EPSSetTarget(eps,0.0); with shift-and-invert. The latter can also be used without shift-and-invert (e.g. in JD). I have to check, but a possible explanation why in your comment above (2) the command-line option -eps_target 0 works differently is that it also sets -eps_target_magnitude if omitted, so to be equivalent in source code you have to call both EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE); EPSSetTarget(eps,0.0); Jose > I have some more questions regarding setting the preconditioner for a quadratic eigenvalue problem, which I will ask in a follow-up email. > > Thanks for your help! > > -Varun > > > On Thu, Jul 1, 2021 at 5:01 AM Varun Hiremath wrote: > Thank you very much for these suggestions! We are currently using version 3.12, so I'll try to update to the latest version and try your suggestions. Let me get back to you, thanks! > > On Thu, Jul 1, 2021, 4:45 AM Jose E. Roman wrote: > Then I would try Davidson methods https://doi.org/10.1145/2543696 > You can also try Krylov-Schur with "inexact" shift-and-invert, for instance, with preconditioned BiCGStab or GMRES, see section 3.4.1 of the users manual. > > In both cases, you have to pass matrix A in the call to EPSSetOperators() and the preconditioner matrix via STSetPreconditionerMat() - note this function was introduced in version 3.15. 
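To collect the source-code-only settings discussed in this thread in one place, a rough sketch could look like the following; matA (the shell operator) and matP (its 0th-order approximation) are placeholder names, not the variables of the attached program, and the solver choices simply mirror the command lines quoted above:

#include <slepceps.h>

static PetscErrorCode SolveNearZero(Mat matA, Mat matP, PetscBool use_jd)
{
  EPS            eps;
  ST             st;
  KSP            ksp;
  PC             pc;
  PetscInt       nconv;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = EPSCreate(PetscObjectComm((PetscObject)matA), &eps);CHKERRQ(ierr);
  ierr = EPSSetOperators(eps, matA, NULL);CHKERRQ(ierr);     /* shell matrix for A           */
  ierr = EPSSetProblemType(eps, EPS_NHEP);CHKERRQ(ierr);
  /* "-eps_target 0" on the command line implies both of the following calls */
  ierr = EPSSetWhichEigenpairs(eps, EPS_TARGET_MAGNITUDE);CHKERRQ(ierr);
  ierr = EPSSetTarget(eps, 0.0);CHKERRQ(ierr);

  ierr = EPSGetST(eps, &st);CHKERRQ(ierr);
  ierr = STSetPreconditionerMat(st, matP);CHKERRQ(ierr);     /* needs SLEPc >= 3.15          */
  ierr = STGetKSP(st, &ksp);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPBCGSL);CHKERRQ(ierr);            /* BCGSL + BJACOBI, as reported */
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCBJACOBI);CHKERRQ(ierr);

  if (use_jd) {
    ierr = EPSSetType(eps, EPSJD);CHKERRQ(ierr);             /* Davidson with preconditioner           */
  } else {
    ierr = STSetType(st, STSINVERT);CHKERRQ(ierr);           /* Krylov-Schur, inexact shift-and-invert */
  }
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
  ierr = EPSSolve(eps);CHKERRQ(ierr);
  ierr = EPSGetConverged(eps, &nconv);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "converged eigenpairs: %D\n", nconv);CHKERRQ(ierr);
  ierr = EPSDestroy(&eps);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}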
> > Jose > > > > > El 1 jul 2021, a las 13:36, Varun Hiremath escribi?: > > > > Thanks. I actually do have a 1st order approximation of matrix A, that I can explicitly compute and also invert. Can I use that matrix as preconditioner to speed things up? Is there some example that explains how to setup and call SLEPc for this scenario? > > > > On Thu, Jul 1, 2021, 4:29 AM Jose E. Roman wrote: > > For smallest real parts one could adapt ex34.c, but it is going to be costly https://slepc.upv.es/documentation/current/src/eps/tutorials/ex36.c.html > > Also, if eigenvalues are clustered around the origin, convergence may still be very slow. > > > > It is a tough problem, unless you are able to compute a good preconditioner of A (no need to compute the exact inverse). > > > > Jose > > > > > > > El 1 jul 2021, a las 13:23, Varun Hiremath escribi?: > > > > > > I'm solving for the smallest eigenvalues in magnitude. Though is it cheaper to solve smallest in real part, as that might also work in my case? Thanks for your help. > > > > > > On Thu, Jul 1, 2021, 4:08 AM Jose E. Roman wrote: > > > Smallest eigenvalue in magnitude or real part? > > > > > > > > > > El 1 jul 2021, a las 11:58, Varun Hiremath escribi?: > > > > > > > > Sorry, no both A and B are general sparse matrices (non-hermitian). So is there anything else I could try? > > > > > > > > On Thu, Jul 1, 2021 at 2:43 AM Jose E. Roman wrote: > > > > Is the problem symmetric (GHEP)? In that case, you can try LOBPCG on the pair (A,B). But this will likely be slow as well, unless you can provide a good preconditioner. > > > > > > > > Jose > > > > > > > > > > > > > El 1 jul 2021, a las 11:37, Varun Hiremath escribi?: > > > > > > > > > > Hi All, > > > > > > > > > > I am trying to compute the smallest eigenvalues of a generalized system A*x= lambda*B*x. I don't explicitly know the matrix A (so I am using a shell matrix with a custom matmult function) however, the matrix B is explicitly known so I compute inv(B)*A within the shell matrix and solve inv(B)*A*x = lambda*x. > > > > > > > > > > To compute the smallest eigenvalues it is recommended to solve the inverted system, but since matrix A is not explicitly known I can't invert the system. Moreover, the size of the system can be really big, and with the default Krylov solver, it is extremely slow. So is there a better way for me to compute the smallest eigenvalues of this system? > > > > > > > > > > Thanks, > > > > > Varun > > > > > > > > > > > From knepley at gmail.com Tue Sep 21 14:55:56 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 21 Sep 2021 15:55:56 -0400 Subject: [petsc-users] How to combine different element types into a single DMPlex? In-Reply-To: References: Message-ID: On Tue, Sep 21, 2021 at 10:31 AM Karin&NiKo wrote: > Dear Eric, dear Matthew, > > I share Eric's desire to be able to manipulate meshes composed of > different types of elements in a PETSc's DMPlex. > Since this discussion, is there anything new on this feature for the > DMPlex object or am I missing something? > Thanks for finding this! Okay, I did a rewrite of the Plex internals this summer. It should now be possible to interpolate a mesh with any number of cell types, partition it, redistribute it, and many other manipulations. You can read in some formats that support hybrid meshes. If you let me know how you plan to read it in, we can make it work. Right now, I don't want to make input interfaces that no one will ever use. We have a project, joint with Firedrake, to finalize parallel I/O. 
This will make parallel reading and writing for checkpointing possible, supporting topology, geometry, fields and layouts, for many meshes in one HDF5 file. I think we will finish in November. Thanks, Matt > Thanks, > Nicolas > > Le mer. 21 juil. 2021 ? 04:25, Eric Chamberland < > Eric.Chamberland at giref.ulaval.ca> a ?crit : > >> Hi, >> On 2021-07-14 3:14 p.m., Matthew Knepley wrote: >> >> On Wed, Jul 14, 2021 at 1:25 PM Eric Chamberland < >> Eric.Chamberland at giref.ulaval.ca> wrote: >> >>> Hi, >>> >>> while playing with DMPlexBuildFromCellListParallel, I noticed we have to >>> specify "numCorners" which is a fixed value, then gives a fixed number >>> of nodes for a series of elements. >>> >>> How can I then add, for example, triangles and quadrangles into a DMPlex? >>> >> >> You can't with that function. It would be much mich more complicated if >> you could, and I am not sure >> it is worth it for that function. The reason is that you would need index >> information to offset into the >> connectivity list, and that would need to be replicated to some extent so >> that all processes know what >> the others are doing. Possible, but complicated. >> >> Maybe I can help suggest something for what you are trying to do? >> >> Yes: we are trying to partition our parallel mesh with PETSc functions. >> The mesh has been read in parallel so each process owns a part of it, but >> we have to manage mixed elements types. >> >> When we directly use ParMETIS_V3_PartMeshKway, we give two arrays to >> describe the elements which allows mixed elements. >> >> So, how would I read my mixed mesh in parallel and give it to PETSc >> DMPlex so I can use a PetscPartitioner with DMPlexDistribute ? >> >> A second goal we have is to use PETSc to compute the overlap, which is >> something I can't find in PARMetis (and any other partitionning library?) >> >> Thanks, >> >> Eric >> >> >> >> Thanks, >> >> Matt >> >> >> >>> Thanks, >>> >>> Eric >>> >>> -- >>> Eric Chamberland, ing., M. Ing >>> Professionnel de recherche >>> GIREF/Universit? Laval >>> (418) 656-2131 poste 41 22 42 >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> -- >> Eric Chamberland, ing., M. Ing >> Professionnel de recherche >> GIREF/Universit? Laval >> (418) 656-2131 poste 41 22 42 >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From niko.karin at gmail.com Wed Sep 22 02:04:35 2021 From: niko.karin at gmail.com (Karin&NiKo) Date: Wed, 22 Sep 2021 09:04:35 +0200 Subject: [petsc-users] How to combine different element types into a single DMPlex? In-Reply-To: References: Message-ID: Dear Matthew, This is great news! For my part, I would be mostly interested in the parallel input interface. Sorry for that... Indeed, in our application, we already have a parallel mesh data structure that supports hybrid meshes with parallel I/O and distribution (based on the MED format). We would like to use a DMPlex to make parallel mesh adaptation. As a matter of fact, all our meshes are in the MED format. We could also contribute to extend the interface of DMPlex with MED (if you consider it could be usefull). 
Best regards, Nicolas Le mar. 21 sept. 2021 ? 21:56, Matthew Knepley a ?crit : > On Tue, Sep 21, 2021 at 10:31 AM Karin&NiKo wrote: > >> Dear Eric, dear Matthew, >> >> I share Eric's desire to be able to manipulate meshes composed of >> different types of elements in a PETSc's DMPlex. >> Since this discussion, is there anything new on this feature for the >> DMPlex object or am I missing something? >> > > Thanks for finding this! > > Okay, I did a rewrite of the Plex internals this summer. It should now be > possible to interpolate a mesh with any > number of cell types, partition it, redistribute it, and many other > manipulations. > > You can read in some formats that support hybrid meshes. If you let me > know how you plan to read it in, we can make it work. > Right now, I don't want to make input interfaces that no one will ever > use. We have a project, joint with Firedrake, to finalize > parallel I/O. This will make parallel reading and writing for > checkpointing possible, supporting topology, geometry, fields and > layouts, for many meshes in one HDF5 file. I think we will finish in > November. > > Thanks, > > Matt > > >> Thanks, >> Nicolas >> >> Le mer. 21 juil. 2021 ? 04:25, Eric Chamberland < >> Eric.Chamberland at giref.ulaval.ca> a ?crit : >> >>> Hi, >>> On 2021-07-14 3:14 p.m., Matthew Knepley wrote: >>> >>> On Wed, Jul 14, 2021 at 1:25 PM Eric Chamberland < >>> Eric.Chamberland at giref.ulaval.ca> wrote: >>> >>>> Hi, >>>> >>>> while playing with DMPlexBuildFromCellListParallel, I noticed we have >>>> to >>>> specify "numCorners" which is a fixed value, then gives a fixed number >>>> of nodes for a series of elements. >>>> >>>> How can I then add, for example, triangles and quadrangles into a >>>> DMPlex? >>>> >>> >>> You can't with that function. It would be much mich more complicated if >>> you could, and I am not sure >>> it is worth it for that function. The reason is that you would need >>> index information to offset into the >>> connectivity list, and that would need to be replicated to some extent >>> so that all processes know what >>> the others are doing. Possible, but complicated. >>> >>> Maybe I can help suggest something for what you are trying to do? >>> >>> Yes: we are trying to partition our parallel mesh with PETSc functions. >>> The mesh has been read in parallel so each process owns a part of it, but >>> we have to manage mixed elements types. >>> >>> When we directly use ParMETIS_V3_PartMeshKway, we give two arrays to >>> describe the elements which allows mixed elements. >>> >>> So, how would I read my mixed mesh in parallel and give it to PETSc >>> DMPlex so I can use a PetscPartitioner with DMPlexDistribute ? >>> >>> A second goal we have is to use PETSc to compute the overlap, which is >>> something I can't find in PARMetis (and any other partitionning library?) >>> >>> Thanks, >>> >>> Eric >>> >>> >>> >>> Thanks, >>> >>> Matt >>> >>> >>> >>>> Thanks, >>>> >>>> Eric >>>> >>>> -- >>>> Eric Chamberland, ing., M. Ing >>>> Professionnel de recherche >>>> GIREF/Universit? Laval >>>> (418) 656-2131 poste 41 22 42 >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> -- >>> Eric Chamberland, ing., M. Ing >>> Professionnel de recherche >>> GIREF/Universit? 
Laval >>> (418) 656-2131 poste 41 22 42 >>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Sep 22 06:20:29 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 22 Sep 2021 07:20:29 -0400 Subject: [petsc-users] How to combine different element types into a single DMPlex? In-Reply-To: References: Message-ID: On Wed, Sep 22, 2021 at 3:04 AM Karin&NiKo wrote: > Dear Matthew, > > This is great news! > For my part, I would be mostly interested in the parallel input interface. > Sorry for that... > Indeed, in our application, we already have a parallel mesh data > structure that supports hybrid meshes with parallel I/O and distribution > (based on the MED format). We would like to use a DMPlex to make parallel > mesh adaptation. > As a matter of fact, all our meshes are in the MED format. We could > also contribute to extend the interface of DMPlex with MED (if you consider > it could be usefull). > An MED interface does exist. I stopped using it for two reasons: 1) The code was not portable and the build was failing on different architectures. I had to manually fix it. 2) The boundary markers did not provide global information, so that parallel reading was much harder. Feel free to update my MED reader to a better design. Thanks, Matt > Best regards, > Nicolas > > > Le mar. 21 sept. 2021 ? 21:56, Matthew Knepley a > ?crit : > >> On Tue, Sep 21, 2021 at 10:31 AM Karin&NiKo wrote: >> >>> Dear Eric, dear Matthew, >>> >>> I share Eric's desire to be able to manipulate meshes composed of >>> different types of elements in a PETSc's DMPlex. >>> Since this discussion, is there anything new on this feature for the >>> DMPlex object or am I missing something? >>> >> >> Thanks for finding this! >> >> Okay, I did a rewrite of the Plex internals this summer. It should now be >> possible to interpolate a mesh with any >> number of cell types, partition it, redistribute it, and many other >> manipulations. >> >> You can read in some formats that support hybrid meshes. If you let me >> know how you plan to read it in, we can make it work. >> Right now, I don't want to make input interfaces that no one will ever >> use. We have a project, joint with Firedrake, to finalize >> parallel I/O. This will make parallel reading and writing for >> checkpointing possible, supporting topology, geometry, fields and >> layouts, for many meshes in one HDF5 file. I think we will finish in >> November. >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Nicolas >>> >>> Le mer. 21 juil. 2021 ? 04:25, Eric Chamberland < >>> Eric.Chamberland at giref.ulaval.ca> a ?crit : >>> >>>> Hi, >>>> On 2021-07-14 3:14 p.m., Matthew Knepley wrote: >>>> >>>> On Wed, Jul 14, 2021 at 1:25 PM Eric Chamberland < >>>> Eric.Chamberland at giref.ulaval.ca> wrote: >>>> >>>>> Hi, >>>>> >>>>> while playing with DMPlexBuildFromCellListParallel, I noticed we have >>>>> to >>>>> specify "numCorners" which is a fixed value, then gives a fixed number >>>>> of nodes for a series of elements. >>>>> >>>>> How can I then add, for example, triangles and quadrangles into a >>>>> DMPlex? >>>>> >>>> >>>> You can't with that function. It would be much mich more complicated if >>>> you could, and I am not sure >>>> it is worth it for that function. 
The reason is that you would need >>>> index information to offset into the >>>> connectivity list, and that would need to be replicated to some extent >>>> so that all processes know what >>>> the others are doing. Possible, but complicated. >>>> >>>> Maybe I can help suggest something for what you are trying to do? >>>> >>>> Yes: we are trying to partition our parallel mesh with PETSc >>>> functions. The mesh has been read in parallel so each process owns a part >>>> of it, but we have to manage mixed elements types. >>>> >>>> When we directly use ParMETIS_V3_PartMeshKway, we give two arrays to >>>> describe the elements which allows mixed elements. >>>> >>>> So, how would I read my mixed mesh in parallel and give it to PETSc >>>> DMPlex so I can use a PetscPartitioner with DMPlexDistribute ? >>>> >>>> A second goal we have is to use PETSc to compute the overlap, which is >>>> something I can't find in PARMetis (and any other partitionning library?) >>>> >>>> Thanks, >>>> >>>> Eric >>>> >>>> >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>> >>>>> Thanks, >>>>> >>>>> Eric >>>>> >>>>> -- >>>>> Eric Chamberland, ing., M. Ing >>>>> Professionnel de recherche >>>>> GIREF/Universit? Laval >>>>> (418) 656-2131 poste 41 22 42 >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> -- >>>> Eric Chamberland, ing., M. Ing >>>> Professionnel de recherche >>>> GIREF/Universit? Laval >>>> (418) 656-2131 poste 41 22 42 >>>> >>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From varunhiremath at gmail.com Wed Sep 22 12:38:27 2021 From: varunhiremath at gmail.com (Varun Hiremath) Date: Wed, 22 Sep 2021 10:38:27 -0700 Subject: [petsc-users] SLEPc: smallest eigenvalues In-Reply-To: <6B968AE2-8325-4E20-B94A-16ECDD0FBA90@dsic.upv.es> References: <179BDB69-1EC0-4334-A964-ABE29E33EFF8@dsic.upv.es> <5B1750B3-E05F-45D7-929B-A5CF816B4A75@dsic.upv.es> <7031EC8B-A238-45AD-B4C2-FA8988022864@dsic.upv.es> <6B968AE2-8325-4E20-B94A-16ECDD0FBA90@dsic.upv.es> Message-ID: Hi Jose, Thank you, that explains it and my example code works now without specifying "-eps_target 0" in the command line. However, both the Krylov inexact shift-invert and JD solvers are struggling to converge for some of my actual problems. The issue seems to be related to non-symmetric general matrices. I have extracted one such matrix attached here as MatA.gz (size 100k), and have also included a short program that loads this matrix and then computes the smallest eigenvalues as I described earlier. For this matrix, if I compute the eigenvalues directly (without using the shell matrix) using shift-and-invert (as below) then it converges in less than a minute. 
$ ./acoustic_matrix_test.o -shell 0 -st_type sinvert -deflate 1 However, if I use the shell matrix and use any of the preconditioned solvers JD or Krylov shift-invert (as shown below) with the same matrix as the preconditioner, then they struggle to converge. $ ./acoustic_matrix_test.o -usejd 1 -deflate 1 $ ./acoustic_matrix_test.o -sinvert 1 -deflate 1 Could you please check the attached code and suggest any changes in settings that might help with convergence for these kinds of matrices? I appreciate your help! Thanks, Varun On Tue, Sep 21, 2021 at 11:14 AM Jose E. Roman wrote: > I will have a look at your code when I have more time. Meanwhile, I am > answering 3) below... > > > El 21 sept 2021, a las 0:23, Varun Hiremath > escribi?: > > > > Hi Jose, > > > > Sorry, it took me a while to test these settings in the new builds. I am > getting good improvement in performance using the preconditioned solvers, > so thanks for the suggestions! But I have some questions related to the > usage. > > > > We are using SLEPc to solve the acoustic modal eigenvalue problem. > Attached is a simple standalone program that computes acoustic modes in a > simple rectangular box. This program illustrates the general setup I am > using, though here the shell matrix and the preconditioner matrix are the > same, while in my actual program the shell matrix computes A*x without > explicitly forming A, and the preconditioner is a 0th order approximation > of A. > > > > In the attached program I have tested both > > 1) the Krylov-Schur with inexact shift-and-invert (implemented under the > option sinvert); > > 2) the JD solver with preconditioner (implemented under the option usejd) > > > > Both the solvers seem to work decently, compared to no preconditioning. > This is how I run the two solvers (for a mesh size of 1600x400): > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 -eps_target > 0 > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -sinvert 1 -deflate 1 > -eps_target 0 > > Both finish in about ~10 minutes on my system in serial. JD seems to be > slightly faster and more accurate (for the imaginary part of eigenvalue). > > The program also runs in parallel using mpiexec. I use complex builds, > as in my main program the matrix can be complex. > > > > Now here are my questions: > > 1) For this particular problem type, could you please check if these are > the best settings that one could use? I have tried different combinations > of KSP/PC types e.g. GMRES, GAMG, etc, but BCGSL + BJACOBI seems to work > the best in serial and parallel. > > > > 2) When I tested these settings in my main program, for some reason the > JD solver was not converging. After further testing, I found the issue was > related to the setting of "-eps_target 0". I have included > "EPSSetTarget(eps,0.0);" in the program and I assumed this is equivalent to > passing "-eps_target 0" from the command line, but that doesn't seem to be > the case. For instance, if I run the attached program without "-eps_target > 0" in the command line then it doesn't converge. > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 -eps_target > 0 > > the above finishes in about 10 minutes > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 > > the above doesn't converge even though "EPSSetTarget(eps,0.0);" is > included in the code > > > > This only seems to affect the JD solver, not the Krylov shift-and-invert > (-sinvert 1) option. 
So is there any difference between passing > "-eps_target 0" from the command line vs using "EPSSetTarget(eps,0.0);" in > the code? I cannot pass any command line arguments in my actual program, so > need to set everything internally. > > > > 3) Also, another minor related issue. While using the inexact > shift-and-invert option, I was running into the following error: > > > > "" > > Missing or incorrect user input > > Shift-and-invert requires a target 'which' (see EPSSetWhichEigenpairs), > for instance -st_type sinvert -eps_target 0 -eps_target_magnitude > > "" > > > > I already have the below two lines in the code: > > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); > > EPSSetTarget(eps,0.0); > > > > so shouldn't these be enough? If I comment out the first line > "EPSSetWhichEigenpairs", then the code works fine. > > You should either do > > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); > > without shift-and-invert or > > EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE); > EPSSetTarget(eps,0.0); > > with shift-and-invert. The latter can also be used without > shift-and-invert (e.g. in JD). > > I have to check, but a possible explanation why in your comment above (2) > the command-line option -eps_target 0 works differently is that it also > sets -eps_target_magnitude if omitted, so to be equivalent in source code > you have to call both > EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE); > EPSSetTarget(eps,0.0); > > Jose > > > I have some more questions regarding setting the preconditioner for a > quadratic eigenvalue problem, which I will ask in a follow-up email. > > > > Thanks for your help! > > > > -Varun > > > > > > On Thu, Jul 1, 2021 at 5:01 AM Varun Hiremath > wrote: > > Thank you very much for these suggestions! We are currently using > version 3.12, so I'll try to update to the latest version and try your > suggestions. Let me get back to you, thanks! > > > > On Thu, Jul 1, 2021, 4:45 AM Jose E. Roman wrote: > > Then I would try Davidson methods https://doi.org/10.1145/2543696 > > You can also try Krylov-Schur with "inexact" shift-and-invert, for > instance, with preconditioned BiCGStab or GMRES, see section 3.4.1 of the > users manual. > > > > In both cases, you have to pass matrix A in the call to > EPSSetOperators() and the preconditioner matrix via > STSetPreconditionerMat() - note this function was introduced in version > 3.15. > > > > Jose > > > > > > > > > El 1 jul 2021, a las 13:36, Varun Hiremath > escribi?: > > > > > > Thanks. I actually do have a 1st order approximation of matrix A, that > I can explicitly compute and also invert. Can I use that matrix as > preconditioner to speed things up? Is there some example that explains how > to setup and call SLEPc for this scenario? > > > > > > On Thu, Jul 1, 2021, 4:29 AM Jose E. Roman wrote: > > > For smallest real parts one could adapt ex34.c, but it is going to be > costly > https://slepc.upv.es/documentation/current/src/eps/tutorials/ex36.c.html > > > Also, if eigenvalues are clustered around the origin, convergence may > still be very slow. > > > > > > It is a tough problem, unless you are able to compute a good > preconditioner of A (no need to compute the exact inverse). > > > > > > Jose > > > > > > > > > > El 1 jul 2021, a las 13:23, Varun Hiremath > escribi?: > > > > > > > > I'm solving for the smallest eigenvalues in magnitude. Though is it > cheaper to solve smallest in real part, as that might also work in my case? > Thanks for your help. > > > > > > > > On Thu, Jul 1, 2021, 4:08 AM Jose E. 
Roman > wrote: > > > > Smallest eigenvalue in magnitude or real part? > > > > > > > > > > > > > El 1 jul 2021, a las 11:58, Varun Hiremath < > varunhiremath at gmail.com> escribi?: > > > > > > > > > > Sorry, no both A and B are general sparse matrices > (non-hermitian). So is there anything else I could try? > > > > > > > > > > On Thu, Jul 1, 2021 at 2:43 AM Jose E. Roman > wrote: > > > > > Is the problem symmetric (GHEP)? In that case, you can try LOBPCG > on the pair (A,B). But this will likely be slow as well, unless you can > provide a good preconditioner. > > > > > > > > > > Jose > > > > > > > > > > > > > > > > El 1 jul 2021, a las 11:37, Varun Hiremath < > varunhiremath at gmail.com> escribi?: > > > > > > > > > > > > Hi All, > > > > > > > > > > > > I am trying to compute the smallest eigenvalues of a generalized > system A*x= lambda*B*x. I don't explicitly know the matrix A (so I am using > a shell matrix with a custom matmult function) however, the matrix B is > explicitly known so I compute inv(B)*A within the shell matrix and solve > inv(B)*A*x = lambda*x. > > > > > > > > > > > > To compute the smallest eigenvalues it is recommended to solve > the inverted system, but since matrix A is not explicitly known I can't > invert the system. Moreover, the size of the system can be really big, and > with the default Krylov solver, it is extremely slow. So is there a better > way for me to compute the smallest eigenvalues of this system? > > > > > > > > > > > > Thanks, > > > > > > Varun > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: acoustic_matrix_test.cpp Type: application/octet-stream Size: 5467 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: MatA.gz Type: application/x-gzip Size: 12596169 bytes Desc: not available URL: From vaclav.hapla at erdw.ethz.ch Wed Sep 22 13:59:30 2021 From: vaclav.hapla at erdw.ethz.ch (Hapla Vaclav) Date: Wed, 22 Sep 2021 18:59:30 +0000 Subject: [petsc-users] DMView and DMLoad In-Reply-To: References: <56ce2135-9757-4292-e33b-c7eea8fb7b2e@ovgu.de> Message-ID: <056E066F-D596-4254-A44A-60BFFD30FE82@erdw.ethz.ch> To avoid confusions here, Berend seems to be specifically demanding XDMF (PETSC_VIEWER_HDF5_XDMF). The stuff we are now working on is parallel checkpointing in our own HDF5 format (PETSC_VIEWER_HDF5_PETSC), I will make a series of MRs on this topic in the following days. For XDMF, we are specifically missing the ability to write/load DMLabels properly. XDMF uses specific cell-local numbering for faces for specification of face sets, and face-local numbering for specification of edge sets, which is not great wrt DMPlex design. And ParaView doesn't show any of these properly so it's hard to debug. Matt, we should talk about this soon. Berend, for now, could you just load the mesh initially from XDMF and then use our PETSC_VIEWER_HDF5_PETSC format for subsequent saving/loading? Thanks, Vaclav On 17 Sep 2021, at 15:46, Lawrence Mitchell > wrote: Hi Berend, On 14 Sep 2021, at 12:23, Matthew Knepley > wrote: On Tue, Sep 14, 2021 at 5:15 AM Berend van Wachem > wrote: Dear PETSc-team, We are trying to save and load distributed DMPlex and its associated physical fields (created with DMCreateGlobalVector) (Uvelocity, VVelocity, ...) in HDF5_XDMF format. 
To achieve this, we do the following: 1) save in the same xdmf.h5 file: DMView( DM , H5_XDMF_Viewer ); VecView( UVelocity, H5_XDMF_Viewer ); 2) load the dm: DMPlexCreateFromfile(PETSC_COMM_WORLD, Filename, PETSC_TRUE, DM); 3) load the physical field: VecLoad( UVelocity, H5_XDMF_Viewer ); There are no errors in the execution, but the loaded DM is distributed differently to the original one, which results in the incorrect placement of the values of the physical fields (UVelocity etc.) in the domain. This approach is used to restart the simulation with the last saved DM. Is there something we are missing, or there exists alternative routes to this goal? Can we somehow get the IS of the redistribution, so we can re-distribute the vector data as well? Many thanks, best regards, Hi Berend, We are in the midst of rewriting this. We want to support saving multiple meshes, with fields attached to each, and preserving the discretization (section) information, and allowing us to load up on a different number of processes. We plan to be done by October. Vaclav and I are doing this in collaboration with Koki Sagiyama, David Ham, and Lawrence Mitchell from the Firedrake team. The core load/save cycle functionality is now in PETSc main. So if you're using main rather than a release, you can get access to it now. This section of the manual shows an example of how to do things https://petsc.org/main/docs/manual/dmplex/#saving-and-loading-data-with-hdf5 Let us know if things aren't clear! Thanks, Lawrence -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Sep 23 07:56:58 2021 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 23 Sep 2021 08:56:58 -0400 Subject: [petsc-users] New error on Summit Message-ID: This was working before but now I get this (strange) error: Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_429f1/fast && /usr/bin/gmake -f CMakeFiles/cmTC_429f1.dir/build.make CMakeFiles/cmTC_429f1.dir/build gmake[1]: Entering directory '/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-cuda/externalpackages/git.kokkos/petsc-build/CMakeFiles/CMakeTmp' Building CXX object CMakeFiles/cmTC_429f1.dir/testCXXCompiler.cxx.o /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-cuda/externalpackages/git.kokkos/bin/nvcc_wrapper -fPIC -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4 -O -+ -qPIC -std=gnu++14 -o CMakeFiles/cmTC_429f1.dir/testCXXCompiler.cxx.o -c /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-cuda/externalpackages/git.kokkos/petsc-build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx nvcc_wrapper has been given GNU extension standard flag -std=gnu++14 - reverting flag to -std=c++14 nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). g++: error: unrecognized command line option ?~@~X-qPIC?~@~Y; did you mean ?~@~X-fPIC?~@~Y? gmake[1]: *** [CMakeFiles/cmTC_429f1.dir/build.make:78: CMakeFiles/cmTC_429f1.dir/testCXXCompiler.cxx.o] Error 1 gmake[1]: Leaving directory '/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-cuda/externalpackages/git.kokkos/petsc-build/CMakeFiles/CMakeTmp' gmake: *** [Makefile:127: cmTC_429f1/fast] Error 2 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: application/octet-stream Size: 1940254 bytes Desc: not available URL: From mfadams at lbl.gov Thu Sep 23 09:16:10 2021 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 23 Sep 2021 10:16:10 -0400 Subject: [petsc-users] New error on Summit In-Reply-To: References: Message-ID: This is fixed now. I did not have a gcc module loaded so it was picking up some default. On Thu, Sep 23, 2021 at 8:56 AM Mark Adams wrote: > This was working before but now I get this (strange) error: > > Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_429f1/fast && > /usr/bin/gmake -f CMakeFiles/cmTC_429f1.dir/build.make > CMakeFiles/cmTC_429f1.dir/build > gmake[1]: Entering directory > '/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-cuda/externalpackages/git.kokkos/petsc-build/CMakeFiles/CMakeTmp' > Building CXX object CMakeFiles/cmTC_429f1.dir/testCXXCompiler.cxx.o > > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-cuda/externalpackages/git.kokkos/bin/nvcc_wrapper > -fPIC -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4 -O -+ > -qPIC -std=gnu++14 -o CMakeFiles/cmTC_429f1.dir/testCXXCompiler.cxx.o -c > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-cuda/externalpackages/git.kokkos/petsc-build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx > nvcc_wrapper has been given GNU extension standard flag -std=gnu++14 - > reverting flag to -std=c++14 > nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', > 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a > future release (Use -Wno-deprecated-gpu-targets to suppress warning). > g++: error: unrecognized command line option ?~@~X-qPIC?~@~Y; did you > mean ?~@~X-fPIC?~@~Y? > gmake[1]: *** [CMakeFiles/cmTC_429f1.dir/build.make:78: > CMakeFiles/cmTC_429f1.dir/testCXXCompiler.cxx.o] Error 1 > gmake[1]: Leaving directory > '/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-cuda/externalpackages/git.kokkos/petsc-build/CMakeFiles/CMakeTmp' > gmake: *** [Makefile:127: cmTC_429f1/fast] Error 2 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 23 09:24:20 2021 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 23 Sep 2021 10:24:20 -0400 Subject: [petsc-users] New error on Summit In-Reply-To: References: Message-ID: Not really fixed, the problem is still there. You just bypassed the problem that BuildSystem has. > On Sep 23, 2021, at 10:16 AM, Mark Adams wrote: > > This is fixed now. I did not have a gcc module loaded so it was picking up some default. 
> > On Thu, Sep 23, 2021 at 8:56 AM Mark Adams > wrote: > This was working before but now I get this (strange) error: > > Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_429f1/fast && /usr/bin/gmake -f CMakeFiles/cmTC_429f1.dir/build.make CMakeFiles/cmTC_429f1.dir/build > gmake[1]: Entering directory '/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-cuda/externalpackages/git.kokkos/petsc-build/CMakeFiles/CMakeTmp' > Building CXX object CMakeFiles/cmTC_429f1.dir/testCXXCompiler.cxx.o > /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-cuda/externalpackages/git.kokkos/bin/nvcc_wrapper -fPIC -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4 -O -+ -qPIC -std=gnu++14 -o CMakeFiles/cmTC_429f1.dir/testCXXCompiler.cxx.o -c /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-cuda/externalpackages/git.kokkos/petsc-build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx > nvcc_wrapper has been given GNU extension standard flag -std=gnu++14 - reverting flag to -std=c++14 > nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). > g++: error: unrecognized command line option ?~@~X-qPIC?~@~Y; did you mean ?~@~X-fPIC?~@~Y? > gmake[1]: *** [CMakeFiles/cmTC_429f1.dir/build.make:78: CMakeFiles/cmTC_429f1.dir/testCXXCompiler.cxx.o] Error 1 > gmake[1]: Leaving directory '/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-cuda/externalpackages/git.kokkos/petsc-build/CMakeFiles/CMakeTmp' > gmake: *** [Makefile:127: cmTC_429f1/fast] Error 2 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Eric.Chamberland at giref.ulaval.ca Thu Sep 23 09:53:11 2021 From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland) Date: Thu, 23 Sep 2021 10:53:11 -0400 Subject: [petsc-users] How to combine different element types into a single DMPlex? In-Reply-To: References: Message-ID: <6e78845e-2054-92b1-d6db-2c0820c05b64@giref.ulaval.ca> Hi, oh, that's a great news! In our case we have our home-made file-format, invariant to the number of processes (thanks to MPI_File_set_view), that uses collective, asynchronous MPI I/O native calls for unstructured hybrid meshes and fields . So our needs are not for reading meshes but only to fill an hybrid DMPlex with DMPlexBuildFromCellListParallel (or something else to come?)... to exploit petsc partitioners and parallel overlap computation... Thanks for the follow-up! :) Eric On 2021-09-22 7:20 a.m., Matthew Knepley wrote: > On Wed, Sep 22, 2021 at 3:04 AM Karin&NiKo > wrote: > > Dear Matthew, > > This is great news! > For my part, I would be mostly interested?in the parallel input > interface. Sorry for that... > Indeed, in our application,? we already have a parallel mesh data > structure that supports hybrid meshes with parallel I/O and > distribution (based on the MED format). We would like to use a > DMPlex to make parallel mesh adaptation. > ?As a matter of fact, all our meshes are in the MED format. We > could also?contribute to extend the interface of DMPlex with MED > (if you consider it could be usefull). > > > An MED interface does exist. I stopped using it for two reasons: > > ? 1) The code was not portable and the build was failing on different > architectures. I had to manually fix it. > > ? 2) The boundary markers did not provide global information, so that > parallel reading was much harder. 
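For reference, a minimal sketch of the partitioning/overlap step Eric mentions above (a DMPlex already built from the locally read cell lists, then repartitioned by PETSc and given a one-cell overlap). This is only an illustration: it assumes it sits inside a function where dm and ierr already exist, and the ParMETIS choice assumes PETSc was configured with that package.

PetscPartitioner part;
DM               dmDist = NULL;

ierr = DMPlexGetPartitioner(dm, &part);CHKERRQ(ierr);
ierr = PetscPartitionerSetType(part, PETSCPARTITIONERPARMETIS);CHKERRQ(ierr); /* or -petscpartitioner_type parmetis */
ierr = DMPlexDistribute(dm, 1, NULL, &dmDist);CHKERRQ(ierr);                  /* second argument = overlap in cell layers */
if (dmDist) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist; }

The third argument (NULL here) can instead return the migration PetscSF, which is useful if field data has to be moved along with the mesh.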
> > Feel free to update my MED reader to a better design. > > ? Thanks, > > ? ? ?Matt > > Best regards, > Nicolas > > > Le?mar. 21 sept. 2021 ??21:56, Matthew Knepley > a ?crit?: > > On Tue, Sep 21, 2021 at 10:31 AM Karin&NiKo > > wrote: > > Dear Eric, dear Matthew, > > I share Eric's desire to be able to manipulate meshes > composed of different types of elements in a PETSc's DMPlex. > Since this discussion, is there anything new on this > feature for the DMPlex?object or am I missing something? > > > Thanks for finding this! > > Okay, I did a rewrite of the Plex internals this summer. It > should now be possible to interpolate a mesh with any > number of cell types, partition it, redistribute it, and many > other manipulations. > > You can read in some formats that support hybrid?meshes. If > you let me know how you plan to read it in, we can make it work. > Right now, I don't want to make input interfaces that no one > will ever use. We have a project, joint with Firedrake, to > finalize > parallel I/O. This will make parallel reading and writing for > checkpointing possible, supporting topology, geometry, fields and > layouts, for many meshes?in one HDF5 file. I think we will > finish in November. > > ? Thanks, > > ? ? ?Matt > > Thanks, > Nicolas > > Le?mer. 21 juil. 2021 ??04:25, Eric Chamberland > > a ?crit?: > > Hi, > > On 2021-07-14 3:14 p.m., Matthew Knepley wrote: >> On Wed, Jul 14, 2021 at 1:25 PM Eric Chamberland >> > > wrote: >> >> Hi, >> >> while playing with >> DMPlexBuildFromCellListParallel, I noticed we >> have to >> specify "numCorners" which is a fixed value, then >> gives a fixed number >> of nodes for a series of elements. >> >> How can I then add, for example, triangles and >> quadrangles into a DMPlex? >> >> >> You can't with that function. It would be much mich >> more complicated if you could, and I am not sure >> it is worth it for that function. The reason is that >> you would need index information to offset?into the >> connectivity list, and that would need to be >> replicated to some extent so that all processes know what >> the others are doing. Possible, but complicated. >> >> Maybe I can help suggest something for what you are >> trying?to do? > > Yes: we are trying to partition our parallel mesh with > PETSc functions.? The mesh has been read in parallel > so each process owns a part of it, but we have to > manage mixed elements types. > > When we directly use ParMETIS_V3_PartMeshKway, we give > two arrays to describe the elements which allows mixed > elements. > > So, how would I read my mixed mesh in parallel and > give it to PETSc DMPlex so I can use a > PetscPartitioner with DMPlexDistribute ? > > A second goal we have is to use PETSc to compute the > overlap, which is something I can't find in PARMetis > (and any other partitionning library?) > > Thanks, > > Eric > > >> >> ? Thanks, >> >> ? ? ? Matt >> >> Thanks, >> >> Eric >> >> -- >> Eric Chamberland, ing., M. Ing >> Professionnel de recherche >> GIREF/Universit? Laval >> (418) 656-2131 poste 41 22 42 >> >> >> >> -- >> What most experimenters take for granted before they >> begin their experiments is infinitely more >> interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> > > -- > Eric Chamberland, ing., M. Ing > Professionnel de recherche > GIREF/Universit? 
Laval > (418) 656-2131 poste 41 22 42 > > > > -- > What most experimenters take for granted before they begin > their experiments is infinitely more interesting than any > results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -- Eric Chamberland, ing., M. Ing Professionnel de recherche GIREF/Universit? Laval (418) 656-2131 poste 41 22 42 -------------- next part -------------- An HTML attachment was scrubbed... URL: From vaclav.hapla at erdw.ethz.ch Thu Sep 23 10:30:45 2021 From: vaclav.hapla at erdw.ethz.ch (Hapla Vaclav) Date: Thu, 23 Sep 2021 15:30:45 +0000 Subject: [petsc-users] How to combine different element types into a single DMPlex? In-Reply-To: <6e78845e-2054-92b1-d6db-2c0820c05b64@giref.ulaval.ca> References: <6e78845e-2054-92b1-d6db-2c0820c05b64@giref.ulaval.ca> Message-ID: Note there will soon be a generalization of DMPlexBuildFromCellListParallel() around, as a side product of our current collaborative efforts with Firedrake guys. It will take a PetscSection instead of relying on the blocksize [which is indeed always constant for the given dataset]. Stay tuned. https://gitlab.com/petsc/petsc/-/merge_requests/4350 Thanks, Vaclav On 23 Sep 2021, at 16:53, Eric Chamberland > wrote: Hi, oh, that's a great news! In our case we have our home-made file-format, invariant to the number of processes (thanks to MPI_File_set_view), that uses collective, asynchronous MPI I/O native calls for unstructured hybrid meshes and fields . So our needs are not for reading meshes but only to fill an hybrid DMPlex with DMPlexBuildFromCellListParallel (or something else to come?)... to exploit petsc partitioners and parallel overlap computation... Thanks for the follow-up! :) Eric On 2021-09-22 7:20 a.m., Matthew Knepley wrote: On Wed, Sep 22, 2021 at 3:04 AM Karin&NiKo > wrote: Dear Matthew, This is great news! For my part, I would be mostly interested in the parallel input interface. Sorry for that... Indeed, in our application, we already have a parallel mesh data structure that supports hybrid meshes with parallel I/O and distribution (based on the MED format). We would like to use a DMPlex to make parallel mesh adaptation. As a matter of fact, all our meshes are in the MED format. We could also contribute to extend the interface of DMPlex with MED (if you consider it could be usefull). An MED interface does exist. I stopped using it for two reasons: 1) The code was not portable and the build was failing on different architectures. I had to manually fix it. 2) The boundary markers did not provide global information, so that parallel reading was much harder. Feel free to update my MED reader to a better design. Thanks, Matt Best regards, Nicolas Le mar. 21 sept. 2021 ? 21:56, Matthew Knepley > a ?crit : On Tue, Sep 21, 2021 at 10:31 AM Karin&NiKo > wrote: Dear Eric, dear Matthew, I share Eric's desire to be able to manipulate meshes composed of different types of elements in a PETSc's DMPlex. Since this discussion, is there anything new on this feature for the DMPlex object or am I missing something? Thanks for finding this! Okay, I did a rewrite of the Plex internals this summer. 
It should now be possible to interpolate a mesh with any number of cell types, partition it, redistribute it, and many other manipulations. You can read in some formats that support hybrid meshes. If you let me know how you plan to read it in, we can make it work. Right now, I don't want to make input interfaces that no one will ever use. We have a project, joint with Firedrake, to finalize parallel I/O. This will make parallel reading and writing for checkpointing possible, supporting topology, geometry, fields and layouts, for many meshes in one HDF5 file. I think we will finish in November. Thanks, Matt Thanks, Nicolas Le mer. 21 juil. 2021 ? 04:25, Eric Chamberland > a ?crit : Hi, On 2021-07-14 3:14 p.m., Matthew Knepley wrote: On Wed, Jul 14, 2021 at 1:25 PM Eric Chamberland > wrote: Hi, while playing with DMPlexBuildFromCellListParallel, I noticed we have to specify "numCorners" which is a fixed value, then gives a fixed number of nodes for a series of elements. How can I then add, for example, triangles and quadrangles into a DMPlex? You can't with that function. It would be much mich more complicated if you could, and I am not sure it is worth it for that function. The reason is that you would need index information to offset into the connectivity list, and that would need to be replicated to some extent so that all processes know what the others are doing. Possible, but complicated. Maybe I can help suggest something for what you are trying to do? Yes: we are trying to partition our parallel mesh with PETSc functions. The mesh has been read in parallel so each process owns a part of it, but we have to manage mixed elements types. When we directly use ParMETIS_V3_PartMeshKway, we give two arrays to describe the elements which allows mixed elements. So, how would I read my mixed mesh in parallel and give it to PETSc DMPlex so I can use a PetscPartitioner with DMPlexDistribute ? A second goal we have is to use PETSc to compute the overlap, which is something I can't find in PARMetis (and any other partitionning library?) Thanks, Eric Thanks, Matt Thanks, Eric -- Eric Chamberland, ing., M. Ing Professionnel de recherche GIREF/Universit? Laval (418) 656-2131 poste 41 22 42 -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- Eric Chamberland, ing., M. Ing Professionnel de recherche GIREF/Universit? Laval (418) 656-2131 poste 41 22 42 -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- Eric Chamberland, ing., M. Ing Professionnel de recherche GIREF/Universit? Laval (418) 656-2131 poste 41 22 42 -------------- next part -------------- An HTML attachment was scrubbed... URL: From medane.tchakorom at univ-fcomte.fr Fri Sep 24 09:08:01 2021 From: medane.tchakorom at univ-fcomte.fr (Medane TCHAKOROM) Date: Fri, 24 Sep 2021 16:08:01 +0200 Subject: [petsc-users] Petsc memory consumption keep increasing in my loop Message-ID: <48da1142-7ab9-e549-dc15-63a74dc093b2@univ-fcomte.fr> Hello, I have problem with a code i'am working on. 
To illustrate my problem, here is an example: int main(int argc, char *argv[]) { ??? PetscErrorCode ierr; ??? ierr = PetscInitialize(&argc, &argv, (char *)0, NULL); ??? if (ierr) ??????? return ierr; ??? int i = 0; ??? for (i = 0; i < 1; i++) ??? { ??????? Mat A; ??????? ierr = MatCreate(PETSC_COMM_WORLD, &A); ??????? CHKERRQ(ierr); ??????? ierr = MatSetSizes(A, 16, 16, PETSC_DECIDE,PETSC_DECIDE); ??????? CHKERRQ(ierr); ??????? ierr = MatSetFromOptions(A); ??????? CHKERRQ(ierr); ??????? ierr = MatSetUp(A); ??????? CHKERRQ(ierr); ??????? ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); ??????? CHKERRQ(ierr); ??????? ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); ??????? CHKERRQ(ierr); ??????? /// SOME CODE HERE.... ??????? MatDestroy(&A); ??? } ??? FILE *fPtr; ??? fPtr = fopen("petsc_dump_file.txt", "a"); ??? PetscMallocDump(fPtr); ??? fclose(fPtr); ??? ierr = PetscFinalize(); ??? CHKERRQ(ierr); ??? return 0; } The problem is , in the loop, the memory consumption keep increasing till the end of the program. I checked memory leak with PetscMallocDump, and found out that the problem may be due to matrix creation. I'am new to Petsc and i don't know if i'am doing something wrong. Thanks M?dane From bsmith at petsc.dev Fri Sep 24 09:13:59 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 24 Sep 2021 10:13:59 -0400 Subject: [petsc-users] Petsc memory consumption keep increasing in my loop In-Reply-To: <48da1142-7ab9-e549-dc15-63a74dc093b2@univ-fcomte.fr> References: <48da1142-7ab9-e549-dc15-63a74dc093b2@univ-fcomte.fr> Message-ID: <6D5C33D0-A02E-443F-8A7C-9ADBB89369CB@petsc.dev> The code you sent looks fine, it should not leak memory. Perhaps the /// SOME CODE HERE.... is doing something that prevents the matrix from being actually freed. PETSc uses reference counting on its objects so if another object keeps a reference to the matrix then the memory of the matrix will not be freed until the reference count drops back to zero. For example if a KSP has a reference to the matrix and the KSP has not been completely freed the matrix memory will remain. We would need to see the full code to understand why the matrix is not being freed. Barry > On Sep 24, 2021, at 10:08 AM, Medane TCHAKOROM wrote: > > Hello, > > I have problem with a code i'am working on. > > To illustrate my problem, here is an example: > > > int main(int argc, char *argv[]) > { > > PetscErrorCode ierr; > > ierr = PetscInitialize(&argc, &argv, (char *)0, NULL); > if (ierr) > return ierr; > > int i = 0; > for (i = 0; i < 1; i++) > { > Mat A; > ierr = MatCreate(PETSC_COMM_WORLD, &A); > CHKERRQ(ierr); > ierr = MatSetSizes(A, 16, 16, PETSC_DECIDE,PETSC_DECIDE); > CHKERRQ(ierr); > ierr = MatSetFromOptions(A); > CHKERRQ(ierr); > ierr = MatSetUp(A); > CHKERRQ(ierr); > > > ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > CHKERRQ(ierr); > ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > CHKERRQ(ierr); > > > > /// SOME CODE HERE.... > > MatDestroy(&A); > > > } > > > FILE *fPtr; > fPtr = fopen("petsc_dump_file.txt", "a"); > PetscMallocDump(fPtr); > fclose(fPtr); > > ierr = PetscFinalize(); > CHKERRQ(ierr); > > return 0; > } > > > > The problem is , in the loop, the memory consumption keep increasing till the end of the program. > > I checked memory leak with PetscMallocDump, and found out that the problem may be due to matrix creation. > > I'am new to Petsc and i don't know if i'am doing something wrong. 
Thanks > > > M?dane > From medane.tchakorom at univ-fcomte.fr Fri Sep 24 09:31:51 2021 From: medane.tchakorom at univ-fcomte.fr (Medane TCHAKOROM) Date: Fri, 24 Sep 2021 16:31:51 +0200 Subject: [petsc-users] Petsc memory consumption keep increasing in my loop In-Reply-To: <6D5C33D0-A02E-443F-8A7C-9ADBB89369CB@petsc.dev> References: <48da1142-7ab9-e549-dc15-63a74dc093b2@univ-fcomte.fr> <6D5C33D0-A02E-443F-8A7C-9ADBB89369CB@petsc.dev> Message-ID: <5bb0c397-4558-b6d9-8dd7-19226cb0009c@univ-fcomte.fr> Thanks Barry, I can't share the orginal code i'am working on unfortunately. But the example i wrote -- even if you do not that into account //SOME CODE HERE .. part -- give me , by using PetscMallocDump, some informations about memory that was not freed. Based on the example code i sent, i was expecting that PetscMallocDump give no output. M?dane On 24/09/2021 16:13, Barry Smith wrote: > The code you sent looks fine, it should not leak memory. > > Perhaps the /// SOME CODE HERE.... is doing something that prevents the matrix from being actually freed. PETSc uses reference counting on its objects so if another object keeps a reference to the matrix then the memory of the matrix will not be freed until the reference count drops back to zero. For example if a KSP has a reference to the matrix and the KSP has not been completely freed the matrix memory will remain. > > We would need to see the full code to understand why the matrix is not being freed. > > Barry > > >> On Sep 24, 2021, at 10:08 AM, Medane TCHAKOROM wrote: >> >> Hello, >> >> I have problem with a code i'am working on. >> >> To illustrate my problem, here is an example: >> >> >> int main(int argc, char *argv[]) >> { >> >> PetscErrorCode ierr; >> >> ierr = PetscInitialize(&argc, &argv, (char *)0, NULL); >> if (ierr) >> return ierr; >> >> int i = 0; >> for (i = 0; i < 1; i++) >> { >> Mat A; >> ierr = MatCreate(PETSC_COMM_WORLD, &A); >> CHKERRQ(ierr); >> ierr = MatSetSizes(A, 16, 16, PETSC_DECIDE,PETSC_DECIDE); >> CHKERRQ(ierr); >> ierr = MatSetFromOptions(A); >> CHKERRQ(ierr); >> ierr = MatSetUp(A); >> CHKERRQ(ierr); >> >> >> ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); >> CHKERRQ(ierr); >> ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); >> CHKERRQ(ierr); >> >> >> >> /// SOME CODE HERE.... >> >> MatDestroy(&A); >> >> >> } >> >> >> FILE *fPtr; >> fPtr = fopen("petsc_dump_file.txt", "a"); >> PetscMallocDump(fPtr); >> fclose(fPtr); >> >> ierr = PetscFinalize(); >> CHKERRQ(ierr); >> >> return 0; >> } >> >> >> >> The problem is , in the loop, the memory consumption keep increasing till the end of the program. >> >> I checked memory leak with PetscMallocDump, and found out that the problem may be due to matrix creation. >> >> I'am new to Petsc and i don't know if i'am doing something wrong. Thanks >> >> >> M?dane >> From bsmith at petsc.dev Fri Sep 24 10:21:23 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 24 Sep 2021 11:21:23 -0400 Subject: [petsc-users] Petsc memory consumption keep increasing in my loop In-Reply-To: <5bb0c397-4558-b6d9-8dd7-19226cb0009c@univ-fcomte.fr> References: <48da1142-7ab9-e549-dc15-63a74dc093b2@univ-fcomte.fr> <6D5C33D0-A02E-443F-8A7C-9ADBB89369CB@petsc.dev> <5bb0c397-4558-b6d9-8dd7-19226cb0009c@univ-fcomte.fr> Message-ID: <3A611848-D4C2-475A-94B2-C8066CBE1C21@petsc.dev> Ahh, the stuff you are seeing is just memory associated with the initialization of the matrix package; it is not the matrix memory (that is all freed). 
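One quick way to see this in the loop itself (a sketch, not part of the original example; it relies on PETSc's malloc logging being active, e.g. a debug build or -malloc_debug) is to print the current allocation right after MatDestroy(); from the second iteration on the reported number should stay flat:

PetscLogDouble mem;
ierr = PetscMallocGetCurrentUsage(&mem);CHKERRQ(ierr);
ierr = PetscPrintf(PETSC_COMM_WORLD, "after iteration %d: %g bytes currently allocated by PetscMalloc\n", i, mem);CHKERRQ(ierr);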
This memory used in the initialization is used only once and will not grow with more matrices. If you run the program with -malloc_dump then you will see nothing is printed since the memory used in the initialization is freed in PetscFinalize(). Barry > On Sep 24, 2021, at 10:31 AM, Medane TCHAKOROM wrote: > > Thanks Barry, > > I can't share the orginal code i'am working on unfortunately. > > But the example i wrote -- even if you do not that into account //SOME CODE HERE .. part -- give me , by using PetscMallocDump, some informations about memory that was not freed. > > Based on the example code i sent, i was expecting that PetscMallocDump give no output. > > M?dane > > > On 24/09/2021 16:13, Barry Smith wrote: >> The code you sent looks fine, it should not leak memory. >> >> Perhaps the /// SOME CODE HERE.... is doing something that prevents the matrix from being actually freed. PETSc uses reference counting on its objects so if another object keeps a reference to the matrix then the memory of the matrix will not be freed until the reference count drops back to zero. For example if a KSP has a reference to the matrix and the KSP has not been completely freed the matrix memory will remain. >> >> We would need to see the full code to understand why the matrix is not being freed. >> >> Barry >> >> >>> On Sep 24, 2021, at 10:08 AM, Medane TCHAKOROM wrote: >>> >>> Hello, >>> >>> I have problem with a code i'am working on. >>> >>> To illustrate my problem, here is an example: >>> >>> >>> int main(int argc, char *argv[]) >>> { >>> >>> PetscErrorCode ierr; >>> >>> ierr = PetscInitialize(&argc, &argv, (char *)0, NULL); >>> if (ierr) >>> return ierr; >>> >>> int i = 0; >>> for (i = 0; i < 1; i++) >>> { >>> Mat A; >>> ierr = MatCreate(PETSC_COMM_WORLD, &A); >>> CHKERRQ(ierr); >>> ierr = MatSetSizes(A, 16, 16, PETSC_DECIDE,PETSC_DECIDE); >>> CHKERRQ(ierr); >>> ierr = MatSetFromOptions(A); >>> CHKERRQ(ierr); >>> ierr = MatSetUp(A); >>> CHKERRQ(ierr); >>> >>> >>> ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); >>> CHKERRQ(ierr); >>> ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); >>> CHKERRQ(ierr); >>> >>> >>> >>> /// SOME CODE HERE.... >>> >>> MatDestroy(&A); >>> >>> >>> } >>> >>> >>> FILE *fPtr; >>> fPtr = fopen("petsc_dump_file.txt", "a"); >>> PetscMallocDump(fPtr); >>> fclose(fPtr); >>> >>> ierr = PetscFinalize(); >>> CHKERRQ(ierr); >>> >>> return 0; >>> } >>> >>> >>> >>> The problem is , in the loop, the memory consumption keep increasing till the end of the program. >>> >>> I checked memory leak with PetscMallocDump, and found out that the problem may be due to matrix creation. >>> >>> I'am new to Petsc and i don't know if i'am doing something wrong. Thanks >>> >>> >>> M?dane >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From medane.tchakorom at univ-fcomte.fr Fri Sep 24 10:38:22 2021 From: medane.tchakorom at univ-fcomte.fr (Medane TCHAKOROM) Date: Fri, 24 Sep 2021 17:38:22 +0200 Subject: [petsc-users] Petsc memory consumption keep increasing in my loop In-Reply-To: <3A611848-D4C2-475A-94B2-C8066CBE1C21@petsc.dev> References: <48da1142-7ab9-e549-dc15-63a74dc093b2@univ-fcomte.fr> <6D5C33D0-A02E-443F-8A7C-9ADBB89369CB@petsc.dev> <5bb0c397-4558-b6d9-8dd7-19226cb0009c@univ-fcomte.fr> <3A611848-D4C2-475A-94B2-C8066CBE1C21@petsc.dev> Message-ID: <83ea52a4-6804-c0fe-79a3-9dc1c2245a64@univ-fcomte.fr> Thank you for the precision, now i can eliminate this case, and try to find where my bug is coming from. M?dane On 24/09/2021 17:21, Barry Smith wrote: > > ? 
Ahh, the stuff you are seeing is just memory associated with the > initialization of the matrix package; it is not the matrix memory > (that is all freed). This memory used in the initialization is used > only once and will not grow with more matrices. > > ? If you run the program with -malloc_dump?then you will see nothing > is printed since the memory used in the initialization is freed in > PetscFinalize(). > > ? Barry > > >> On Sep 24, 2021, at 10:31 AM, Medane TCHAKOROM >> > > wrote: >> >> Thanks Barry, >> >> I can't share the orginal code i'am working on unfortunately. >> >> But the example i wrote -- even if you do not that into account >> //SOME CODE HERE .. part -- give me , by using PetscMallocDump, some >> informations about memory that was not freed. >> >> Based on the example code i sent, i was expecting that >> PetscMallocDump give no output. >> >> M?dane >> >> >> On 24/09/2021 16:13, Barry Smith wrote: >>> ?The code you sent looks fine, it should not leak memory. >>> >>> ?Perhaps the /// SOME CODE HERE.... is doing something that prevents >>> the matrix from being actually freed. PETSc uses reference counting >>> on its objects so if another object keeps a reference to the matrix >>> then the memory of the matrix will not be freed until the reference >>> count drops back to zero. For example if a KSP has a reference to >>> the matrix and the KSP has not been completely freed the matrix >>> memory will remain. >>> >>> ??We would need to see the full code to understand why the matrix is >>> not being freed. >>> >>> ??Barry >>> >>> >>>> On Sep 24, 2021, at 10:08 AM, Medane TCHAKOROM >>>> >>> > wrote: >>>> >>>> Hello, >>>> >>>> I have problem with a code i'am working on. >>>> >>>> To illustrate my problem, here is an example: >>>> >>>> >>>> int main(int argc, char *argv[]) >>>> { >>>> >>>> ????PetscErrorCode ierr; >>>> >>>> ????ierr = PetscInitialize(&argc, &argv, (char *)0, NULL); >>>> ????if (ierr) >>>> ????????return ierr; >>>> >>>> ????int i = 0; >>>> ????for (i = 0; i < 1; i++) >>>> ????{ >>>> ????????Mat A; >>>> ????????ierr = MatCreate(PETSC_COMM_WORLD, &A); >>>> ????????CHKERRQ(ierr); >>>> ????????ierr = MatSetSizes(A, 16, 16, PETSC_DECIDE,PETSC_DECIDE); >>>> ????????CHKERRQ(ierr); >>>> ????????ierr = MatSetFromOptions(A); >>>> ????????CHKERRQ(ierr); >>>> ????????ierr = MatSetUp(A); >>>> ????????CHKERRQ(ierr); >>>> >>>> >>>> ????????ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); >>>> ????????CHKERRQ(ierr); >>>> ????????ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); >>>> ????????CHKERRQ(ierr); >>>> >>>> >>>> >>>> ????????/// SOME CODE HERE.... >>>> >>>> ????????MatDestroy(&A); >>>> >>>> >>>> ????} >>>> >>>> >>>> ????FILE *fPtr; >>>> ????fPtr = fopen("petsc_dump_file.txt", "a"); >>>> ????PetscMallocDump(fPtr); >>>> ????fclose(fPtr); >>>> >>>> ????ierr = PetscFinalize(); >>>> ????CHKERRQ(ierr); >>>> >>>> ????return 0; >>>> } >>>> >>>> >>>> >>>> The problem is , in the loop, the memory consumption keep >>>> increasing till the end of the program. >>>> >>>> I checked memory leak with PetscMallocDump, and found out that the >>>> problem may be due to matrix creation. >>>> >>>> I'am new to Petsc and i don't know if i'am doing something wrong. >>>> Thanks >>>> >>>> >>>> M?dane >>>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Fri Sep 24 11:14:18 2021 From: jroman at dsic.upv.es (Jose E. 
Roman) Date: Fri, 24 Sep 2021 18:14:18 +0200 Subject: [petsc-users] SLEPc: smallest eigenvalues In-Reply-To: References: <179BDB69-1EC0-4334-A964-ABE29E33EFF8@dsic.upv.es> <5B1750B3-E05F-45D7-929B-A5CF816B4A75@dsic.upv.es> <7031EC8B-A238-45AD-B4C2-FA8988022864@dsic.upv.es> <6B968AE2-8325-4E20-B94A-16ECDD0FBA90@dsic.upv.es> Message-ID: <4BB88AB3-410E-493C-9161-97775747936D@dsic.upv.es> If you do $ ./acoustic_matrix_test.o -shell 0 -st_type sinvert -deflate 1 then it is using an LU factorization (the default), which is fast. Use -eps_view to see which solver settings are you using. BiCGStab with block Jacobi does not work for you matrix, it exceeds the maximum 10000 iterations. So this is not viable unless you can find a better preconditioner for your problem. If not, just using EPS_SMALLEST_MAGNITUDE will be faster. Computing smallest magnitude eigenvalues is a difficult task. The most robust way is to compute a (parallel) LU factorization if you can afford it. A side note: don't add this to your source code #define PETSC_USE_COMPLEX 1 This define is taken from PETSc's include files, you should not mess with it. Instead, you probably want to add something like this AFTER #include : #if !defined(PETSC_USE_COMPLEX) #error "Requires complex scalars" #endif Jose > El 22 sept 2021, a las 19:38, Varun Hiremath escribi?: > > Hi Jose, > > Thank you, that explains it and my example code works now without specifying "-eps_target 0" in the command line. > > However, both the Krylov inexact shift-invert and JD solvers are struggling to converge for some of my actual problems. The issue seems to be related to non-symmetric general matrices. I have extracted one such matrix attached here as MatA.gz (size 100k), and have also included a short program that loads this matrix and then computes the smallest eigenvalues as I described earlier. > > For this matrix, if I compute the eigenvalues directly (without using the shell matrix) using shift-and-invert (as below) then it converges in less than a minute. > $ ./acoustic_matrix_test.o -shell 0 -st_type sinvert -deflate 1 > > However, if I use the shell matrix and use any of the preconditioned solvers JD or Krylov shift-invert (as shown below) with the same matrix as the preconditioner, then they struggle to converge. > $ ./acoustic_matrix_test.o -usejd 1 -deflate 1 > $ ./acoustic_matrix_test.o -sinvert 1 -deflate 1 > > Could you please check the attached code and suggest any changes in settings that might help with convergence for these kinds of matrices? I appreciate your help! > > Thanks, > Varun > > On Tue, Sep 21, 2021 at 11:14 AM Jose E. Roman wrote: > I will have a look at your code when I have more time. Meanwhile, I am answering 3) below... > > > El 21 sept 2021, a las 0:23, Varun Hiremath escribi?: > > > > Hi Jose, > > > > Sorry, it took me a while to test these settings in the new builds. I am getting good improvement in performance using the preconditioned solvers, so thanks for the suggestions! But I have some questions related to the usage. > > > > We are using SLEPc to solve the acoustic modal eigenvalue problem. Attached is a simple standalone program that computes acoustic modes in a simple rectangular box. This program illustrates the general setup I am using, though here the shell matrix and the preconditioner matrix are the same, while in my actual program the shell matrix computes A*x without explicitly forming A, and the preconditioner is a 0th order approximation of A. 
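For reference, a rough source-code sketch of the robust configuration recommended above (shift-and-invert at target 0 with a direct inner solve, i.e. what -st_type sinvert -eps_target 0 does when the matrix is available explicitly). Only a sketch: eps and A are assumed to exist already, and the MUMPS option mentioned in the comment assumes that package is installed.

ST  st;
KSP ksp;
PC  pc;

ierr = EPSSetOperators(eps, A, NULL);CHKERRQ(ierr);                   /* pass B as third argument for a generalized problem */
ierr = EPSSetWhichEigenpairs(eps, EPS_TARGET_MAGNITUDE);CHKERRQ(ierr);
ierr = EPSSetTarget(eps, 0.0);CHKERRQ(ierr);
ierr = EPSGetST(eps, &st);CHKERRQ(ierr);
ierr = STSetType(st, STSINVERT);CHKERRQ(ierr);
ierr = STGetKSP(st, &ksp);CHKERRQ(ierr);
ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);                             /* parallel LU e.g. with -st_pc_factor_mat_solver_type mumps */
ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
ierr = EPSSolve(eps);CHKERRQ(ierr);

Running with -eps_view, as suggested above, confirms which ST/KSP/PC combination is actually in effect.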
> > > > In the attached program I have tested both > > 1) the Krylov-Schur with inexact shift-and-invert (implemented under the option sinvert); > > 2) the JD solver with preconditioner (implemented under the option usejd) > > > > Both the solvers seem to work decently, compared to no preconditioning. This is how I run the two solvers (for a mesh size of 1600x400): > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 -eps_target 0 > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -sinvert 1 -deflate 1 -eps_target 0 > > Both finish in about ~10 minutes on my system in serial. JD seems to be slightly faster and more accurate (for the imaginary part of eigenvalue). > > The program also runs in parallel using mpiexec. I use complex builds, as in my main program the matrix can be complex. > > > > Now here are my questions: > > 1) For this particular problem type, could you please check if these are the best settings that one could use? I have tried different combinations of KSP/PC types e.g. GMRES, GAMG, etc, but BCGSL + BJACOBI seems to work the best in serial and parallel. > > > > 2) When I tested these settings in my main program, for some reason the JD solver was not converging. After further testing, I found the issue was related to the setting of "-eps_target 0". I have included "EPSSetTarget(eps,0.0);" in the program and I assumed this is equivalent to passing "-eps_target 0" from the command line, but that doesn't seem to be the case. For instance, if I run the attached program without "-eps_target 0" in the command line then it doesn't converge. > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 -eps_target 0 > > the above finishes in about 10 minutes > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 > > the above doesn't converge even though "EPSSetTarget(eps,0.0);" is included in the code > > > > This only seems to affect the JD solver, not the Krylov shift-and-invert (-sinvert 1) option. So is there any difference between passing "-eps_target 0" from the command line vs using "EPSSetTarget(eps,0.0);" in the code? I cannot pass any command line arguments in my actual program, so need to set everything internally. > > > > 3) Also, another minor related issue. While using the inexact shift-and-invert option, I was running into the following error: > > > > "" > > Missing or incorrect user input > > Shift-and-invert requires a target 'which' (see EPSSetWhichEigenpairs), for instance -st_type sinvert -eps_target 0 -eps_target_magnitude > > "" > > > > I already have the below two lines in the code: > > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); > > EPSSetTarget(eps,0.0); > > > > so shouldn't these be enough? If I comment out the first line "EPSSetWhichEigenpairs", then the code works fine. > > You should either do > > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); > > without shift-and-invert or > > EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE); > EPSSetTarget(eps,0.0); > > with shift-and-invert. The latter can also be used without shift-and-invert (e.g. in JD). > > I have to check, but a possible explanation why in your comment above (2) the command-line option -eps_target 0 works differently is that it also sets -eps_target_magnitude if omitted, so to be equivalent in source code you have to call both > EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE); > EPSSetTarget(eps,0.0); > > Jose > > > I have some more questions regarding setting the preconditioner for a quadratic eigenvalue problem, which I will ask in a follow-up email. 
> > > > Thanks for your help! > > > > -Varun > > > > > > On Thu, Jul 1, 2021 at 5:01 AM Varun Hiremath wrote: > > Thank you very much for these suggestions! We are currently using version 3.12, so I'll try to update to the latest version and try your suggestions. Let me get back to you, thanks! > > > > On Thu, Jul 1, 2021, 4:45 AM Jose E. Roman wrote: > > Then I would try Davidson methods https://doi.org/10.1145/2543696 > > You can also try Krylov-Schur with "inexact" shift-and-invert, for instance, with preconditioned BiCGStab or GMRES, see section 3.4.1 of the users manual. > > > > In both cases, you have to pass matrix A in the call to EPSSetOperators() and the preconditioner matrix via STSetPreconditionerMat() - note this function was introduced in version 3.15. > > > > Jose > > > > > > > > > El 1 jul 2021, a las 13:36, Varun Hiremath escribi?: > > > > > > Thanks. I actually do have a 1st order approximation of matrix A, that I can explicitly compute and also invert. Can I use that matrix as preconditioner to speed things up? Is there some example that explains how to setup and call SLEPc for this scenario? > > > > > > On Thu, Jul 1, 2021, 4:29 AM Jose E. Roman wrote: > > > For smallest real parts one could adapt ex34.c, but it is going to be costly https://slepc.upv.es/documentation/current/src/eps/tutorials/ex36.c.html > > > Also, if eigenvalues are clustered around the origin, convergence may still be very slow. > > > > > > It is a tough problem, unless you are able to compute a good preconditioner of A (no need to compute the exact inverse). > > > > > > Jose > > > > > > > > > > El 1 jul 2021, a las 13:23, Varun Hiremath escribi?: > > > > > > > > I'm solving for the smallest eigenvalues in magnitude. Though is it cheaper to solve smallest in real part, as that might also work in my case? Thanks for your help. > > > > > > > > On Thu, Jul 1, 2021, 4:08 AM Jose E. Roman wrote: > > > > Smallest eigenvalue in magnitude or real part? > > > > > > > > > > > > > El 1 jul 2021, a las 11:58, Varun Hiremath escribi?: > > > > > > > > > > Sorry, no both A and B are general sparse matrices (non-hermitian). So is there anything else I could try? > > > > > > > > > > On Thu, Jul 1, 2021 at 2:43 AM Jose E. Roman wrote: > > > > > Is the problem symmetric (GHEP)? In that case, you can try LOBPCG on the pair (A,B). But this will likely be slow as well, unless you can provide a good preconditioner. > > > > > > > > > > Jose > > > > > > > > > > > > > > > > El 1 jul 2021, a las 11:37, Varun Hiremath escribi?: > > > > > > > > > > > > Hi All, > > > > > > > > > > > > I am trying to compute the smallest eigenvalues of a generalized system A*x= lambda*B*x. I don't explicitly know the matrix A (so I am using a shell matrix with a custom matmult function) however, the matrix B is explicitly known so I compute inv(B)*A within the shell matrix and solve inv(B)*A*x = lambda*x. > > > > > > > > > > > > To compute the smallest eigenvalues it is recommended to solve the inverted system, but since matrix A is not explicitly known I can't invert the system. Moreover, the size of the system can be really big, and with the default Krylov solver, it is extremely slow. So is there a better way for me to compute the smallest eigenvalues of this system? 
> > > > > > > > > > > > Thanks, > > > > > > Varun > > > > > > > > > > > > > > > > > > From varunhiremath at gmail.com Sat Sep 25 01:07:55 2021 From: varunhiremath at gmail.com (Varun Hiremath) Date: Fri, 24 Sep 2021 23:07:55 -0700 Subject: [petsc-users] SLEPc: smallest eigenvalues In-Reply-To: <4BB88AB3-410E-493C-9161-97775747936D@dsic.upv.es> References: <179BDB69-1EC0-4334-A964-ABE29E33EFF8@dsic.upv.es> <5B1750B3-E05F-45D7-929B-A5CF816B4A75@dsic.upv.es> <7031EC8B-A238-45AD-B4C2-FA8988022864@dsic.upv.es> <6B968AE2-8325-4E20-B94A-16ECDD0FBA90@dsic.upv.es> <4BB88AB3-410E-493C-9161-97775747936D@dsic.upv.es> Message-ID: Hi Jose, Thanks for checking my code and providing suggestions. In my particular case, I don't know the matrix A explicitly, I compute A*x in a matrix-free way within a shell matrix, so I can't use any of the direct factorization methods. But just a question regarding your suggestion to compute a (parallel) LU factorization. In our work, we do use MUMPS to compute the parallel factorization. For solving the generalized problem, A*x = lambda*B*x, we are computing inv(B)*A*x within a shell matrix, where factorization of B is computed using MUMPS. (We don't call MUMPS through SLEPc as we have our own MPI wrapper and other user settings to handle.) So for the preconditioning, instead of using the iterative solvers, can I provide a shell matrix that computes inv(P)*x corrections (where P is the preconditioner matrix) using MUMPS direct solver? And yes, thanks, #define PETSC_USE_COMPLEX 1 is not needed, it works without it. Regards, Varun On Fri, Sep 24, 2021 at 9:14 AM Jose E. Roman wrote: > If you do > $ ./acoustic_matrix_test.o -shell 0 -st_type sinvert -deflate 1 > then it is using an LU factorization (the default), which is fast. > > Use -eps_view to see which solver settings are you using. > > BiCGStab with block Jacobi does not work for you matrix, it exceeds the > maximum 10000 iterations. So this is not viable unless you can find a > better preconditioner for your problem. If not, just using > EPS_SMALLEST_MAGNITUDE will be faster. > > Computing smallest magnitude eigenvalues is a difficult task. The most > robust way is to compute a (parallel) LU factorization if you can afford it. > > > A side note: don't add this to your source code > #define PETSC_USE_COMPLEX 1 > This define is taken from PETSc's include files, you should not mess with > it. Instead, you probably want to add something like this AFTER #include > : > #if !defined(PETSC_USE_COMPLEX) > #error "Requires complex scalars" > #endif > > Jose > > > > El 22 sept 2021, a las 19:38, Varun Hiremath > escribi?: > > > > Hi Jose, > > > > Thank you, that explains it and my example code works now without > specifying "-eps_target 0" in the command line. > > > > However, both the Krylov inexact shift-invert and JD solvers are > struggling to converge for some of my actual problems. The issue seems to > be related to non-symmetric general matrices. I have extracted one such > matrix attached here as MatA.gz (size 100k), and have also included a short > program that loads this matrix and then computes the smallest eigenvalues > as I described earlier. > > > > For this matrix, if I compute the eigenvalues directly (without using > the shell matrix) using shift-and-invert (as below) then it converges in > less than a minute. 
> > $ ./acoustic_matrix_test.o -shell 0 -st_type sinvert -deflate 1 > > > > However, if I use the shell matrix and use any of the preconditioned > solvers JD or Krylov shift-invert (as shown below) with the same matrix as > the preconditioner, then they struggle to converge. > > $ ./acoustic_matrix_test.o -usejd 1 -deflate 1 > > $ ./acoustic_matrix_test.o -sinvert 1 -deflate 1 > > > > Could you please check the attached code and suggest any changes in > settings that might help with convergence for these kinds of matrices? I > appreciate your help! > > > > Thanks, > > Varun > > > > On Tue, Sep 21, 2021 at 11:14 AM Jose E. Roman > wrote: > > I will have a look at your code when I have more time. Meanwhile, I am > answering 3) below... > > > > > El 21 sept 2021, a las 0:23, Varun Hiremath > escribi?: > > > > > > Hi Jose, > > > > > > Sorry, it took me a while to test these settings in the new builds. I > am getting good improvement in performance using the preconditioned > solvers, so thanks for the suggestions! But I have some questions related > to the usage. > > > > > > We are using SLEPc to solve the acoustic modal eigenvalue problem. > Attached is a simple standalone program that computes acoustic modes in a > simple rectangular box. This program illustrates the general setup I am > using, though here the shell matrix and the preconditioner matrix are the > same, while in my actual program the shell matrix computes A*x without > explicitly forming A, and the preconditioner is a 0th order approximation > of A. > > > > > > In the attached program I have tested both > > > 1) the Krylov-Schur with inexact shift-and-invert (implemented under > the option sinvert); > > > 2) the JD solver with preconditioner (implemented under the option > usejd) > > > > > > Both the solvers seem to work decently, compared to no > preconditioning. This is how I run the two solvers (for a mesh size of > 1600x400): > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 > -eps_target 0 > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -sinvert 1 -deflate 1 > -eps_target 0 > > > Both finish in about ~10 minutes on my system in serial. JD seems to > be slightly faster and more accurate (for the imaginary part of eigenvalue). > > > The program also runs in parallel using mpiexec. I use complex builds, > as in my main program the matrix can be complex. > > > > > > Now here are my questions: > > > 1) For this particular problem type, could you please check if these > are the best settings that one could use? I have tried different > combinations of KSP/PC types e.g. GMRES, GAMG, etc, but BCGSL + BJACOBI > seems to work the best in serial and parallel. > > > > > > 2) When I tested these settings in my main program, for some reason > the JD solver was not converging. After further testing, I found the issue > was related to the setting of "-eps_target 0". I have included > "EPSSetTarget(eps,0.0);" in the program and I assumed this is equivalent to > passing "-eps_target 0" from the command line, but that doesn't seem to be > the case. For instance, if I run the attached program without "-eps_target > 0" in the command line then it doesn't converge. 
> > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 > -eps_target 0 > > > the above finishes in about 10 minutes > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 > > > the above doesn't converge even though "EPSSetTarget(eps,0.0);" is > included in the code > > > > > > This only seems to affect the JD solver, not the Krylov > shift-and-invert (-sinvert 1) option. So is there any difference between > passing "-eps_target 0" from the command line vs using > "EPSSetTarget(eps,0.0);" in the code? I cannot pass any command line > arguments in my actual program, so need to set everything internally. > > > > > > 3) Also, another minor related issue. While using the inexact > shift-and-invert option, I was running into the following error: > > > > > > "" > > > Missing or incorrect user input > > > Shift-and-invert requires a target 'which' (see > EPSSetWhichEigenpairs), for instance -st_type sinvert -eps_target 0 > -eps_target_magnitude > > > "" > > > > > > I already have the below two lines in the code: > > > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); > > > EPSSetTarget(eps,0.0); > > > > > > so shouldn't these be enough? If I comment out the first line > "EPSSetWhichEigenpairs", then the code works fine. > > > > You should either do > > > > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); > > > > without shift-and-invert or > > > > EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE); > > EPSSetTarget(eps,0.0); > > > > with shift-and-invert. The latter can also be used without > shift-and-invert (e.g. in JD). > > > > I have to check, but a possible explanation why in your comment above > (2) the command-line option -eps_target 0 works differently is that it also > sets -eps_target_magnitude if omitted, so to be equivalent in source code > you have to call both > > EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE); > > EPSSetTarget(eps,0.0); > > > > Jose > > > > > I have some more questions regarding setting the preconditioner for a > quadratic eigenvalue problem, which I will ask in a follow-up email. > > > > > > Thanks for your help! > > > > > > -Varun > > > > > > > > > On Thu, Jul 1, 2021 at 5:01 AM Varun Hiremath > wrote: > > > Thank you very much for these suggestions! We are currently using > version 3.12, so I'll try to update to the latest version and try your > suggestions. Let me get back to you, thanks! > > > > > > On Thu, Jul 1, 2021, 4:45 AM Jose E. Roman wrote: > > > Then I would try Davidson methods https://doi.org/10.1145/2543696 > > > You can also try Krylov-Schur with "inexact" shift-and-invert, for > instance, with preconditioned BiCGStab or GMRES, see section 3.4.1 of the > users manual. > > > > > > In both cases, you have to pass matrix A in the call to > EPSSetOperators() and the preconditioner matrix via > STSetPreconditionerMat() - note this function was introduced in version > 3.15. > > > > > > Jose > > > > > > > > > > > > > El 1 jul 2021, a las 13:36, Varun Hiremath > escribi?: > > > > > > > > Thanks. I actually do have a 1st order approximation of matrix A, > that I can explicitly compute and also invert. Can I use that matrix as > preconditioner to speed things up? Is there some example that explains how > to setup and call SLEPc for this scenario? > > > > > > > > On Thu, Jul 1, 2021, 4:29 AM Jose E. 
Roman > wrote: > > > > For smallest real parts one could adapt ex34.c, but it is going to > be costly > https://slepc.upv.es/documentation/current/src/eps/tutorials/ex36.c.html > > > > Also, if eigenvalues are clustered around the origin, convergence > may still be very slow. > > > > > > > > It is a tough problem, unless you are able to compute a good > preconditioner of A (no need to compute the exact inverse). > > > > > > > > Jose > > > > > > > > > > > > > El 1 jul 2021, a las 13:23, Varun Hiremath < > varunhiremath at gmail.com> escribi?: > > > > > > > > > > I'm solving for the smallest eigenvalues in magnitude. Though is > it cheaper to solve smallest in real part, as that might also work in my > case? Thanks for your help. > > > > > > > > > > On Thu, Jul 1, 2021, 4:08 AM Jose E. Roman > wrote: > > > > > Smallest eigenvalue in magnitude or real part? > > > > > > > > > > > > > > > > El 1 jul 2021, a las 11:58, Varun Hiremath < > varunhiremath at gmail.com> escribi?: > > > > > > > > > > > > Sorry, no both A and B are general sparse matrices > (non-hermitian). So is there anything else I could try? > > > > > > > > > > > > On Thu, Jul 1, 2021 at 2:43 AM Jose E. Roman > wrote: > > > > > > Is the problem symmetric (GHEP)? In that case, you can try > LOBPCG on the pair (A,B). But this will likely be slow as well, unless you > can provide a good preconditioner. > > > > > > > > > > > > Jose > > > > > > > > > > > > > > > > > > > El 1 jul 2021, a las 11:37, Varun Hiremath < > varunhiremath at gmail.com> escribi?: > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > I am trying to compute the smallest eigenvalues of a > generalized system A*x= lambda*B*x. I don't explicitly know the matrix A > (so I am using a shell matrix with a custom matmult function) however, the > matrix B is explicitly known so I compute inv(B)*A within the shell matrix > and solve inv(B)*A*x = lambda*x. > > > > > > > > > > > > > > To compute the smallest eigenvalues it is recommended to solve > the inverted system, but since matrix A is not explicitly known I can't > invert the system. Moreover, the size of the system can be really big, and > with the default Krylov solver, it is extremely slow. So is there a better > way for me to compute the smallest eigenvalues of this system? > > > > > > > > > > > > > > Thanks, > > > > > > > Varun > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Sat Sep 25 01:12:16 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Sat, 25 Sep 2021 08:12:16 +0200 Subject: [petsc-users] SLEPc: smallest eigenvalues In-Reply-To: References: <179BDB69-1EC0-4334-A964-ABE29E33EFF8@dsic.upv.es> <5B1750B3-E05F-45D7-929B-A5CF816B4A75@dsic.upv.es> <7031EC8B-A238-45AD-B4C2-FA8988022864@dsic.upv.es> <6B968AE2-8325-4E20-B94A-16ECDD0FBA90@dsic.upv.es> <4BB88AB3-410E-493C-9161-97775747936D@dsic.upv.es> Message-ID: <32B34038-7E1A-42CA-A55D-9AF9D41D1697@dsic.upv.es> Yes, you can use PCMAT https://petsc.org/release/docs/manualpages/PC/PCMAT.html then pass a preconditioner matrix that performs the inverse via a shell matrix. > El 25 sept 2021, a las 8:07, Varun Hiremath escribi?: > > Hi Jose, > > Thanks for checking my code and providing suggestions. > > In my particular case, I don't know the matrix A explicitly, I compute A*x in a matrix-free way within a shell matrix, so I can't use any of the direct factorization methods. 
But just a question regarding your suggestion to compute a (parallel) LU factorization. In our work, we do use MUMPS to compute the parallel factorization. For solving the generalized problem, A*x = lambda*B*x, we are computing inv(B)*A*x within a shell matrix, where factorization of B is computed using MUMPS. (We don't call MUMPS through SLEPc as we have our own MPI wrapper and other user settings to handle.) > > So for the preconditioning, instead of using the iterative solvers, can I provide a shell matrix that computes inv(P)*x corrections (where P is the preconditioner matrix) using MUMPS direct solver? > > And yes, thanks, #define PETSC_USE_COMPLEX 1 is not needed, it works without it. > > Regards, > Varun > > On Fri, Sep 24, 2021 at 9:14 AM Jose E. Roman wrote: > If you do > $ ./acoustic_matrix_test.o -shell 0 -st_type sinvert -deflate 1 > then it is using an LU factorization (the default), which is fast. > > Use -eps_view to see which solver settings are you using. > > BiCGStab with block Jacobi does not work for you matrix, it exceeds the maximum 10000 iterations. So this is not viable unless you can find a better preconditioner for your problem. If not, just using EPS_SMALLEST_MAGNITUDE will be faster. > > Computing smallest magnitude eigenvalues is a difficult task. The most robust way is to compute a (parallel) LU factorization if you can afford it. > > > A side note: don't add this to your source code > #define PETSC_USE_COMPLEX 1 > This define is taken from PETSc's include files, you should not mess with it. Instead, you probably want to add something like this AFTER #include : > #if !defined(PETSC_USE_COMPLEX) > #error "Requires complex scalars" > #endif > > Jose > > > > El 22 sept 2021, a las 19:38, Varun Hiremath escribi?: > > > > Hi Jose, > > > > Thank you, that explains it and my example code works now without specifying "-eps_target 0" in the command line. > > > > However, both the Krylov inexact shift-invert and JD solvers are struggling to converge for some of my actual problems. The issue seems to be related to non-symmetric general matrices. I have extracted one such matrix attached here as MatA.gz (size 100k), and have also included a short program that loads this matrix and then computes the smallest eigenvalues as I described earlier. > > > > For this matrix, if I compute the eigenvalues directly (without using the shell matrix) using shift-and-invert (as below) then it converges in less than a minute. > > $ ./acoustic_matrix_test.o -shell 0 -st_type sinvert -deflate 1 > > > > However, if I use the shell matrix and use any of the preconditioned solvers JD or Krylov shift-invert (as shown below) with the same matrix as the preconditioner, then they struggle to converge. > > $ ./acoustic_matrix_test.o -usejd 1 -deflate 1 > > $ ./acoustic_matrix_test.o -sinvert 1 -deflate 1 > > > > Could you please check the attached code and suggest any changes in settings that might help with convergence for these kinds of matrices? I appreciate your help! > > > > Thanks, > > Varun > > > > On Tue, Sep 21, 2021 at 11:14 AM Jose E. Roman wrote: > > I will have a look at your code when I have more time. Meanwhile, I am answering 3) below... > > > > > El 21 sept 2021, a las 0:23, Varun Hiremath escribi?: > > > > > > Hi Jose, > > > > > > Sorry, it took me a while to test these settings in the new builds. I am getting good improvement in performance using the preconditioned solvers, so thanks for the suggestions! But I have some questions related to the usage. 
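The PCMAT route suggested above -- wrap the application of inv(P) in a shell matrix and let the preconditioner be a plain MatMult with that shell -- might be sketched as follows. Here a PETSc LU factorization stands in for the external MUMPS wrapper mentioned in the thread, and the names (PrecCtx, AttachShellPrec) are illustrative assumptions rather than the poster's actual code:

#include <slepceps.h>

typedef struct {
  KSP innersolve;   /* direct solve with P (stand-in for an external MUMPS solve) */
} PrecCtx;

/* MatMult of the shell "preconditioner matrix": y = inv(P)*x */
static PetscErrorCode MatMult_ApplyPinv(Mat M, Vec x, Vec y)
{
  PrecCtx        *ctx;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatShellGetContext(M, &ctx);CHKERRQ(ierr);
  ierr = KSPSolve(ctx->innersolve, x, y);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Build the shell inverse of P and attach it to the EPS/ST with PCMAT
   (cleanup of ctx and Pinv is omitted here for brevity). */
PetscErrorCode AttachShellPrec(EPS eps, Mat P)
{
  PrecCtx        *ctx;
  PC             pc;
  KSP            stksp;
  ST             st;
  Mat            Pinv;
  PetscInt       m, n, M, N;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscNew(&ctx);CHKERRQ(ierr);
  ierr = KSPCreate(PetscObjectComm((PetscObject)P), &ctx->innersolve);CHKERRQ(ierr);
  ierr = KSPSetOperators(ctx->innersolve, P, P);CHKERRQ(ierr);
  ierr = KSPSetType(ctx->innersolve, KSPPREONLY);CHKERRQ(ierr);
  ierr = KSPGetPC(ctx->innersolve, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);   /* MUMPS could be selected via PCFactorSetMatSolverType */

  ierr = MatGetLocalSize(P, &m, &n);CHKERRQ(ierr);
  ierr = MatGetSize(P, &M, &N);CHKERRQ(ierr);
  ierr = MatCreateShell(PetscObjectComm((PetscObject)P), m, n, M, N, ctx, &Pinv);CHKERRQ(ierr);
  ierr = MatShellSetOperation(Pinv, MATOP_MULT, (void (*)(void))MatMult_ApplyPinv);CHKERRQ(ierr);

  /* Use Pinv as the preconditioner matrix and apply it with PCMAT, i.e. the
     "preconditioner application" is just a MatMult with this shell. */
  ierr = EPSGetST(eps, &st);CHKERRQ(ierr);
  ierr = STSetPreconditionerMat(st, Pinv);CHKERRQ(ierr);
  ierr = STGetKSP(st, &stksp);CHKERRQ(ierr);
  ierr = KSPGetPC(stksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCMAT);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}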
> > > > > > We are using SLEPc to solve the acoustic modal eigenvalue problem. Attached is a simple standalone program that computes acoustic modes in a simple rectangular box. This program illustrates the general setup I am using, though here the shell matrix and the preconditioner matrix are the same, while in my actual program the shell matrix computes A*x without explicitly forming A, and the preconditioner is a 0th order approximation of A. > > > > > > In the attached program I have tested both > > > 1) the Krylov-Schur with inexact shift-and-invert (implemented under the option sinvert); > > > 2) the JD solver with preconditioner (implemented under the option usejd) > > > > > > Both the solvers seem to work decently, compared to no preconditioning. This is how I run the two solvers (for a mesh size of 1600x400): > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 -eps_target 0 > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -sinvert 1 -deflate 1 -eps_target 0 > > > Both finish in about ~10 minutes on my system in serial. JD seems to be slightly faster and more accurate (for the imaginary part of eigenvalue). > > > The program also runs in parallel using mpiexec. I use complex builds, as in my main program the matrix can be complex. > > > > > > Now here are my questions: > > > 1) For this particular problem type, could you please check if these are the best settings that one could use? I have tried different combinations of KSP/PC types e.g. GMRES, GAMG, etc, but BCGSL + BJACOBI seems to work the best in serial and parallel. > > > > > > 2) When I tested these settings in my main program, for some reason the JD solver was not converging. After further testing, I found the issue was related to the setting of "-eps_target 0". I have included "EPSSetTarget(eps,0.0);" in the program and I assumed this is equivalent to passing "-eps_target 0" from the command line, but that doesn't seem to be the case. For instance, if I run the attached program without "-eps_target 0" in the command line then it doesn't converge. > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 -eps_target 0 > > > the above finishes in about 10 minutes > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 > > > the above doesn't converge even though "EPSSetTarget(eps,0.0);" is included in the code > > > > > > This only seems to affect the JD solver, not the Krylov shift-and-invert (-sinvert 1) option. So is there any difference between passing "-eps_target 0" from the command line vs using "EPSSetTarget(eps,0.0);" in the code? I cannot pass any command line arguments in my actual program, so need to set everything internally. > > > > > > 3) Also, another minor related issue. While using the inexact shift-and-invert option, I was running into the following error: > > > > > > "" > > > Missing or incorrect user input > > > Shift-and-invert requires a target 'which' (see EPSSetWhichEigenpairs), for instance -st_type sinvert -eps_target 0 -eps_target_magnitude > > > "" > > > > > > I already have the below two lines in the code: > > > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); > > > EPSSetTarget(eps,0.0); > > > > > > so shouldn't these be enough? If I comment out the first line "EPSSetWhichEigenpairs", then the code works fine. > > > > You should either do > > > > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); > > > > without shift-and-invert or > > > > EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE); > > EPSSetTarget(eps,0.0); > > > > with shift-and-invert. 
The latter can also be used without shift-and-invert (e.g. in JD). > > > > I have to check, but a possible explanation why in your comment above (2) the command-line option -eps_target 0 works differently is that it also sets -eps_target_magnitude if omitted, so to be equivalent in source code you have to call both > > EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE); > > EPSSetTarget(eps,0.0); > > > > Jose > > > > > I have some more questions regarding setting the preconditioner for a quadratic eigenvalue problem, which I will ask in a follow-up email. > > > > > > Thanks for your help! > > > > > > -Varun > > > > > > > > > On Thu, Jul 1, 2021 at 5:01 AM Varun Hiremath wrote: > > > Thank you very much for these suggestions! We are currently using version 3.12, so I'll try to update to the latest version and try your suggestions. Let me get back to you, thanks! > > > > > > On Thu, Jul 1, 2021, 4:45 AM Jose E. Roman wrote: > > > Then I would try Davidson methods https://doi.org/10.1145/2543696 > > > You can also try Krylov-Schur with "inexact" shift-and-invert, for instance, with preconditioned BiCGStab or GMRES, see section 3.4.1 of the users manual. > > > > > > In both cases, you have to pass matrix A in the call to EPSSetOperators() and the preconditioner matrix via STSetPreconditionerMat() - note this function was introduced in version 3.15. > > > > > > Jose > > > > > > > > > > > > > El 1 jul 2021, a las 13:36, Varun Hiremath escribi?: > > > > > > > > Thanks. I actually do have a 1st order approximation of matrix A, that I can explicitly compute and also invert. Can I use that matrix as preconditioner to speed things up? Is there some example that explains how to setup and call SLEPc for this scenario? > > > > > > > > On Thu, Jul 1, 2021, 4:29 AM Jose E. Roman wrote: > > > > For smallest real parts one could adapt ex34.c, but it is going to be costly https://slepc.upv.es/documentation/current/src/eps/tutorials/ex36.c.html > > > > Also, if eigenvalues are clustered around the origin, convergence may still be very slow. > > > > > > > > It is a tough problem, unless you are able to compute a good preconditioner of A (no need to compute the exact inverse). > > > > > > > > Jose > > > > > > > > > > > > > El 1 jul 2021, a las 13:23, Varun Hiremath escribi?: > > > > > > > > > > I'm solving for the smallest eigenvalues in magnitude. Though is it cheaper to solve smallest in real part, as that might also work in my case? Thanks for your help. > > > > > > > > > > On Thu, Jul 1, 2021, 4:08 AM Jose E. Roman wrote: > > > > > Smallest eigenvalue in magnitude or real part? > > > > > > > > > > > > > > > > El 1 jul 2021, a las 11:58, Varun Hiremath escribi?: > > > > > > > > > > > > Sorry, no both A and B are general sparse matrices (non-hermitian). So is there anything else I could try? > > > > > > > > > > > > On Thu, Jul 1, 2021 at 2:43 AM Jose E. Roman wrote: > > > > > > Is the problem symmetric (GHEP)? In that case, you can try LOBPCG on the pair (A,B). But this will likely be slow as well, unless you can provide a good preconditioner. > > > > > > > > > > > > Jose > > > > > > > > > > > > > > > > > > > El 1 jul 2021, a las 11:37, Varun Hiremath escribi?: > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > I am trying to compute the smallest eigenvalues of a generalized system A*x= lambda*B*x. 
I don't explicitly know the matrix A (so I am using a shell matrix with a custom matmult function) however, the matrix B is explicitly known so I compute inv(B)*A within the shell matrix and solve inv(B)*A*x = lambda*x. > > > > > > > > > > > > > > To compute the smallest eigenvalues it is recommended to solve the inverted system, but since matrix A is not explicitly known I can't invert the system. Moreover, the size of the system can be really big, and with the default Krylov solver, it is extremely slow. So is there a better way for me to compute the smallest eigenvalues of this system? > > > > > > > > > > > > > > Thanks, > > > > > > > Varun > > > > > > > > > > > > > > > > > > > > > > > > > > From varunhiremath at gmail.com Sat Sep 25 01:50:38 2021 From: varunhiremath at gmail.com (Varun Hiremath) Date: Fri, 24 Sep 2021 23:50:38 -0700 Subject: [petsc-users] SLEPc: smallest eigenvalues In-Reply-To: <32B34038-7E1A-42CA-A55D-9AF9D41D1697@dsic.upv.es> References: <179BDB69-1EC0-4334-A964-ABE29E33EFF8@dsic.upv.es> <5B1750B3-E05F-45D7-929B-A5CF816B4A75@dsic.upv.es> <7031EC8B-A238-45AD-B4C2-FA8988022864@dsic.upv.es> <6B968AE2-8325-4E20-B94A-16ECDD0FBA90@dsic.upv.es> <4BB88AB3-410E-493C-9161-97775747936D@dsic.upv.es> <32B34038-7E1A-42CA-A55D-9AF9D41D1697@dsic.upv.es> Message-ID: Ok, great! I will give that a try, thanks for your help! On Fri, Sep 24, 2021 at 11:12 PM Jose E. Roman wrote: > Yes, you can use PCMAT > https://petsc.org/release/docs/manualpages/PC/PCMAT.html then pass a > preconditioner matrix that performs the inverse via a shell matrix. > > > El 25 sept 2021, a las 8:07, Varun Hiremath > escribi?: > > > > Hi Jose, > > > > Thanks for checking my code and providing suggestions. > > > > In my particular case, I don't know the matrix A explicitly, I compute > A*x in a matrix-free way within a shell matrix, so I can't use any of the > direct factorization methods. But just a question regarding your suggestion > to compute a (parallel) LU factorization. In our work, we do use MUMPS to > compute the parallel factorization. For solving the generalized problem, > A*x = lambda*B*x, we are computing inv(B)*A*x within a shell matrix, where > factorization of B is computed using MUMPS. (We don't call MUMPS through > SLEPc as we have our own MPI wrapper and other user settings to handle.) > > > > So for the preconditioning, instead of using the iterative solvers, can > I provide a shell matrix that computes inv(P)*x corrections (where P is the > preconditioner matrix) using MUMPS direct solver? > > > > And yes, thanks, #define PETSC_USE_COMPLEX 1 is not needed, it works > without it. > > > > Regards, > > Varun > > > > On Fri, Sep 24, 2021 at 9:14 AM Jose E. Roman > wrote: > > If you do > > $ ./acoustic_matrix_test.o -shell 0 -st_type sinvert -deflate 1 > > then it is using an LU factorization (the default), which is fast. > > > > Use -eps_view to see which solver settings are you using. > > > > BiCGStab with block Jacobi does not work for you matrix, it exceeds the > maximum 10000 iterations. So this is not viable unless you can find a > better preconditioner for your problem. If not, just using > EPS_SMALLEST_MAGNITUDE will be faster. > > > > Computing smallest magnitude eigenvalues is a difficult task. The most > robust way is to compute a (parallel) LU factorization if you can afford it. > > > > > > A side note: don't add this to your source code > > #define PETSC_USE_COMPLEX 1 > > This define is taken from PETSc's include files, you should not mess > with it. 
Instead, you probably want to add something like this AFTER > #include : > > #if !defined(PETSC_USE_COMPLEX) > > #error "Requires complex scalars" > > #endif > > > > Jose > > > > > > > El 22 sept 2021, a las 19:38, Varun Hiremath > escribi?: > > > > > > Hi Jose, > > > > > > Thank you, that explains it and my example code works now without > specifying "-eps_target 0" in the command line. > > > > > > However, both the Krylov inexact shift-invert and JD solvers are > struggling to converge for some of my actual problems. The issue seems to > be related to non-symmetric general matrices. I have extracted one such > matrix attached here as MatA.gz (size 100k), and have also included a short > program that loads this matrix and then computes the smallest eigenvalues > as I described earlier. > > > > > > For this matrix, if I compute the eigenvalues directly (without using > the shell matrix) using shift-and-invert (as below) then it converges in > less than a minute. > > > $ ./acoustic_matrix_test.o -shell 0 -st_type sinvert -deflate 1 > > > > > > However, if I use the shell matrix and use any of the preconditioned > solvers JD or Krylov shift-invert (as shown below) with the same matrix as > the preconditioner, then they struggle to converge. > > > $ ./acoustic_matrix_test.o -usejd 1 -deflate 1 > > > $ ./acoustic_matrix_test.o -sinvert 1 -deflate 1 > > > > > > Could you please check the attached code and suggest any changes in > settings that might help with convergence for these kinds of matrices? I > appreciate your help! > > > > > > Thanks, > > > Varun > > > > > > On Tue, Sep 21, 2021 at 11:14 AM Jose E. Roman > wrote: > > > I will have a look at your code when I have more time. Meanwhile, I am > answering 3) below... > > > > > > > El 21 sept 2021, a las 0:23, Varun Hiremath > escribi?: > > > > > > > > Hi Jose, > > > > > > > > Sorry, it took me a while to test these settings in the new builds. > I am getting good improvement in performance using the preconditioned > solvers, so thanks for the suggestions! But I have some questions related > to the usage. > > > > > > > > We are using SLEPc to solve the acoustic modal eigenvalue problem. > Attached is a simple standalone program that computes acoustic modes in a > simple rectangular box. This program illustrates the general setup I am > using, though here the shell matrix and the preconditioner matrix are the > same, while in my actual program the shell matrix computes A*x without > explicitly forming A, and the preconditioner is a 0th order approximation > of A. > > > > > > > > In the attached program I have tested both > > > > 1) the Krylov-Schur with inexact shift-and-invert (implemented under > the option sinvert); > > > > 2) the JD solver with preconditioner (implemented under the option > usejd) > > > > > > > > Both the solvers seem to work decently, compared to no > preconditioning. This is how I run the two solvers (for a mesh size of > 1600x400): > > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 > -eps_target 0 > > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -sinvert 1 -deflate 1 > -eps_target 0 > > > > Both finish in about ~10 minutes on my system in serial. JD seems to > be slightly faster and more accurate (for the imaginary part of eigenvalue). > > > > The program also runs in parallel using mpiexec. I use complex > builds, as in my main program the matrix can be complex. 
> > > > > > > > Now here are my questions: > > > > 1) For this particular problem type, could you please check if these > are the best settings that one could use? I have tried different > combinations of KSP/PC types e.g. GMRES, GAMG, etc, but BCGSL + BJACOBI > seems to work the best in serial and parallel. > > > > > > > > 2) When I tested these settings in my main program, for some reason > the JD solver was not converging. After further testing, I found the issue > was related to the setting of "-eps_target 0". I have included > "EPSSetTarget(eps,0.0);" in the program and I assumed this is equivalent to > passing "-eps_target 0" from the command line, but that doesn't seem to be > the case. For instance, if I run the attached program without "-eps_target > 0" in the command line then it doesn't converge. > > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 > -eps_target 0 > > > > the above finishes in about 10 minutes > > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 > > > > the above doesn't converge even though "EPSSetTarget(eps,0.0);" is > included in the code > > > > > > > > This only seems to affect the JD solver, not the Krylov > shift-and-invert (-sinvert 1) option. So is there any difference between > passing "-eps_target 0" from the command line vs using > "EPSSetTarget(eps,0.0);" in the code? I cannot pass any command line > arguments in my actual program, so need to set everything internally. > > > > > > > > 3) Also, another minor related issue. While using the inexact > shift-and-invert option, I was running into the following error: > > > > > > > > "" > > > > Missing or incorrect user input > > > > Shift-and-invert requires a target 'which' (see > EPSSetWhichEigenpairs), for instance -st_type sinvert -eps_target 0 > -eps_target_magnitude > > > > "" > > > > > > > > I already have the below two lines in the code: > > > > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); > > > > EPSSetTarget(eps,0.0); > > > > > > > > so shouldn't these be enough? If I comment out the first line > "EPSSetWhichEigenpairs", then the code works fine. > > > > > > You should either do > > > > > > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); > > > > > > without shift-and-invert or > > > > > > EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE); > > > EPSSetTarget(eps,0.0); > > > > > > with shift-and-invert. The latter can also be used without > shift-and-invert (e.g. in JD). > > > > > > I have to check, but a possible explanation why in your comment above > (2) the command-line option -eps_target 0 works differently is that it also > sets -eps_target_magnitude if omitted, so to be equivalent in source code > you have to call both > > > EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE); > > > EPSSetTarget(eps,0.0); > > > > > > Jose > > > > > > > I have some more questions regarding setting the preconditioner for > a quadratic eigenvalue problem, which I will ask in a follow-up email. > > > > > > > > Thanks for your help! > > > > > > > > -Varun > > > > > > > > > > > > On Thu, Jul 1, 2021 at 5:01 AM Varun Hiremath < > varunhiremath at gmail.com> wrote: > > > > Thank you very much for these suggestions! We are currently using > version 3.12, so I'll try to update to the latest version and try your > suggestions. Let me get back to you, thanks! > > > > > > > > On Thu, Jul 1, 2021, 4:45 AM Jose E. 
Roman > wrote: > > > > Then I would try Davidson methods https://doi.org/10.1145/2543696 > > > > You can also try Krylov-Schur with "inexact" shift-and-invert, for > instance, with preconditioned BiCGStab or GMRES, see section 3.4.1 of the > users manual. > > > > > > > > In both cases, you have to pass matrix A in the call to > EPSSetOperators() and the preconditioner matrix via > STSetPreconditionerMat() - note this function was introduced in version > 3.15. > > > > > > > > Jose > > > > > > > > > > > > > > > > > El 1 jul 2021, a las 13:36, Varun Hiremath < > varunhiremath at gmail.com> escribi?: > > > > > > > > > > Thanks. I actually do have a 1st order approximation of matrix A, > that I can explicitly compute and also invert. Can I use that matrix as > preconditioner to speed things up? Is there some example that explains how > to setup and call SLEPc for this scenario? > > > > > > > > > > On Thu, Jul 1, 2021, 4:29 AM Jose E. Roman > wrote: > > > > > For smallest real parts one could adapt ex34.c, but it is going to > be costly > https://slepc.upv.es/documentation/current/src/eps/tutorials/ex36.c.html > > > > > Also, if eigenvalues are clustered around the origin, convergence > may still be very slow. > > > > > > > > > > It is a tough problem, unless you are able to compute a good > preconditioner of A (no need to compute the exact inverse). > > > > > > > > > > Jose > > > > > > > > > > > > > > > > El 1 jul 2021, a las 13:23, Varun Hiremath < > varunhiremath at gmail.com> escribi?: > > > > > > > > > > > > I'm solving for the smallest eigenvalues in magnitude. Though is > it cheaper to solve smallest in real part, as that might also work in my > case? Thanks for your help. > > > > > > > > > > > > On Thu, Jul 1, 2021, 4:08 AM Jose E. Roman > wrote: > > > > > > Smallest eigenvalue in magnitude or real part? > > > > > > > > > > > > > > > > > > > El 1 jul 2021, a las 11:58, Varun Hiremath < > varunhiremath at gmail.com> escribi?: > > > > > > > > > > > > > > Sorry, no both A and B are general sparse matrices > (non-hermitian). So is there anything else I could try? > > > > > > > > > > > > > > On Thu, Jul 1, 2021 at 2:43 AM Jose E. Roman < > jroman at dsic.upv.es> wrote: > > > > > > > Is the problem symmetric (GHEP)? In that case, you can try > LOBPCG on the pair (A,B). But this will likely be slow as well, unless you > can provide a good preconditioner. > > > > > > > > > > > > > > Jose > > > > > > > > > > > > > > > > > > > > > > El 1 jul 2021, a las 11:37, Varun Hiremath < > varunhiremath at gmail.com> escribi?: > > > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > > > I am trying to compute the smallest eigenvalues of a > generalized system A*x= lambda*B*x. I don't explicitly know the matrix A > (so I am using a shell matrix with a custom matmult function) however, the > matrix B is explicitly known so I compute inv(B)*A within the shell matrix > and solve inv(B)*A*x = lambda*x. > > > > > > > > > > > > > > > > To compute the smallest eigenvalues it is recommended to > solve the inverted system, but since matrix A is not explicitly known I > can't invert the system. Moreover, the size of the system can be really > big, and with the default Krylov solver, it is extremely slow. So is there > a better way for me to compute the smallest eigenvalues of this system? > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Varun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nathan.wukie at us.af.mil Mon Sep 27 10:59:03 2021 From: nathan.wukie at us.af.mil (WUKIE, NATHAN A DR-02 USAF AFMC AFRL/RQVC) Date: Mon, 27 Sep 2021 15:59:03 +0000 Subject: [petsc-users] Interaction between petsc4py and application Fortran library Message-ID: How should petsc initialization be handled for a python application utilizing petsc4py and a Fortran library application also using petsc? The petsc documentation states that PetscInitializeFortran "should be called soon AFTER the call to PetscInitialize() if one is using a C main program that calls Fortran routines that in turn call PETSc routines". Does petsc4py.init(...) call PetscInitializeFortran? Is it permissible to call PetscInitializeFortran from the fortran library application itself? Or must PetscInitializeFortran be called from the C main program? Thank you, Nathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Sep 27 12:54:46 2021 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 27 Sep 2021 13:54:46 -0400 Subject: [petsc-users] Interaction between petsc4py and application Fortran library In-Reply-To: References: Message-ID: <8F63EBC4-7FF8-4388-92E5-B618A3E288D1@petsc.dev> Nathan, Yes, you can call PetscInitializeFortran() from your Fortran library. Barry > On Sep 27, 2021, at 11:59 AM, WUKIE, NATHAN A DR-02 USAF AFMC AFRL/RQVC via petsc-users wrote: > > How should petsc initialization be handled for a python application utilizing petsc4py and a Fortran library application also using petsc? > > The petsc documentation states that PetscInitializeFortran "should be called soon AFTER the call to PetscInitialize () if one is using a C main program that calls Fortran routines that in turn call PETSc routines". Does petsc4py.init(...) call PetscInitializeFortran? Is it permissible to call PetscInitializeFortran from the fortran library application itself? Or must PetscInitializeFortran be called from the C main program? > > Thank you, > Nathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From liyiyang30 at gmail.com Mon Sep 27 20:22:11 2021 From: liyiyang30 at gmail.com (Yiyang Li) Date: Mon, 27 Sep 2021 18:22:11 -0700 Subject: [petsc-users] Turn off CUDA Devices information Message-ID: Hello, I have CUDA aware MPI, and I have upgraded from PETSc 3.12 to PETSc 3.15.4 and petsc4py 3.15.4. Now, when I call PETSc.KSP().solve(..., ...) The information of GPU is always printed to stdout by every MPI rank, like CUDA version: v 11040 CUDA Devices: 0 : Quadro P4000 6 1 Global memory: 8105 mb Shared memory: 48 kb Constant memory: 64 kb Block registers: 65536 CUDA version: v 11040 CUDA Devices: 0 : Quadro P4000 6 1 Global memory: 8105 mb Shared memory: 48 kb Constant memory: 64 kb Block registers: 6553 ... I wonder if there is an option to turn that off? I have tried including -cuda_device NONE in command options, but that did not work. Best regards, Yiyang -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Sep 27 20:43:07 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 27 Sep 2021 20:43:07 -0500 (CDT) Subject: [petsc-users] Turn off CUDA Devices information In-Reply-To: References: Message-ID: <5e127072-8cc3-b41a-5e9-9e498cde85fb@mcs.anl.gov> Do you have petsc built with superlu_dist? Satish On Mon, 27 Sep 2021, Yiyang Li wrote: > Hello, > > I have CUDA aware MPI, and I have upgraded from PETSc 3.12 to PETSc 3.15.4 > and petsc4py 3.15.4. 
> > Now, when I call > > PETSc.KSP().solve(..., ...) > > The information of GPU is always printed to stdout by every MPI rank, like > > CUDA version: v 11040 > CUDA Devices: > > 0 : Quadro P4000 6 1 > Global memory: 8105 mb > Shared memory: 48 kb > Constant memory: 64 kb > Block registers: 65536 > > CUDA version: v 11040 > CUDA Devices: > > 0 : Quadro P4000 6 1 > Global memory: 8105 mb > Shared memory: 48 kb > Constant memory: 64 kb > Block registers: 6553 > > ... > > I wonder if there is an option to turn that off? > I have tried including > > -cuda_device NONE > > in command options, but that did not work. > > Best regards, > Yiyang > From varunhiremath at gmail.com Tue Sep 28 00:50:56 2021 From: varunhiremath at gmail.com (Varun Hiremath) Date: Mon, 27 Sep 2021 22:50:56 -0700 Subject: [petsc-users] SLEPc: smallest eigenvalues In-Reply-To: References: <179BDB69-1EC0-4334-A964-ABE29E33EFF8@dsic.upv.es> <5B1750B3-E05F-45D7-929B-A5CF816B4A75@dsic.upv.es> <7031EC8B-A238-45AD-B4C2-FA8988022864@dsic.upv.es> <6B968AE2-8325-4E20-B94A-16ECDD0FBA90@dsic.upv.es> <4BB88AB3-410E-493C-9161-97775747936D@dsic.upv.es> <32B34038-7E1A-42CA-A55D-9AF9D41D1697@dsic.upv.es> Message-ID: Hi Jose, I implemented the LU factorized preconditioner and tested it using PREONLY + LU, but that actually is converging to the wrong eigenvalues, compared to just using BICGS + BJACOBI, or simply computing EPS_SMALLEST_MAGNITUDE without any preconditioning. My preconditioning matrix is only a 1st order approximation, and the off-diagonal terms are not very accurate, so I'm guessing this is why the LU factorization doesn't help much? Nonetheless, using BICGS + BJACOBI with slightly relaxed tolerances seems to be working fine. I now want to test the same preconditioning idea for a quadratic problem. I am solving a quadratic equation similar to Eqn.(5.1) in the SLEPc manual: (K + lambda*C + lambda^2*M)*x = 0, I don't use the PEP package directly, but solve this by linearizing similar to Eqn.(5.3) and calling EPS. Without explicitly forming the full matrix, I just use the block matrix structure as explained in the below example and that works nicely for my case: https://slepc.upv.es/documentation/current/src/eps/tutorials/ex9.c.html In my case, K is not explicitly known, and for linear problems, where C = 0, I am using a 1st order approximation of K as the preconditioner. Now could you please tell me if there is a way to conveniently set the preconditioner for the quadratic problem, which will be of the form [-K 0; 0 I]? Note that K is constructed in parallel (the rows are distributed), so I wasn't sure how to construct this preconditioner matrix which will be compatible with the shell matrix structure that I'm using to define the MatMult function as in ex9. Thanks, Varun On Fri, Sep 24, 2021 at 11:50 PM Varun Hiremath wrote: > Ok, great! I will give that a try, thanks for your help! > > On Fri, Sep 24, 2021 at 11:12 PM Jose E. Roman wrote: > >> Yes, you can use PCMAT >> https://petsc.org/release/docs/manualpages/PC/PCMAT.html then pass a >> preconditioner matrix that performs the inverse via a shell matrix. >> >> > El 25 sept 2021, a las 8:07, Varun Hiremath >> escribi?: >> > >> > Hi Jose, >> > >> > Thanks for checking my code and providing suggestions. >> > >> > In my particular case, I don't know the matrix A explicitly, I compute >> A*x in a matrix-free way within a shell matrix, so I can't use any of the >> direct factorization methods. But just a question regarding your suggestion >> to compute a (parallel) LU factorization. 
In our work, we do use MUMPS to >> compute the parallel factorization. For solving the generalized problem, >> A*x = lambda*B*x, we are computing inv(B)*A*x within a shell matrix, where >> factorization of B is computed using MUMPS. (We don't call MUMPS through >> SLEPc as we have our own MPI wrapper and other user settings to handle.) >> > >> > So for the preconditioning, instead of using the iterative solvers, can >> I provide a shell matrix that computes inv(P)*x corrections (where P is the >> preconditioner matrix) using MUMPS direct solver? >> > >> > And yes, thanks, #define PETSC_USE_COMPLEX 1 is not needed, it works >> without it. >> > >> > Regards, >> > Varun >> > >> > On Fri, Sep 24, 2021 at 9:14 AM Jose E. Roman >> wrote: >> > If you do >> > $ ./acoustic_matrix_test.o -shell 0 -st_type sinvert -deflate 1 >> > then it is using an LU factorization (the default), which is fast. >> > >> > Use -eps_view to see which solver settings are you using. >> > >> > BiCGStab with block Jacobi does not work for you matrix, it exceeds the >> maximum 10000 iterations. So this is not viable unless you can find a >> better preconditioner for your problem. If not, just using >> EPS_SMALLEST_MAGNITUDE will be faster. >> > >> > Computing smallest magnitude eigenvalues is a difficult task. The most >> robust way is to compute a (parallel) LU factorization if you can afford it. >> > >> > >> > A side note: don't add this to your source code >> > #define PETSC_USE_COMPLEX 1 >> > This define is taken from PETSc's include files, you should not mess >> with it. Instead, you probably want to add something like this AFTER >> #include : >> > #if !defined(PETSC_USE_COMPLEX) >> > #error "Requires complex scalars" >> > #endif >> > >> > Jose >> > >> > >> > > El 22 sept 2021, a las 19:38, Varun Hiremath >> escribi?: >> > > >> > > Hi Jose, >> > > >> > > Thank you, that explains it and my example code works now without >> specifying "-eps_target 0" in the command line. >> > > >> > > However, both the Krylov inexact shift-invert and JD solvers are >> struggling to converge for some of my actual problems. The issue seems to >> be related to non-symmetric general matrices. I have extracted one such >> matrix attached here as MatA.gz (size 100k), and have also included a short >> program that loads this matrix and then computes the smallest eigenvalues >> as I described earlier. >> > > >> > > For this matrix, if I compute the eigenvalues directly (without using >> the shell matrix) using shift-and-invert (as below) then it converges in >> less than a minute. >> > > $ ./acoustic_matrix_test.o -shell 0 -st_type sinvert -deflate 1 >> > > >> > > However, if I use the shell matrix and use any of the preconditioned >> solvers JD or Krylov shift-invert (as shown below) with the same matrix as >> the preconditioner, then they struggle to converge. >> > > $ ./acoustic_matrix_test.o -usejd 1 -deflate 1 >> > > $ ./acoustic_matrix_test.o -sinvert 1 -deflate 1 >> > > >> > > Could you please check the attached code and suggest any changes in >> settings that might help with convergence for these kinds of matrices? I >> appreciate your help! >> > > >> > > Thanks, >> > > Varun >> > > >> > > On Tue, Sep 21, 2021 at 11:14 AM Jose E. Roman >> wrote: >> > > I will have a look at your code when I have more time. Meanwhile, I >> am answering 3) below... 
>> > > >> > > > El 21 sept 2021, a las 0:23, Varun Hiremath < >> varunhiremath at gmail.com> escribi?: >> > > > >> > > > Hi Jose, >> > > > >> > > > Sorry, it took me a while to test these settings in the new builds. >> I am getting good improvement in performance using the preconditioned >> solvers, so thanks for the suggestions! But I have some questions related >> to the usage. >> > > > >> > > > We are using SLEPc to solve the acoustic modal eigenvalue problem. >> Attached is a simple standalone program that computes acoustic modes in a >> simple rectangular box. This program illustrates the general setup I am >> using, though here the shell matrix and the preconditioner matrix are the >> same, while in my actual program the shell matrix computes A*x without >> explicitly forming A, and the preconditioner is a 0th order approximation >> of A. >> > > > >> > > > In the attached program I have tested both >> > > > 1) the Krylov-Schur with inexact shift-and-invert (implemented >> under the option sinvert); >> > > > 2) the JD solver with preconditioner (implemented under the option >> usejd) >> > > > >> > > > Both the solvers seem to work decently, compared to no >> preconditioning. This is how I run the two solvers (for a mesh size of >> 1600x400): >> > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 >> -eps_target 0 >> > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -sinvert 1 -deflate 1 >> -eps_target 0 >> > > > Both finish in about ~10 minutes on my system in serial. JD seems >> to be slightly faster and more accurate (for the imaginary part of >> eigenvalue). >> > > > The program also runs in parallel using mpiexec. I use complex >> builds, as in my main program the matrix can be complex. >> > > > >> > > > Now here are my questions: >> > > > 1) For this particular problem type, could you please check if >> these are the best settings that one could use? I have tried different >> combinations of KSP/PC types e.g. GMRES, GAMG, etc, but BCGSL + BJACOBI >> seems to work the best in serial and parallel. >> > > > >> > > > 2) When I tested these settings in my main program, for some reason >> the JD solver was not converging. After further testing, I found the issue >> was related to the setting of "-eps_target 0". I have included >> "EPSSetTarget(eps,0.0);" in the program and I assumed this is equivalent to >> passing "-eps_target 0" from the command line, but that doesn't seem to be >> the case. For instance, if I run the attached program without "-eps_target >> 0" in the command line then it doesn't converge. >> > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 >> -eps_target 0 >> > > > the above finishes in about 10 minutes >> > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 >> > > > the above doesn't converge even though "EPSSetTarget(eps,0.0);" is >> included in the code >> > > > >> > > > This only seems to affect the JD solver, not the Krylov >> shift-and-invert (-sinvert 1) option. So is there any difference between >> passing "-eps_target 0" from the command line vs using >> "EPSSetTarget(eps,0.0);" in the code? I cannot pass any command line >> arguments in my actual program, so need to set everything internally. >> > > > >> > > > 3) Also, another minor related issue. 
While using the inexact >> shift-and-invert option, I was running into the following error: >> > > > >> > > > "" >> > > > Missing or incorrect user input >> > > > Shift-and-invert requires a target 'which' (see >> EPSSetWhichEigenpairs), for instance -st_type sinvert -eps_target 0 >> -eps_target_magnitude >> > > > "" >> > > > >> > > > I already have the below two lines in the code: >> > > > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); >> > > > EPSSetTarget(eps,0.0); >> > > > >> > > > so shouldn't these be enough? If I comment out the first line >> "EPSSetWhichEigenpairs", then the code works fine. >> > > >> > > You should either do >> > > >> > > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); >> > > >> > > without shift-and-invert or >> > > >> > > EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE); >> > > EPSSetTarget(eps,0.0); >> > > >> > > with shift-and-invert. The latter can also be used without >> shift-and-invert (e.g. in JD). >> > > >> > > I have to check, but a possible explanation why in your comment above >> (2) the command-line option -eps_target 0 works differently is that it also >> sets -eps_target_magnitude if omitted, so to be equivalent in source code >> you have to call both >> > > EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE); >> > > EPSSetTarget(eps,0.0); >> > > >> > > Jose >> > > >> > > > I have some more questions regarding setting the preconditioner for >> a quadratic eigenvalue problem, which I will ask in a follow-up email. >> > > > >> > > > Thanks for your help! >> > > > >> > > > -Varun >> > > > >> > > > >> > > > On Thu, Jul 1, 2021 at 5:01 AM Varun Hiremath < >> varunhiremath at gmail.com> wrote: >> > > > Thank you very much for these suggestions! We are currently using >> version 3.12, so I'll try to update to the latest version and try your >> suggestions. Let me get back to you, thanks! >> > > > >> > > > On Thu, Jul 1, 2021, 4:45 AM Jose E. Roman >> wrote: >> > > > Then I would try Davidson methods https://doi.org/10.1145/2543696 >> > > > You can also try Krylov-Schur with "inexact" shift-and-invert, for >> instance, with preconditioned BiCGStab or GMRES, see section 3.4.1 of the >> users manual. >> > > > >> > > > In both cases, you have to pass matrix A in the call to >> EPSSetOperators() and the preconditioner matrix via >> STSetPreconditionerMat() - note this function was introduced in version >> 3.15. >> > > > >> > > > Jose >> > > > >> > > > >> > > > >> > > > > El 1 jul 2021, a las 13:36, Varun Hiremath < >> varunhiremath at gmail.com> escribi?: >> > > > > >> > > > > Thanks. I actually do have a 1st order approximation of matrix A, >> that I can explicitly compute and also invert. Can I use that matrix as >> preconditioner to speed things up? Is there some example that explains how >> to setup and call SLEPc for this scenario? >> > > > > >> > > > > On Thu, Jul 1, 2021, 4:29 AM Jose E. Roman >> wrote: >> > > > > For smallest real parts one could adapt ex34.c, but it is going >> to be costly >> https://slepc.upv.es/documentation/current/src/eps/tutorials/ex36.c.html >> > > > > Also, if eigenvalues are clustered around the origin, convergence >> may still be very slow. >> > > > > >> > > > > It is a tough problem, unless you are able to compute a good >> preconditioner of A (no need to compute the exact inverse). >> > > > > >> > > > > Jose >> > > > > >> > > > > >> > > > > > El 1 jul 2021, a las 13:23, Varun Hiremath < >> varunhiremath at gmail.com> escribi?: >> > > > > > >> > > > > > I'm solving for the smallest eigenvalues in magnitude. 
Though >> is it cheaper to solve smallest in real part, as that might also work in my >> case? Thanks for your help. >> > > > > > >> > > > > > On Thu, Jul 1, 2021, 4:08 AM Jose E. Roman >> wrote: >> > > > > > Smallest eigenvalue in magnitude or real part? >> > > > > > >> > > > > > >> > > > > > > El 1 jul 2021, a las 11:58, Varun Hiremath < >> varunhiremath at gmail.com> escribi?: >> > > > > > > >> > > > > > > Sorry, no both A and B are general sparse matrices >> (non-hermitian). So is there anything else I could try? >> > > > > > > >> > > > > > > On Thu, Jul 1, 2021 at 2:43 AM Jose E. Roman < >> jroman at dsic.upv.es> wrote: >> > > > > > > Is the problem symmetric (GHEP)? In that case, you can try >> LOBPCG on the pair (A,B). But this will likely be slow as well, unless you >> can provide a good preconditioner. >> > > > > > > >> > > > > > > Jose >> > > > > > > >> > > > > > > >> > > > > > > > El 1 jul 2021, a las 11:37, Varun Hiremath < >> varunhiremath at gmail.com> escribi?: >> > > > > > > > >> > > > > > > > Hi All, >> > > > > > > > >> > > > > > > > I am trying to compute the smallest eigenvalues of a >> generalized system A*x= lambda*B*x. I don't explicitly know the matrix A >> (so I am using a shell matrix with a custom matmult function) however, the >> matrix B is explicitly known so I compute inv(B)*A within the shell matrix >> and solve inv(B)*A*x = lambda*x. >> > > > > > > > >> > > > > > > > To compute the smallest eigenvalues it is recommended to >> solve the inverted system, but since matrix A is not explicitly known I >> can't invert the system. Moreover, the size of the system can be really >> big, and with the default Krylov solver, it is extremely slow. So is there >> a better way for me to compute the smallest eigenvalues of this system? >> > > > > > > > >> > > > > > > > Thanks, >> > > > > > > > Varun >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > > >> > > >> > > >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicolas.tardieu at edf.fr Tue Sep 28 07:34:23 2021 From: nicolas.tardieu at edf.fr (TARDIEU Nicolas) Date: Tue, 28 Sep 2021 12:34:23 +0000 Subject: [petsc-users] GMRES Breakdown Message-ID: <1632832463882.43878@edf.fr> Dear PETSc Team, We, code_aster's development team, are using PETSc in our application for years now, mainly for the KSP et PC features. We run tests cases every week without problems. After upgrading from PETSc 3.12.3 to 3.15.3, a test case using GMRES failed with KSP_DIVERGED_BREAKDOWN. By rolling back to 3.12.3, we have checked that the upgrade is the origin of the failure. We know this can occur with GMRES but it is the first time we are facing this situation. Is it a bug ? If yes, how can we help to fix it ? If no, what can we do ? Could the flexible version of GMRES be an alternative ? Best regards, Nicolas -- Nicolas Tardieu Ing, PhD Ce message et toutes les pi?ces jointes (ci-apr?s le 'Message') sont ?tablis ? l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme ? sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse. Si vous n'?tes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. 
Si vous avez re?u ce Message par erreur, merci de le supprimer de votre syst?me, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions ?galement d'en avertir imm?diatement l'exp?diteur par retour du message. Il est impossible de garantir que les communications par messagerie ?lectronique arrivent en temps utile, sont s?curis?es ou d?nu?es de toute erreur ou virus. ____________________________________________________ This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval. If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message. E-mail communication cannot be guaranteed to be timely secure, error or virus-free. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Sep 28 07:45:28 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 28 Sep 2021 14:45:28 +0200 Subject: [petsc-users] GMRES Breakdown In-Reply-To: <1632832463882.43878@edf.fr> References: <1632832463882.43878@edf.fr> Message-ID: <1EA10A4B-9FBC-460A-9FBC-F01CF9D506D1@dsic.upv.es> Now there is a breakdown tolerance https://petsc.org/main/docs/manualpages/KSP/KSPGMRESSetBreakdownTolerance.html You can try changing it. Generally, when upgrading you should check the list of changes https://petsc.org/release/docs/changes/ Jose > El 28 sept 2021, a las 14:34, TARDIEU Nicolas via petsc-users escribi?: > > Dear PETSc Team, > > We, code_aster's development team, are using PETSc in our application for years now, mainly for the KSP et PC features. We run tests cases every week without problems. > After upgrading from PETSc 3.12.3 to 3.15.3, a test case using GMRES failed with KSP_DIVERGED_BREAKDOWN. By rolling back to 3.12.3, we have checked that the upgrade is the origin of the failure. > > We know this can occur with GMRES but it is the first time we are facing this situation. > Is it a bug ? If yes, how can we help to fix it ? If no, what can we do ? Could the flexible version of GMRES be an alternative ? > > Best regards, > Nicolas > -- > Nicolas Tardieu > Ing, PhD > > Ce message et toutes les pi?ces jointes (ci-apr?s le 'Message') sont ?tablis ? l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme ? sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse. > Si vous n'?tes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez re?u ce Message par erreur, merci de le supprimer de votre syst?me, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions ?galement d'en avertir imm?diatement l'exp?diteur par retour du message. > Il est impossible de garantir que les communications par messagerie ?lectronique arrivent en temps utile, sont s?curis?es ou d?nu?es de toute erreur ou virus. 
> ____________________________________________________ > This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval. > If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message. > E-mail communication cannot be guaranteed to be timely secure, error or virus-free. From knepley at gmail.com Tue Sep 28 07:54:11 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 28 Sep 2021 08:54:11 -0400 Subject: [petsc-users] GMRES Breakdown In-Reply-To: <1632832463882.43878@edf.fr> References: <1632832463882.43878@edf.fr> Message-ID: On Tue, Sep 28, 2021 at 8:39 AM TARDIEU Nicolas via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear PETSc Team, > > We, code_aster's development team, are using PETSc in our application for > years now, mainly for the KSP et PC features. We run tests cases every week > without problems. > After upgrading from PETSc 3.12.3 to 3.15.3, a test case using GMRES > failed with KSP_DIVERGED_BREAKDOWN. By rolling back to 3.12.3, we have > checked that the upgrade is the origin of the failure. > > We know this can occur with GMRES but it is the first time we are facing > this situation. > > Is it a bug ? If yes, how can we help to fix it ? If no, what can we do ? > Could the flexible version of GMRES be an alternative ? > This is not a bug, but we did change the operation. We now check whether there is a significant residual spike after a restart. We want to let the user know that this likely means that the solver is inappropriate for the problem. As Jose notes, you can turn off this behavior. Thanks, Matt > Best regards, > Nicolas > -- > *Nicolas Tardieu* > *Ing, PhD* > > > Ce message et toutes les pi?ces jointes (ci-apr?s le 'Message') sont > ?tablis ? l'intention exclusive des destinataires et les informations qui y > figurent sont strictement confidentielles. Toute utilisation de ce Message > non conforme ? sa destination, toute diffusion ou toute publication totale > ou partielle, est interdite sauf autorisation expresse. > > Si vous n'?tes pas le destinataire de ce Message, il vous est interdit de > le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou > partie. Si vous avez re?u ce Message par erreur, merci de le supprimer de > votre syst?me, ainsi que toutes ses copies, et de n'en garder aucune trace > sur quelque support que ce soit. Nous vous remercions ?galement d'en > avertir imm?diatement l'exp?diteur par retour du message. > > Il est impossible de garantir que les communications par messagerie > ?lectronique arrivent en temps utile, sont s?curis?es ou d?nu?es de toute > erreur ou virus. > ____________________________________________________ > > This message and any attachments (the 'Message') are intended solely for > the addressees. The information contained in this Message is confidential. > Any use of information contained in this Message not in accord with its > purpose, any dissemination or disclosure, either whole or partial, is > prohibited except formal approval. > > If you are not the addressee, you may not copy, forward, disclose or use > any part of it. 
If you have received this message in error, please delete > it and all copies from your system and notify the sender immediately by > return message. > > E-mail communication cannot be guaranteed to be timely secure, error or > virus-free. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicolas.tardieu at edf.fr Tue Sep 28 08:29:06 2021 From: nicolas.tardieu at edf.fr (TARDIEU Nicolas) Date: Tue, 28 Sep 2021 13:29:06 +0000 Subject: [petsc-users] GMRES Breakdown In-Reply-To: References: <1632832463882.43878@edf.fr>, Message-ID: <1632835746671.87455@edf.fr> Dear Jose and Matt, I thank you very much for your super-fast answers. And I apologize for not having checked the list of changes. Best regards, Nicolas -- Nicolas Tardieu Ing?nieur Chercheur Groupe Dynamique des Equipements - T6B EDF - R&D Dpt ERMES nicolas.tardieu at edf.fr T?l. : 01 78 19 37 49 ________________________________ De : knepley at gmail.com Envoy? : mardi 28 septembre 2021 14:54 ? : TARDIEU Nicolas Cc : petsc-users at mcs.anl.gov Objet : Re: [petsc-users] GMRES Breakdown On Tue, Sep 28, 2021 at 8:39 AM TARDIEU Nicolas via petsc-users > wrote: Dear PETSc Team, We, code_aster's development team, are using PETSc in our application for years now, mainly for the KSP et PC features. We run tests cases every week without problems. After upgrading from PETSc 3.12.3 to 3.15.3, a test case using GMRES failed with KSP_DIVERGED_BREAKDOWN. By rolling back to 3.12.3, we have checked that the upgrade is the origin of the failure. We know this can occur with GMRES but it is the first time we are facing this situation. Is it a bug ? If yes, how can we help to fix it ? If no, what can we do ? Could the flexible version of GMRES be an alternative ? This is not a bug, but we did change the operation. We now check whether there is a significant residual spike after a restart. We want to let the user know that this likely means that the solver is inappropriate for the problem. As Jose notes, you can turn off this behavior. Thanks, Matt Best regards, Nicolas -- Nicolas Tardieu Ing, PhD Ce message et toutes les pi?ces jointes (ci-apr?s le 'Message') sont ?tablis ? l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme ? sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse. Si vous n'?tes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez re?u ce Message par erreur, merci de le supprimer de votre syst?me, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions ?galement d'en avertir imm?diatement l'exp?diteur par retour du message. Il est impossible de garantir que les communications par messagerie ?lectronique arrivent en temps utile, sont s?curis?es ou d?nu?es de toute erreur ou virus. ____________________________________________________ This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. 
Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval. If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message. E-mail communication cannot be guaranteed to be timely secure, error or virus-free. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ Ce message et toutes les pi?ces jointes (ci-apr?s le 'Message') sont ?tablis ? l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme ? sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse. Si vous n'?tes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez re?u ce Message par erreur, merci de le supprimer de votre syst?me, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions ?galement d'en avertir imm?diatement l'exp?diteur par retour du message. Il est impossible de garantir que les communications par messagerie ?lectronique arrivent en temps utile, sont s?curis?es ou d?nu?es de toute erreur ou virus. ____________________________________________________ This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval. If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message. E-mail communication cannot be guaranteed to be timely secure, error or virus-free. -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthikeyan.chockalingam at stfc.ac.uk Tue Sep 28 09:55:52 2021 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Tue, 28 Sep 2021 14:55:52 +0000 Subject: [petsc-users] %T (percent time in this phase) Message-ID: <20E5B029-43D3-493C-873E-EB8F8CD92E08@stfc.ac.uk> Hello, I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson problem. I noticed from the output from using the flag -log_summary that for various events their respective %T (percent time in this phase) do not add up to 100 but rather exceeds 100. So, I gather there is some overlap among these events. I am primarily looking at the events KSPSetUp, KSPSolve, PCSetUp and PCSolve. Is it possible to get a mutually exclusive %T or Time for these individual events? I have attached the log_summary output file from my run for your reference. Thanks! Karthik. This email and any attachments are intended solely for the use of the named recipients. 
If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ksp_ex45_N511_cpu_6.txt URL: From jroman at dsic.upv.es Tue Sep 28 10:09:37 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 28 Sep 2021 17:09:37 +0200 Subject: [petsc-users] SLEPc: smallest eigenvalues In-Reply-To: References: <179BDB69-1EC0-4334-A964-ABE29E33EFF8@dsic.upv.es> <5B1750B3-E05F-45D7-929B-A5CF816B4A75@dsic.upv.es> <7031EC8B-A238-45AD-B4C2-FA8988022864@dsic.upv.es> <6B968AE2-8325-4E20-B94A-16ECDD0FBA90@dsic.upv.es> <4BB88AB3-410E-493C-9161-97775747936D@dsic.upv.es> <32B34038-7E1A-42CA-A55D-9AF9D41D1697@dsic.upv.es> Message-ID: <4FC17DE7-B910-43D8-9EC5-816285FD52F4@dsic.upv.es> > El 28 sept 2021, a las 7:50, Varun Hiremath escribi?: > > Hi Jose, > > I implemented the LU factorized preconditioner and tested it using PREONLY + LU, but that actually is converging to the wrong eigenvalues, compared to just using BICGS + BJACOBI, or simply computing EPS_SMALLEST_MAGNITUDE without any preconditioning. My preconditioning matrix is only a 1st order approximation, and the off-diagonal terms are not very accurate, so I'm guessing this is why the LU factorization doesn't help much? Nonetheless, using BICGS + BJACOBI with slightly relaxed tolerances seems to be working fine. If your PCMAT is not an exact inverse, then you have to iterate, i.e. not use KSPPREONLY but KSPBCGS or another. > > I now want to test the same preconditioning idea for a quadratic problem. I am solving a quadratic equation similar to Eqn.(5.1) in the SLEPc manual: > (K + lambda*C + lambda^2*M)*x = 0, > I don't use the PEP package directly, but solve this by linearizing similar to Eqn.(5.3) and calling EPS. Without explicitly forming the full matrix, I just use the block matrix structure as explained in the below example and that works nicely for my case: > https://slepc.upv.es/documentation/current/src/eps/tutorials/ex9.c.html Using PEP is generally recommended. The default solver TOAR is memory-efficient and performs less computation than a trivial linearization. In addition, PEP allows you to do scaling, which is often very important to get accurate results in some problems, depending on conditioning. In your case K is a shell matrix, so things may not be trivial. If I am not wrong, you should be able to use STSetPreconditionerMat() for a PEP, where the preconditioner in this case should be built to approximate Q(sigma), where Q(.) is the quadratic polynomial and sigma is the target. > > In my case, K is not explicitly known, and for linear problems, where C = 0, I am using a 1st order approximation of K as the preconditioner. Now could you please tell me if there is a way to conveniently set the preconditioner for the quadratic problem, which will be of the form [-K 0; 0 I]? 
Note that K is constructed in parallel (the rows are distributed), so I wasn't sure how to construct this preconditioner matrix which will be compatible with the shell matrix structure that I'm using to define the MatMult function as in ex9. The shell matrix of ex9.c interleaves the local parts of the first block and the second block. In other words, a process' local part consists of the local rows of the first block followed by the local rows of the second block. In your case, the local rows of K followed by the local rows of the identity (appropriately padded with zeros). Jose > > Thanks, > Varun > > On Fri, Sep 24, 2021 at 11:50 PM Varun Hiremath wrote: > Ok, great! I will give that a try, thanks for your help! > > On Fri, Sep 24, 2021 at 11:12 PM Jose E. Roman wrote: > Yes, you can use PCMAT https://petsc.org/release/docs/manualpages/PC/PCMAT.html then pass a preconditioner matrix that performs the inverse via a shell matrix. > > > El 25 sept 2021, a las 8:07, Varun Hiremath escribi?: > > > > Hi Jose, > > > > Thanks for checking my code and providing suggestions. > > > > In my particular case, I don't know the matrix A explicitly, I compute A*x in a matrix-free way within a shell matrix, so I can't use any of the direct factorization methods. But just a question regarding your suggestion to compute a (parallel) LU factorization. In our work, we do use MUMPS to compute the parallel factorization. For solving the generalized problem, A*x = lambda*B*x, we are computing inv(B)*A*x within a shell matrix, where factorization of B is computed using MUMPS. (We don't call MUMPS through SLEPc as we have our own MPI wrapper and other user settings to handle.) > > > > So for the preconditioning, instead of using the iterative solvers, can I provide a shell matrix that computes inv(P)*x corrections (where P is the preconditioner matrix) using MUMPS direct solver? > > > > And yes, thanks, #define PETSC_USE_COMPLEX 1 is not needed, it works without it. > > > > Regards, > > Varun > > > > On Fri, Sep 24, 2021 at 9:14 AM Jose E. Roman wrote: > > If you do > > $ ./acoustic_matrix_test.o -shell 0 -st_type sinvert -deflate 1 > > then it is using an LU factorization (the default), which is fast. > > > > Use -eps_view to see which solver settings are you using. > > > > BiCGStab with block Jacobi does not work for you matrix, it exceeds the maximum 10000 iterations. So this is not viable unless you can find a better preconditioner for your problem. If not, just using EPS_SMALLEST_MAGNITUDE will be faster. > > > > Computing smallest magnitude eigenvalues is a difficult task. The most robust way is to compute a (parallel) LU factorization if you can afford it. > > > > > > A side note: don't add this to your source code > > #define PETSC_USE_COMPLEX 1 > > This define is taken from PETSc's include files, you should not mess with it. Instead, you probably want to add something like this AFTER #include : > > #if !defined(PETSC_USE_COMPLEX) > > #error "Requires complex scalars" > > #endif > > > > Jose > > > > > > > El 22 sept 2021, a las 19:38, Varun Hiremath escribi?: > > > > > > Hi Jose, > > > > > > Thank you, that explains it and my example code works now without specifying "-eps_target 0" in the command line. > > > > > > However, both the Krylov inexact shift-invert and JD solvers are struggling to converge for some of my actual problems. The issue seems to be related to non-symmetric general matrices. 
I have extracted one such matrix attached here as MatA.gz (size 100k), and have also included a short program that loads this matrix and then computes the smallest eigenvalues as I described earlier. > > > > > > For this matrix, if I compute the eigenvalues directly (without using the shell matrix) using shift-and-invert (as below) then it converges in less than a minute. > > > $ ./acoustic_matrix_test.o -shell 0 -st_type sinvert -deflate 1 > > > > > > However, if I use the shell matrix and use any of the preconditioned solvers JD or Krylov shift-invert (as shown below) with the same matrix as the preconditioner, then they struggle to converge. > > > $ ./acoustic_matrix_test.o -usejd 1 -deflate 1 > > > $ ./acoustic_matrix_test.o -sinvert 1 -deflate 1 > > > > > > Could you please check the attached code and suggest any changes in settings that might help with convergence for these kinds of matrices? I appreciate your help! > > > > > > Thanks, > > > Varun > > > > > > On Tue, Sep 21, 2021 at 11:14 AM Jose E. Roman wrote: > > > I will have a look at your code when I have more time. Meanwhile, I am answering 3) below... > > > > > > > El 21 sept 2021, a las 0:23, Varun Hiremath escribi?: > > > > > > > > Hi Jose, > > > > > > > > Sorry, it took me a while to test these settings in the new builds. I am getting good improvement in performance using the preconditioned solvers, so thanks for the suggestions! But I have some questions related to the usage. > > > > > > > > We are using SLEPc to solve the acoustic modal eigenvalue problem. Attached is a simple standalone program that computes acoustic modes in a simple rectangular box. This program illustrates the general setup I am using, though here the shell matrix and the preconditioner matrix are the same, while in my actual program the shell matrix computes A*x without explicitly forming A, and the preconditioner is a 0th order approximation of A. > > > > > > > > In the attached program I have tested both > > > > 1) the Krylov-Schur with inexact shift-and-invert (implemented under the option sinvert); > > > > 2) the JD solver with preconditioner (implemented under the option usejd) > > > > > > > > Both the solvers seem to work decently, compared to no preconditioning. This is how I run the two solvers (for a mesh size of 1600x400): > > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 -eps_target 0 > > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -sinvert 1 -deflate 1 -eps_target 0 > > > > Both finish in about ~10 minutes on my system in serial. JD seems to be slightly faster and more accurate (for the imaginary part of eigenvalue). > > > > The program also runs in parallel using mpiexec. I use complex builds, as in my main program the matrix can be complex. > > > > > > > > Now here are my questions: > > > > 1) For this particular problem type, could you please check if these are the best settings that one could use? I have tried different combinations of KSP/PC types e.g. GMRES, GAMG, etc, but BCGSL + BJACOBI seems to work the best in serial and parallel. > > > > > > > > 2) When I tested these settings in my main program, for some reason the JD solver was not converging. After further testing, I found the issue was related to the setting of "-eps_target 0". I have included "EPSSetTarget(eps,0.0);" in the program and I assumed this is equivalent to passing "-eps_target 0" from the command line, but that doesn't seem to be the case. 
For instance, if I run the attached program without "-eps_target 0" in the command line then it doesn't converge. > > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 -eps_target 0 > > > > the above finishes in about 10 minutes > > > > $ ./acoustic_box_test.o -nx 1600 -ny 400 -usejd 1 -deflate 1 > > > > the above doesn't converge even though "EPSSetTarget(eps,0.0);" is included in the code > > > > > > > > This only seems to affect the JD solver, not the Krylov shift-and-invert (-sinvert 1) option. So is there any difference between passing "-eps_target 0" from the command line vs using "EPSSetTarget(eps,0.0);" in the code? I cannot pass any command line arguments in my actual program, so need to set everything internally. > > > > > > > > 3) Also, another minor related issue. While using the inexact shift-and-invert option, I was running into the following error: > > > > > > > > "" > > > > Missing or incorrect user input > > > > Shift-and-invert requires a target 'which' (see EPSSetWhichEigenpairs), for instance -st_type sinvert -eps_target 0 -eps_target_magnitude > > > > "" > > > > > > > > I already have the below two lines in the code: > > > > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); > > > > EPSSetTarget(eps,0.0); > > > > > > > > so shouldn't these be enough? If I comment out the first line "EPSSetWhichEigenpairs", then the code works fine. > > > > > > You should either do > > > > > > EPSSetWhichEigenpairs(eps,EPS_SMALLEST_MAGNITUDE); > > > > > > without shift-and-invert or > > > > > > EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE); > > > EPSSetTarget(eps,0.0); > > > > > > with shift-and-invert. The latter can also be used without shift-and-invert (e.g. in JD). > > > > > > I have to check, but a possible explanation why in your comment above (2) the command-line option -eps_target 0 works differently is that it also sets -eps_target_magnitude if omitted, so to be equivalent in source code you have to call both > > > EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE); > > > EPSSetTarget(eps,0.0); > > > > > > Jose > > > > > > > I have some more questions regarding setting the preconditioner for a quadratic eigenvalue problem, which I will ask in a follow-up email. > > > > > > > > Thanks for your help! > > > > > > > > -Varun > > > > > > > > > > > > On Thu, Jul 1, 2021 at 5:01 AM Varun Hiremath wrote: > > > > Thank you very much for these suggestions! We are currently using version 3.12, so I'll try to update to the latest version and try your suggestions. Let me get back to you, thanks! > > > > > > > > On Thu, Jul 1, 2021, 4:45 AM Jose E. Roman wrote: > > > > Then I would try Davidson methods https://doi.org/10.1145/2543696 > > > > You can also try Krylov-Schur with "inexact" shift-and-invert, for instance, with preconditioned BiCGStab or GMRES, see section 3.4.1 of the users manual. > > > > > > > > In both cases, you have to pass matrix A in the call to EPSSetOperators() and the preconditioner matrix via STSetPreconditionerMat() - note this function was introduced in version 3.15. > > > > > > > > Jose > > > > > > > > > > > > > > > > > El 1 jul 2021, a las 13:36, Varun Hiremath escribi?: > > > > > > > > > > Thanks. I actually do have a 1st order approximation of matrix A, that I can explicitly compute and also invert. Can I use that matrix as preconditioner to speed things up? Is there some example that explains how to setup and call SLEPc for this scenario? > > > > > > > > > > On Thu, Jul 1, 2021, 4:29 AM Jose E. 
Roman wrote: > > > > > For smallest real parts one could adapt ex34.c, but it is going to be costly https://slepc.upv.es/documentation/current/src/eps/tutorials/ex36.c.html > > > > > Also, if eigenvalues are clustered around the origin, convergence may still be very slow. > > > > > > > > > > It is a tough problem, unless you are able to compute a good preconditioner of A (no need to compute the exact inverse). > > > > > > > > > > Jose > > > > > > > > > > > > > > > > El 1 jul 2021, a las 13:23, Varun Hiremath escribi?: > > > > > > > > > > > > I'm solving for the smallest eigenvalues in magnitude. Though is it cheaper to solve smallest in real part, as that might also work in my case? Thanks for your help. > > > > > > > > > > > > On Thu, Jul 1, 2021, 4:08 AM Jose E. Roman wrote: > > > > > > Smallest eigenvalue in magnitude or real part? > > > > > > > > > > > > > > > > > > > El 1 jul 2021, a las 11:58, Varun Hiremath escribi?: > > > > > > > > > > > > > > Sorry, no both A and B are general sparse matrices (non-hermitian). So is there anything else I could try? > > > > > > > > > > > > > > On Thu, Jul 1, 2021 at 2:43 AM Jose E. Roman wrote: > > > > > > > Is the problem symmetric (GHEP)? In that case, you can try LOBPCG on the pair (A,B). But this will likely be slow as well, unless you can provide a good preconditioner. > > > > > > > > > > > > > > Jose > > > > > > > > > > > > > > > > > > > > > > El 1 jul 2021, a las 11:37, Varun Hiremath escribi?: > > > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > > > I am trying to compute the smallest eigenvalues of a generalized system A*x= lambda*B*x. I don't explicitly know the matrix A (so I am using a shell matrix with a custom matmult function) however, the matrix B is explicitly known so I compute inv(B)*A within the shell matrix and solve inv(B)*A*x = lambda*x. > > > > > > > > > > > > > > > > To compute the smallest eigenvalues it is recommended to solve the inverted system, but since matrix A is not explicitly known I can't invert the system. Moreover, the size of the system can be really big, and with the default Krylov solver, it is extremely slow. So is there a better way for me to compute the smallest eigenvalues of this system? > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Varun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From bantingl at myumanitoba.ca Tue Sep 28 10:33:37 2021 From: bantingl at myumanitoba.ca (Lucas Banting) Date: Tue, 28 Sep 2021 15:33:37 +0000 Subject: [petsc-users] Using MATLAB on Windows with PETSc on WSL Message-ID: Hello, My overall goal is to send a sparse matrix to PETSc (in WSL)from MATLAB (in Windows) so I can use SLEPc for some eigenvalue routines, and send those eigenvectors back to MATLAB, as the MATLAB eigs() struggles with my matrix and I was looking to experiment with different eigenvalue algorithms. I was trying to configure PETSc on Windows Subsystem for Linux (WSL2). Configuring without '--with-matlab' works fine. I tried the configure command: ./configure --with-scalar-type=complex --with-openblas-dir=~/software/OpenBLAS/ --with-matlab-dir=/mnt/e/Program\ Files/MATLAB/R2020b I was wondering if there is a fundamental reason this configure won't work, or if it is just the space in 'Program Files' that is breaking the configure command. I think the only thing configuring with MATLAB is 'sopen' and 'sclose'? Is it possible I could just remake these on my own for a WSL compatible version? 
Thanks, Lucas -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Sep 28 10:48:03 2021 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 28 Sep 2021 11:48:03 -0400 Subject: [petsc-users] Using MATLAB on Windows with PETSc on WSL In-Reply-To: References: Message-ID: <23C2EF26-9EC8-42BA-A731-049221A3ECF5@petsc.dev> > On Sep 28, 2021, at 11:33 AM, Lucas Banting wrote: > > Hello, > > My overall goal is to send a sparse matrix to PETSc (in WSL)from MATLAB (in Windows) so I can use SLEPc for some eigenvalue routines, and send those eigenvectors back to MATLAB, as the MATLAB eigs() struggles with my matrix and I was looking to experiment with different eigenvalue algorithms. > > I was trying to configure PETSc on Windows Subsystem for Linux (WSL2). Configuring without '--with-matlab' works fine. I tried the configure command: > > ./configure --with-scalar-type=complex --with-openblas-dir=~/software/OpenBLAS/ > --with-matlab-dir=/mnt/e/Program\ Files/MATLAB/R2020b > > I was wondering if there is a fundamental reason this configure won't work, or if it is just the space in 'Program Files' that is breaking the configure command. We cannot tell without the configure.log file. Note you may be able to use the shorten MS DOS directory names which do not have spaces for that directory? > > I think the only thing configuring with MATLAB is 'sopen' and 'sclose'? Yes > Is it possible I could just remake these on my own for a WSL compatible version? The building of sopen and sclose is something you can do directly by adjusting the makefile by hand to link the correct files. > > Thanks, > > Lucas -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Sep 28 10:56:11 2021 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 28 Sep 2021 11:56:11 -0400 Subject: [petsc-users] %T (percent time in this phase) In-Reply-To: <20E5B029-43D3-493C-873E-EB8F8CD92E08@stfc.ac.uk> References: <20E5B029-43D3-493C-873E-EB8F8CD92E08@stfc.ac.uk> Message-ID: <00A59A5B-7093-4FF1-9712-D0E6296E61D6@petsc.dev> > On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI wrote: > > Hello, > > I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson problem. I noticed from the output from using the flag -log_summary that for various events their respective %T (percent time in this phase) do not add up to 100 but rather exceeds 100. So, I gather there is some overlap among these events. I am primarily looking at the events KSPSetUp, KSPSolve, PCSetUp and PCSolve. Is it possible to get a mutually exclusive %T or Time for these individual events? I have attached the log_summary output file from my run for your reference. > For nested solvers it is tricky to get the times to be mutually exclusive because some parts of the building of the preconditioner is for some preconditioners delayed until the solve has started. It looks like you are using the default preconditioner options which for this example are taking more or less no time since so many iterations are needed. It is best to use -pc_type mg to use geometric multigrid on this problem. Barry > Thanks! > Karthik. > > This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. 
UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Sep 28 10:58:40 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 28 Sep 2021 10:58:40 -0500 (CDT) Subject: [petsc-users] Using MATLAB on Windows with PETSc on WSL In-Reply-To: <23C2EF26-9EC8-42BA-A731-049221A3ECF5@petsc.dev> References: <23C2EF26-9EC8-42BA-A731-049221A3ECF5@petsc.dev> Message-ID: <405a4beb-afa-9673-e7d3-72c845981e18@mcs.anl.gov> Well matlab as you say is on windows side - and WSL is basically linux. One can invoke binaries from the other side - but obj-files/libraries [wrt compilers, linking] won't work as far as I know. And the MEX targets require compilers/liners to be functional - so I don't think this will work. And we don't have experience with using matlab natively on windows anyway.. Satish ---- sread: -@${MATLAB_MEX} -g GCC='${CC}' CC='${PCC}' CFLAGS='${COPTFLAGS} ${CC_FLAGS} ${CCPPFLAGS}' LDFLAGS='${PETSC_EXTERNAL_LIB_BASIC}' sread.c bread.c -@${RM} -f sread.o bread.o -@${MV} sread.mex* ${PETSC_DIR}/${PETSC_ARCH}/lib/petsc/matlab On Tue, 28 Sep 2021, Barry Smith wrote: > > > > On Sep 28, 2021, at 11:33 AM, Lucas Banting wrote: > > > > Hello, > > > > My overall goal is to send a sparse matrix to PETSc (in WSL)from MATLAB (in Windows) so I can use SLEPc for some eigenvalue routines, and send those eigenvectors back to MATLAB, as the MATLAB eigs() struggles with my matrix and I was looking to experiment with different eigenvalue algorithms. > > > > I was trying to configure PETSc on Windows Subsystem for Linux (WSL2). Configuring without '--with-matlab' works fine. I tried the configure command: > > > > ./configure --with-scalar-type=complex --with-openblas-dir=~/software/OpenBLAS/ > > --with-matlab-dir=/mnt/e/Program\ Files/MATLAB/R2020b > > > > I was wondering if there is a fundamental reason this configure won't work, or if it is just the space in 'Program Files' that is breaking the configure command. > > We cannot tell without the configure.log file. Note you may be able to use the shorten MS DOS directory names which do not have spaces for that directory? > > > > > I think the only thing configuring with MATLAB is 'sopen' and 'sclose'? > > Yes > > > Is it possible I could just remake these on my own for a WSL compatible version? > > The building of sopen and sclose is something you can do directly by adjusting the makefile by hand to link the correct files. > > > > Thanks, > > > > Lucas > > From karthikeyan.chockalingam at stfc.ac.uk Tue Sep 28 11:11:28 2021 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Tue, 28 Sep 2021 16:11:28 +0000 Subject: [petsc-users] %T (percent time in this phase) In-Reply-To: <00A59A5B-7093-4FF1-9712-D0E6296E61D6@petsc.dev> References: <20E5B029-43D3-493C-873E-EB8F8CD92E08@stfc.ac.uk> <00A59A5B-7093-4FF1-9712-D0E6296E61D6@petsc.dev> Message-ID: <64B8653D-6E4C-4F6D-AA7F-C1A6A7693B75@stfc.ac.uk> Thanks for Barry for your response. I was just benchmarking the problem with various preconditioner on cpu and gpu. I understand, it is not possible to get mutually exclusive timing. 
However, can you tell if KSPSolve time includes both PCSetup and PCApply? And if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp and PCApply. Best, Karthik. From: Barry Smith Date: Tuesday, 28 September 2021 at 16:56 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" Cc: "petsc-users at mcs.anl.gov" Subject: Re: [petsc-users] %T (percent time in this phase) On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Hello, I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson problem. I noticed from the output from using the flag -log_summary that for various events their respective %T (percent time in this phase) do not add up to 100 but rather exceeds 100. So, I gather there is some overlap among these events. I am primarily looking at the events KSPSetUp, KSPSolve, PCSetUp and PCSolve. Is it possible to get a mutually exclusive %T or Time for these individual events? I have attached the log_summary output file from my run for your reference. For nested solvers it is tricky to get the times to be mutually exclusive because some parts of the building of the preconditioner is for some preconditioners delayed until the solve has started. It looks like you are using the default preconditioner options which for this example are taking more or less no time since so many iterations are needed. It is best to use -pc_type mg to use geometric multigrid on this problem. Barry Thanks! Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -------------- next part -------------- An HTML attachment was scrubbed... URL: From liyiyang30 at gmail.com Tue Sep 28 12:44:33 2021 From: liyiyang30 at gmail.com (Yiyang Li) Date: Tue, 28 Sep 2021 10:44:33 -0700 Subject: [petsc-users] Turn off CUDA Devices information In-Reply-To: <5e127072-8cc3-b41a-5e9-9e498cde85fb@mcs.anl.gov> References: <5e127072-8cc3-b41a-5e9-9e498cde85fb@mcs.anl.gov> Message-ID: Yes, I do have superlu_dist built with petsc. The command I used for launching simulation is mpiexec --mca btl self,vader,tcp -np 4 python3 .../main.py ./input_ls -pc_type lu -pc_factor_mat_solver_type superlu_dist -pc_asm_type basic -cuda_device NONE On Mon, Sep 27, 2021 at 6:43 PM Satish Balay wrote: > Do you have petsc built with superlu_dist? > > Satish > > On Mon, 27 Sep 2021, Yiyang Li wrote: > > > Hello, > > > > I have CUDA aware MPI, and I have upgraded from PETSc 3.12 to PETSc > 3.15.4 > > and petsc4py 3.15.4. > > > > Now, when I call > > > > PETSc.KSP().solve(..., ...) 
> > > > The information of GPU is always printed to stdout by every MPI rank, > like > > > > CUDA version: v 11040 > > CUDA Devices: > > > > 0 : Quadro P4000 6 1 > > Global memory: 8105 mb > > Shared memory: 48 kb > > Constant memory: 64 kb > > Block registers: 65536 > > > > CUDA version: v 11040 > > CUDA Devices: > > > > 0 : Quadro P4000 6 1 > > Global memory: 8105 mb > > Shared memory: 48 kb > > Constant memory: 64 kb > > Block registers: 6553 > > > > ... > > > > I wonder if there is an option to turn that off? > > I have tried including > > > > -cuda_device NONE > > > > in command options, but that did not work. > > > > Best regards, > > Yiyang > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Sep 28 13:04:29 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 28 Sep 2021 13:04:29 -0500 (CDT) Subject: [petsc-users] Turn off CUDA Devices information In-Reply-To: References: <5e127072-8cc3-b41a-5e9-9e498cde85fb@mcs.anl.gov> Message-ID: <46ccd3ee-eb48-ca59-e7b7-25c924f9247b@mcs.anl.gov> This verbose message comes from superlu_dist (when built with cuda) I'm not sure how to disable it [without going into the code and commenting out the code that does this] balay at sb /home/balay/git-repo/github/superlu_dist (maint=) $ git grep 'CUDA version' SRC/cublas_utils.c: printf("CUDA version: v %d\n",CUDART_VERSION); Satish On Tue, 28 Sep 2021, Yiyang Li wrote: > Yes, I do have superlu_dist built with petsc. > The command I used for launching simulation is > > mpiexec --mca btl self,vader,tcp > -np 4 python3 .../main.py ./input_ls > -pc_type lu -pc_factor_mat_solver_type superlu_dist > -pc_asm_type basic -cuda_device NONE > > On Mon, Sep 27, 2021 at 6:43 PM Satish Balay wrote: > > > Do you have petsc built with superlu_dist? > > > > Satish > > > > On Mon, 27 Sep 2021, Yiyang Li wrote: > > > > > Hello, > > > > > > I have CUDA aware MPI, and I have upgraded from PETSc 3.12 to PETSc > > 3.15.4 > > > and petsc4py 3.15.4. > > > > > > Now, when I call > > > > > > PETSc.KSP().solve(..., ...) > > > > > > The information of GPU is always printed to stdout by every MPI rank, > > like > > > > > > CUDA version: v 11040 > > > CUDA Devices: > > > > > > 0 : Quadro P4000 6 1 > > > Global memory: 8105 mb > > > Shared memory: 48 kb > > > Constant memory: 64 kb > > > Block registers: 65536 > > > > > > CUDA version: v 11040 > > > CUDA Devices: > > > > > > 0 : Quadro P4000 6 1 > > > Global memory: 8105 mb > > > Shared memory: 48 kb > > > Constant memory: 64 kb > > > Block registers: 6553 > > > > > > ... > > > > > > I wonder if there is an option to turn that off? > > > I have tried including > > > > > > -cuda_device NONE > > > > > > in command options, but that did not work. > > > > > > Best regards, > > > Yiyang > > > > > > > > From liyiyang30 at gmail.com Tue Sep 28 13:15:29 2021 From: liyiyang30 at gmail.com (Yiyang Li) Date: Tue, 28 Sep 2021 11:15:29 -0700 Subject: [petsc-users] Turn off CUDA Devices information In-Reply-To: <46ccd3ee-eb48-ca59-e7b7-25c924f9247b@mcs.anl.gov> References: <5e127072-8cc3-b41a-5e9-9e498cde85fb@mcs.anl.gov> <46ccd3ee-eb48-ca59-e7b7-25c924f9247b@mcs.anl.gov> Message-ID: Alright, that explains why I can't find information on petsc website about how to turn that off. Thank you Satish for your hint, I will figure that out. 
Best, Yiyang On Tue, Sep 28, 2021 at 11:04 AM Satish Balay wrote: > This verbose message comes from superlu_dist (when built with cuda) > > I'm not sure how to disable it [without going into the code and commenting > out the code that does this] > > balay at sb /home/balay/git-repo/github/superlu_dist (maint=) > $ git grep 'CUDA version' > SRC/cublas_utils.c: printf("CUDA version: v %d\n",CUDART_VERSION); > > > Satish > > On Tue, 28 Sep 2021, Yiyang Li wrote: > > > Yes, I do have superlu_dist built with petsc. > > The command I used for launching simulation is > > > > mpiexec --mca btl self,vader,tcp > > -np 4 python3 .../main.py ./input_ls > > -pc_type lu -pc_factor_mat_solver_type superlu_dist > > -pc_asm_type basic -cuda_device NONE > > > > On Mon, Sep 27, 2021 at 6:43 PM Satish Balay wrote: > > > > > Do you have petsc built with superlu_dist? > > > > > > Satish > > > > > > On Mon, 27 Sep 2021, Yiyang Li wrote: > > > > > > > Hello, > > > > > > > > I have CUDA aware MPI, and I have upgraded from PETSc 3.12 to PETSc > > > 3.15.4 > > > > and petsc4py 3.15.4. > > > > > > > > Now, when I call > > > > > > > > PETSc.KSP().solve(..., ...) > > > > > > > > The information of GPU is always printed to stdout by every MPI rank, > > > like > > > > > > > > CUDA version: v 11040 > > > > CUDA Devices: > > > > > > > > 0 : Quadro P4000 6 1 > > > > Global memory: 8105 mb > > > > Shared memory: 48 kb > > > > Constant memory: 64 kb > > > > Block registers: 65536 > > > > > > > > CUDA version: v 11040 > > > > CUDA Devices: > > > > > > > > 0 : Quadro P4000 6 1 > > > > Global memory: 8105 mb > > > > Shared memory: 48 kb > > > > Constant memory: 64 kb > > > > Block registers: 6553 > > > > > > > > ... > > > > > > > > I wonder if there is an option to turn that off? > > > > I have tried including > > > > > > > > -cuda_device NONE > > > > > > > > in command options, but that did not work. > > > > > > > > Best regards, > > > > Yiyang > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Sep 28 13:18:56 2021 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 28 Sep 2021 14:18:56 -0400 Subject: [petsc-users] %T (percent time in this phase) In-Reply-To: <64B8653D-6E4C-4F6D-AA7F-C1A6A7693B75@stfc.ac.uk> References: <20E5B029-43D3-493C-873E-EB8F8CD92E08@stfc.ac.uk> <00A59A5B-7093-4FF1-9712-D0E6296E61D6@petsc.dev> <64B8653D-6E4C-4F6D-AA7F-C1A6A7693B75@stfc.ac.uk> Message-ID: <9123E727-A05A-4614-B90B-852EE0088895@petsc.dev> > On Sep 28, 2021, at 12:11 PM, Karthikeyan Chockalingam - STFC UKRI wrote: > > Thanks for Barry for your response. > > I was just benchmarking the problem with various preconditioner on cpu and gpu. I understand, it is not possible to get mutually exclusive timing. > However, can you tell if KSPSolve time includes both PCSetup and PCApply? And if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp and PCApply. If you do not call KSPSetUp() separately from KSPSolve() then its time is included with KSPSolve(). PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends on how much of the preconditioner construction can take place early, so depends exactly on the preconditioner used. So yes the answer is not totally satisfying. The one thing I would recommend is to not call KSPSetUp() directly and then KSPSolve() will always include the total time of the solve plus all setup time. PCApply will contain all the time to apply the preconditioner but may also include some setup time. 
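For reference, a small self-contained sketch (not taken from the attached runs) of one way to separate setup time from solve time in the log output, using user-defined log stages; the matrix here is a toy 1D Laplacian and the program and stage names are illustrative. Calling KSPSetUp() inside its own stage moves as much of PCSetUp() as possible into that stage; whatever the preconditioner defers, as described above, is still charged to the solve stage.

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    PetscErrorCode ierr;
    PetscLogStage  stageSetup, stageSolve;
    Mat            A;
    Vec            x, b;
    KSP            ksp;
    PetscInt       i, Istart, Iend, n = 100;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

    /* Small 1D Laplacian as a stand-in for the application matrix */
    ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 3, NULL, 2, NULL, &A);CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
    for (i = Istart; i < Iend; i++) {
      if (i > 0)     { ierr = MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
      if (i < n - 1) { ierr = MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
      ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);
    ierr = VecSet(b, 1.0);CHKERRQ(ierr);

    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

    ierr = PetscLogStageRegister("Solver setup", &stageSetup);CHKERRQ(ierr);
    ierr = PetscLogStageRegister("Solver solve", &stageSolve);CHKERRQ(ierr);

    ierr = PetscLogStagePush(stageSetup);CHKERRQ(ierr);
    ierr = KSPSetUp(ksp);CHKERRQ(ierr);       /* whatever setup can be done early is logged in this stage */
    ierr = PetscLogStagePop();CHKERRQ(ierr);

    ierr = PetscLogStagePush(stageSolve);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr); /* deferred setup plus the iterations are logged here */
    ierr = PetscLogStagePop();CHKERRQ(ierr);

    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = VecDestroy(&b);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

Running such a program with, say, -ksp_type cg -pc_type bjacobi -log_view (or the older -log_summary) prints a per-stage breakdown, so KSPSetUp/PCSetUp in the setup stage and KSPSolve/PCApply in the solve stage can be read off separately.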
Barry > > Best, > Karthik. > > > > > From: Barry Smith > Date: Tuesday, 28 September 2021 at 16:56 > To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] %T (percent time in this phase) > > > > > On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Hello, > > I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson problem. I noticed from the output from using the flag -log_summary that for various events their respective %T (percent time in this phase) do not add up to 100 but rather exceeds 100. So, I gather there is some overlap among these events. I am primarily looking at the events KSPSetUp, KSPSolve, PCSetUp and PCSolve. Is it possible to get a mutually exclusive %T or Time for these individual events? I have attached the log_summary output file from my run for your reference. > > > For nested solvers it is tricky to get the times to be mutually exclusive because some parts of the building of the preconditioner is for some preconditioners delayed until the solve has started. > > It looks like you are using the default preconditioner options which for this example are taking more or less no time since so many iterations are needed. It is best to use -pc_type mg to use geometric multigrid on this problem. > > Barry > > > > > Thanks! > Karthik. > > This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthikeyan.chockalingam at stfc.ac.uk Wed Sep 29 04:51:49 2021 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Wed, 29 Sep 2021 09:51:49 +0000 Subject: [petsc-users] %T (percent time in this phase) In-Reply-To: <9123E727-A05A-4614-B90B-852EE0088895@petsc.dev> References: <20E5B029-43D3-493C-873E-EB8F8CD92E08@stfc.ac.uk> <00A59A5B-7093-4FF1-9712-D0E6296E61D6@petsc.dev> <64B8653D-6E4C-4F6D-AA7F-C1A6A7693B75@stfc.ac.uk> <9123E727-A05A-4614-B90B-852EE0088895@petsc.dev> Message-ID: <4588B16F-528E-4869-BF87-FF5716D0A1FE@stfc.ac.uk> That was helpful. I would like to provide some additional details of my run on cpus and gpus. Please find the following attachments: 1. graph.pdf a plot showing overall time and various petsc events. 2. ksp_ex45_N511_cpu_6.txt data file of the log_summary 3. 
ksp_ex45_N511_gpu_2.txt data file of the log_summary I used the following petsc options for cpu mpirun -n 6 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 -dm_mat_type mpiaij -dm_vec_type mpi -ksp_type cg -pc_type bjacobi -ksp_monitor and for gpus mpirun -n 1 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type bjacobi -ksp_monitor to run the following problem https://petsc.org/release/src/ksp/ksp/tutorials/ex45.c.html From the above code, I see is there no individual function called KSPSetUp(), so I gather KSPSetDM, KSPSetComputeInitialGuess, KSPSetComputeRHS, kSPSetComputeOperators all are timed together as KSPSetUp. For this example, is KSPSetUp time and KSPSolve time mutually exclusive? In your response you said that ?PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends on how much of the preconditioner construction can take place early, so depends exactly on the preconditioner used.? I don?t see a explicit call to PCSetUp() or PCApply() in ex45; so for this particular preconditioner (bjacobi) how can I tell how they are timed? I am hoping to time KSP solving and preconditioning mutually exclusively. Kind regards, Karthik. From: Barry Smith Date: Tuesday, 28 September 2021 at 19:19 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" Cc: "petsc-users at mcs.anl.gov" Subject: Re: [petsc-users] %T (percent time in this phase) On Sep 28, 2021, at 12:11 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thanks for Barry for your response. I was just benchmarking the problem with various preconditioner on cpu and gpu. I understand, it is not possible to get mutually exclusive timing. However, can you tell if KSPSolve time includes both PCSetup and PCApply? And if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp and PCApply. If you do not call KSPSetUp() separately from KSPSolve() then its time is included with KSPSolve(). PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends on how much of the preconditioner construction can take place early, so depends exactly on the preconditioner used. So yes the answer is not totally satisfying. The one thing I would recommend is to not call KSPSetUp() directly and then KSPSolve() will always include the total time of the solve plus all setup time. PCApply will contain all the time to apply the preconditioner but may also include some setup time. Barry Best, Karthik. From: Barry Smith > Date: Tuesday, 28 September 2021 at 16:56 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] %T (percent time in this phase) On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Hello, I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson problem. I noticed from the output from using the flag -log_summary that for various events their respective %T (percent time in this phase) do not add up to 100 but rather exceeds 100. So, I gather there is some overlap among these events. I am primarily looking at the events KSPSetUp, KSPSolve, PCSetUp and PCSolve. Is it possible to get a mutually exclusive %T or Time for these individual events? I have attached the log_summary output file from my run for your reference. For nested solvers it is tricky to get the times to be mutually exclusive because some parts of the building of the preconditioner is for some preconditioners delayed until the solve has started. 
It looks like you are using the default preconditioner options which for this example are taking more or less no time since so many iterations are needed. It is best to use -pc_type mg to use geometric multigrid on this problem. Barry Thanks! Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ksp_ex45_N511_cpu_6.txt URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graph.pdf Type: application/pdf Size: 97687 bytes Desc: graph.pdf URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ksp_ex45_N511_gpu_2.txt URL: From knepley at gmail.com Wed Sep 29 04:58:05 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 29 Sep 2021 05:58:05 -0400 Subject: [petsc-users] %T (percent time in this phase) In-Reply-To: <4588B16F-528E-4869-BF87-FF5716D0A1FE@stfc.ac.uk> References: <20E5B029-43D3-493C-873E-EB8F8CD92E08@stfc.ac.uk> <00A59A5B-7093-4FF1-9712-D0E6296E61D6@petsc.dev> <64B8653D-6E4C-4F6D-AA7F-C1A6A7693B75@stfc.ac.uk> <9123E727-A05A-4614-B90B-852EE0088895@petsc.dev> <4588B16F-528E-4869-BF87-FF5716D0A1FE@stfc.ac.uk> Message-ID: On Wed, Sep 29, 2021 at 5:52 AM Karthikeyan Chockalingam - STFC UKRI < karthikeyan.chockalingam at stfc.ac.uk> wrote: > That was helpful. I would like to provide some additional details of my > run on cpus and gpus. Please find the following attachments: > > > > 1. graph.pdf a plot showing overall time and various petsc events. > 2. ksp_ex45_N511_cpu_6.txt data file of the log_summary > 3. ksp_ex45_N511_gpu_2.txt data file of the log_summary > > > > I used the following petsc options for cpu > > > > mpirun -n 6 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z > 511 -dm_mat_type mpiaij -dm_vec_type mpi -ksp_type cg -pc_type bjacobi > -ksp_monitor > > > > and for gpus > > > > mpirun -n 1 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z > 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type > bjacobi -ksp_monitor > > > > to run the following problem > > > > https://petsc.org/release/src/ksp/ksp/tutorials/ex45.c.html > > > > From the above code, I see is there no individual function called KSPSetUp(), > so I gather KSPSetDM, KSPSetComputeInitialGuess, KSPSetComputeRHS, > kSPSetComputeOperators all are timed together as KSPSetUp. For this > example, is KSPSetUp time and KSPSolve time mutually exclusive? > No, KSPSetUp() will be contained in KSPSolve() if it is called automatically. > In your response you said that > > > > ?PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it > depends on how much of the preconditioner construction can take place > early, so depends exactly on the preconditioner used.? 
> > > > I don?t see a explicit call to PCSetUp() or PCApply() in ex45; so for > this particular preconditioner (bjacobi) how can I tell how they are timed? > They are all inside KSPSolve(). If you have a preconditioned linear solve, the oreconditioning happens during the iteration. So an iteration would mostly consist of MatMult + PCApply, with some vector work. > I am hoping to time KSP solving and preconditioning mutually exclusively. > I am not sure that concept makes sense here. See above. Thanks, Matt > > > Kind regards, > > Karthik. > > > > > > *From: *Barry Smith > *Date: *Tuesday, 28 September 2021 at 19:19 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *"petsc-users at mcs.anl.gov" > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > > > > > On Sep 28, 2021, at 12:11 PM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Thanks for Barry for your response. > > > > I was just benchmarking the problem with various preconditioner on cpu and > gpu. I understand, it is not possible to get mutually exclusive timing. > > However, can you tell if KSPSolve time includes both PCSetup and PCApply? > And if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp > and PCApply. > > > > If you do not call KSPSetUp() separately from KSPSolve() then its time > is included with KSPSolve(). > > > > PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends > on how much of the preconditioner construction can take place early, so > depends exactly on the preconditioner used. > > > > So yes the answer is not totally satisfying. The one thing I would > recommend is to not call KSPSetUp() directly and then KSPSolve() will > always include the total time of the solve plus all setup time. PCApply > will contain all the time to apply the preconditioner but may also include > some setup time. > > > > Barry > > > > > > Best, > > Karthik. > > > > > > > > > > *From: *Barry Smith > *Date: *Tuesday, 28 September 2021 at 16:56 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *"petsc-users at mcs.anl.gov" > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > > > > > > On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Hello, > > > > I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson > problem. I noticed from the output from using the flag -log_summary that > for various events their respective %T (percent time in this phase) do not > add up to 100 but rather exceeds 100. So, I gather there is some overlap > among these events. I am primarily looking at the events KSPSetUp, > KSPSolve, PCSetUp and PCSolve. Is it possible to get a mutually exclusive > %T or Time for these individual events? I have attached the log_summary > output file from my run for your reference. > > > > > > For nested solvers it is tricky to get the times to be mutually > exclusive because some parts of the building of the preconditioner is for > some preconditioners delayed until the solve has started. > > > > It looks like you are using the default preconditioner options which for > this example are taking more or less no time since so many iterations are > needed. It is best to use -pc_type mg to use geometric multigrid on this > problem. > > > > Barry > > > > > > > > > Thanks! > > Karthik. 
> > > > This email and any attachments are intended solely for the use of the > named recipients. If you are not the intended recipient you must not use, > disclose, copy or distribute this email or any of its attachments and > should notify the sender immediately and delete this email from your > system. UK Research and Innovation (UKRI) has taken every reasonable > precaution to minimise risk of this email or any attachments containing > viruses or malware but the recipient should carry out its own virus and > malware checks before opening the attachments. UKRI does not accept any > liability for any losses or damages which the recipient may sustain due to > presence of any viruses. > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthikeyan.chockalingam at stfc.ac.uk Wed Sep 29 05:24:46 2021 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Wed, 29 Sep 2021 10:24:46 +0000 Subject: [petsc-users] %T (percent time in this phase) In-Reply-To: References: <20E5B029-43D3-493C-873E-EB8F8CD92E08@stfc.ac.uk> <00A59A5B-7093-4FF1-9712-D0E6296E61D6@petsc.dev> <64B8653D-6E4C-4F6D-AA7F-C1A6A7693B75@stfc.ac.uk> <9123E727-A05A-4614-B90B-852EE0088895@petsc.dev> <4588B16F-528E-4869-BF87-FF5716D0A1FE@stfc.ac.uk> Message-ID: Thank you Mathew. Now, it is all making sense to me. From data file ksp_ex45_N511_gpu_2.txt KSPSolve (53%) + KSPSetup (0%) = PCSetup (16%) + PCApply (37%). However, you said ?So an iteration would mostly consist of MatMult + PCApply, with some vector work? The MalMult event is 4 %. How does this event figure into the above equation; if preconditioning (MatMult + PCApply) is included in KSPSolve? Best, Karthik. From: Matthew Knepley Date: Wednesday, 29 September 2021 at 10:58 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" Cc: Barry Smith , "petsc-users at mcs.anl.gov" Subject: Re: [petsc-users] %T (percent time in this phase) On Wed, Sep 29, 2021 at 5:52 AM Karthikeyan Chockalingam - STFC UKRI > wrote: That was helpful. I would like to provide some additional details of my run on cpus and gpus. Please find the following attachments: 1. graph.pdf a plot showing overall time and various petsc events. 2. ksp_ex45_N511_cpu_6.txt data file of the log_summary 3. ksp_ex45_N511_gpu_2.txt data file of the log_summary I used the following petsc options for cpu mpirun -n 6 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 -dm_mat_type mpiaij -dm_vec_type mpi -ksp_type cg -pc_type bjacobi -ksp_monitor and for gpus mpirun -n 1 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type bjacobi -ksp_monitor to run the following problem https://petsc.org/release/src/ksp/ksp/tutorials/ex45.c.html From the above code, I see is there no individual function called KSPSetUp(), so I gather KSPSetDM, KSPSetComputeInitialGuess, KSPSetComputeRHS, kSPSetComputeOperators all are timed together as KSPSetUp. For this example, is KSPSetUp time and KSPSolve time mutually exclusive? No, KSPSetUp() will be contained in KSPSolve() if it is called automatically. 
In your response you said that ?PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends on how much of the preconditioner construction can take place early, so depends exactly on the preconditioner used.? I don?t see a explicit call to PCSetUp() or PCApply() in ex45; so for this particular preconditioner (bjacobi) how can I tell how they are timed? They are all inside KSPSolve(). If you have a preconditioned linear solve, the oreconditioning happens during the iteration. So an iteration would mostly consist of MatMult + PCApply, with some vector work. I am hoping to time KSP solving and preconditioning mutually exclusively. I am not sure that concept makes sense here. See above. Thanks, Matt Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 28 September 2021 at 19:19 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] %T (percent time in this phase) On Sep 28, 2021, at 12:11 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thanks for Barry for your response. I was just benchmarking the problem with various preconditioner on cpu and gpu. I understand, it is not possible to get mutually exclusive timing. However, can you tell if KSPSolve time includes both PCSetup and PCApply? And if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp and PCApply. If you do not call KSPSetUp() separately from KSPSolve() then its time is included with KSPSolve(). PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends on how much of the preconditioner construction can take place early, so depends exactly on the preconditioner used. So yes the answer is not totally satisfying. The one thing I would recommend is to not call KSPSetUp() directly and then KSPSolve() will always include the total time of the solve plus all setup time. PCApply will contain all the time to apply the preconditioner but may also include some setup time. Barry Best, Karthik. From: Barry Smith > Date: Tuesday, 28 September 2021 at 16:56 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] %T (percent time in this phase) On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Hello, I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson problem. I noticed from the output from using the flag -log_summary that for various events their respective %T (percent time in this phase) do not add up to 100 but rather exceeds 100. So, I gather there is some overlap among these events. I am primarily looking at the events KSPSetUp, KSPSolve, PCSetUp and PCSolve. Is it possible to get a mutually exclusive %T or Time for these individual events? I have attached the log_summary output file from my run for your reference. For nested solvers it is tricky to get the times to be mutually exclusive because some parts of the building of the preconditioner is for some preconditioners delayed until the solve has started. It looks like you are using the default preconditioner options which for this example are taking more or less no time since so many iterations are needed. It is best to use -pc_type mg to use geometric multigrid on this problem. Barry Thanks! Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. 
UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Sep 29 05:57:47 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 29 Sep 2021 06:57:47 -0400 Subject: [petsc-users] %T (percent time in this phase) In-Reply-To: References: <20E5B029-43D3-493C-873E-EB8F8CD92E08@stfc.ac.uk> <00A59A5B-7093-4FF1-9712-D0E6296E61D6@petsc.dev> <64B8653D-6E4C-4F6D-AA7F-C1A6A7693B75@stfc.ac.uk> <9123E727-A05A-4614-B90B-852EE0088895@petsc.dev> <4588B16F-528E-4869-BF87-FF5716D0A1FE@stfc.ac.uk> Message-ID: On Wed, Sep 29, 2021 at 6:24 AM Karthikeyan Chockalingam - STFC UKRI < karthikeyan.chockalingam at stfc.ac.uk> wrote: > Thank you Mathew. Now, it is all making sense to me. > > > > From data file ksp_ex45_N511_gpu_2.txt > > > > KSPSolve (53%) + KSPSetup (0%) = PCSetup (16%) + PCApply (37%). > > > > However, you said ?So an iteration would mostly consist of MatMult + > PCApply, with some vector work? > 1) You do one solve, but 2 KSPSetUp()s. You must be running on more than one process and using Block-Jacobi . Half the time is spent in the solve (53%) KSPSetUp 2 1.0 5.3149e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 1 0 0 0 0 1 0 0 0 0.00e+00 0 0.00e+00 0 KSPSolve 1 1.0 1.5837e+02 1.1 8.63e+11 1.0 6.8e+02 2.1e+06 4.4e+03 53100100100 95 53100100100 96 10881 11730 1022 6.40e+03 1021 8.17e-03 100 2) The preconditioner look like BJacobi-ILU. The setup time is 16%, which is all setup of the individual blocks, and this is all used by the numerical ILU factorization. PCSetUp 2 1.0 4.9623e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 0 0 0 0 16 0 0 0 0 58 0 2 6.93e+03 0 0.00e+00 0 PCSetUpOnBlocks 1 1.0 4.9274e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 0 0 0 0 15 0 0 0 0 59 0 2 6.93e+03 0 0.00e+00 0 MatLUFactorNum 1 1.0 4.6126e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 14 0 0 0 0 63 0 2 6.93e+03 0 0.00e+00 0 MatILUFactorSym 1 1.0 2.5110e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 3) The preconditioner application takes 37% of the time, which is all solving the factors and recorded in MatSolve(). Matrix multiplication takes 4%. PCApply 341 1.0 1.3068e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 37 34 0 0 0 37 34 0 0 0 4516 4523 1 5.34e+02 0 0.00e+00 100 MatSolve 341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 36 34 0 0 0 36 34 0 0 0 4536 4538 1 5.34e+02 0 0.00e+00 100 MatMult 341 1.0 1.0774e+01 1.1 2.96e+11 1.0 6.9e+02 2.1e+06 2.0e+00 4 34100100 0 4 34100100 0 54801 66441 2 5.86e+03 0 0.00e+00 100 4) The significant vector time is all in norms (11%) since they are really slow on the GPU. 
VecNorm 342 1.0 6.2261e+01129.9 4.57e+10 1.0 0.0e+00 0.0e+00 6.8e+02 11 5 0 0 15 11 5 0 0 15 1466 196884 0 0.00e+00 342 2.74e-03 100 VecTDot 680 1.0 1.7107e+00 1.3 9.09e+10 1.0 0.0e+00 0.0e+00 1.4e+03 1 10 0 0 29 1 10 0 0 29 106079 133922 0 0.00e+00 680 5.44e-03 100 VecAXPY 681 1.0 3.2036e+00 1.7 9.10e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 56728 58367 682 5.34e+02 0 0.00e+00 100 VecAYPX 339 1.0 2.6502e+00 1.8 4.53e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 34136 34153 339 2.71e-03 0 0.00e+00 100 So the solve time is: 53% ~ 37% + 4% + 11% and the setup time is about 16%. I was wrong about the SetUp time being included, as it is outside the event: https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/ksp/interface/itfunc.c#L852 It looks like the remainder of the time (23%) is spent preallocating the matrix. Thanks, Matt The MalMult event is 4 %. How does this event figure into the above > equation; if preconditioning (MatMult + PCApply) is included in KSPSolve? > > > > Best, > > Karthik. > > > > *From: *Matthew Knepley > *Date: *Wednesday, 29 September 2021 at 10:58 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *Barry Smith , "petsc-users at mcs.anl.gov" < > petsc-users at mcs.anl.gov> > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > On Wed, Sep 29, 2021 at 5:52 AM Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > That was helpful. I would like to provide some additional details of my > run on cpus and gpus. Please find the following attachments: > > > > 1. graph.pdf a plot showing overall time and various petsc events. > 2. ksp_ex45_N511_cpu_6.txt data file of the log_summary > 3. ksp_ex45_N511_gpu_2.txt data file of the log_summary > > > > I used the following petsc options for cpu > > > > mpirun -n 6 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z > 511 -dm_mat_type mpiaij -dm_vec_type mpi -ksp_type cg -pc_type bjacobi > -ksp_monitor > > > > and for gpus > > > > mpirun -n 1 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z > 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type > bjacobi -ksp_monitor > > > > to run the following problem > > > > https://petsc.org/release/src/ksp/ksp/tutorials/ex45.c.html > > > > From the above code, I see is there no individual function called KSPSetUp(), > so I gather KSPSetDM, KSPSetComputeInitialGuess, KSPSetComputeRHS, > kSPSetComputeOperators all are timed together as KSPSetUp. For this > example, is KSPSetUp time and KSPSolve time mutually exclusive? > > > > No, KSPSetUp() will be contained in KSPSolve() if it is called > automatically. > > > > In your response you said that > > > > ?PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it > depends on how much of the preconditioner construction can take place > early, so depends exactly on the preconditioner used.? > > > > I don?t see a explicit call to PCSetUp() or PCApply() in ex45; so for > this particular preconditioner (bjacobi) how can I tell how they are timed? > > > > They are all inside KSPSolve(). If you have a preconditioned linear solve, > the oreconditioning happens during the iteration. So an iteration would > mostly > > consist of MatMult + PCApply, with some vector work. > > > > I am hoping to time KSP solving and preconditioning mutually exclusively. > > > > I am not sure that concept makes sense here. See above. > > > > Thanks, > > > > Matt > > > > > > Kind regards, > > Karthik. 
> > > > > > *From: *Barry Smith > *Date: *Tuesday, 28 September 2021 at 19:19 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *"petsc-users at mcs.anl.gov" > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > > > > > On Sep 28, 2021, at 12:11 PM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Thanks for Barry for your response. > > > > I was just benchmarking the problem with various preconditioner on cpu and > gpu. I understand, it is not possible to get mutually exclusive timing. > > However, can you tell if KSPSolve time includes both PCSetup and PCApply? > And if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp > and PCApply. > > > > If you do not call KSPSetUp() separately from KSPSolve() then its time > is included with KSPSolve(). > > > > PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends > on how much of the preconditioner construction can take place early, so > depends exactly on the preconditioner used. > > > > So yes the answer is not totally satisfying. The one thing I would > recommend is to not call KSPSetUp() directly and then KSPSolve() will > always include the total time of the solve plus all setup time. PCApply > will contain all the time to apply the preconditioner but may also include > some setup time. > > > > Barry > > > > > > Best, > > Karthik. > > > > > > > > > > *From: *Barry Smith > *Date: *Tuesday, 28 September 2021 at 16:56 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *"petsc-users at mcs.anl.gov" > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > > > > > On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Hello, > > > > I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson > problem. I noticed from the output from using the flag -log_summary that > for various events their respective %T (percent time in this phase) do not > add up to 100 but rather exceeds 100. So, I gather there is some overlap > among these events. I am primarily looking at the events KSPSetUp, > KSPSolve, PCSetUp and PCSolve. Is it possible to get a mutually exclusive > %T or Time for these individual events? I have attached the log_summary > output file from my run for your reference. > > > > > > For nested solvers it is tricky to get the times to be mutually > exclusive because some parts of the building of the preconditioner is for > some preconditioners delayed until the solve has started. > > > > It looks like you are using the default preconditioner options which for > this example are taking more or less no time since so many iterations are > needed. It is best to use -pc_type mg to use geometric multigrid on this > problem. > > > > Barry > > > > > > > > Thanks! > > Karthik. > > > > This email and any attachments are intended solely for the use of the > named recipients. If you are not the intended recipient you must not use, > disclose, copy or distribute this email or any of its attachments and > should notify the sender immediately and delete this email from your > system. UK Research and Innovation (UKRI) has taken every reasonable > precaution to minimise risk of this email or any attachments containing > viruses or malware but the recipient should carry out its own virus and > malware checks before opening the attachments. 
UKRI does not accept any > liability for any losses or damages which the recipient may sustain due to > presence of any viruses. > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From marco.cisternino at optimad.it Wed Sep 29 07:46:24 2021 From: marco.cisternino at optimad.it (Marco Cisternino) Date: Wed, 29 Sep 2021 12:46:24 +0000 Subject: [petsc-users] Disconnected domains and Poisson equation Message-ID: Good morning, I want to solve the Poisson equation on a 3D domain with 2 non-connected sub-domains. I am using FGMRES+GAMG and I have no problem if the two sub-domains see a Dirichlet boundary condition each. On the same domain I would like to solve the Poisson equation imposing periodic boundary condition in one direction and homogenous Neumann boundary conditions in the other two directions. The two sub-domains are symmetric with respect to the separation between them and the operator discretization and the right hand side are symmetric as well. It would be nice to have the same solution in both the sub-domains. Setting the null space to the constant, the solver converges to a solution having the same gradients in both sub-domains but different values. Am I doing some wrong with the null space? I'm not setting a block matrix (one block for each sub-domain), should I? I tested the null space against the matrix using MatNullSpaceTest and the answer is true. Can I do something more to have a symmetric solution as outcome of the solver? Thank you in advance for any comments and hints. Best regards, Marco Cisternino -------------- next part -------------- An HTML attachment was scrubbed... URL: From marco.cisternino at optimad.it Wed Sep 29 07:58:42 2021 From: marco.cisternino at optimad.it (Marco Cisternino) Date: Wed, 29 Sep 2021 12:58:42 +0000 Subject: [petsc-users] FGMRES and BCGS Message-ID: Good Morning, I usually solve a non-symmetric discretization of the Poisson equation using GAMG+FGMRES. In the last days I tried to use BCGS in place of FGMRES, still using GAMG as preconditioner. No problem in finding the solution but I'm experiencing something I didn't expect. The test case is a 25 millions cells domain with Dirichlet and Neumann boundary conditions. Both the solvers are able to solve the problem with an increasing number of MPI processes, but: * FGMRES is about 25% faster than BCGS for all the processes number * Both solvers have the same scalability from 48 to 384 processes * Both solvers almost use the same amount of memory (FGMRES use a restart=30) Am I wrong expecting less memory consumption and more performance from BCGS with respect to FGMRES? Thank you in advance for any help. Best regards, Marco Cisternino -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Wed Sep 29 08:58:49 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 29 Sep 2021 09:58:49 -0400 Subject: [petsc-users] Disconnected domains and Poisson equation In-Reply-To: References: Message-ID: <448CEBF7-5B16-4E1C-8D1D-9CC067BD38BB@petsc.dev> The problem actually has a two dimensional null space; constant on each domain but possibly different constants. I think you need to build the MatNullSpace by explicitly constructing two vectors, one with 0 on one domain and constant value on the other and one with 0 on the other domain and constant on the first. Separate note: why use FGMRES instead of just GMRES? If the problem is linear and the preconditioner is linear (no GMRES inside the smoother) then you can just use GMRES and it will save a little space/work and be conceptually clearer. Barry > On Sep 29, 2021, at 8:46 AM, Marco Cisternino wrote: > > Good morning, > I want to solve the Poisson equation on a 3D domain with 2 non-connected sub-domains. > I am using FGMRES+GAMG and I have no problem if the two sub-domains see a Dirichlet boundary condition each. > On the same domain I would like to solve the Poisson equation imposing periodic boundary condition in one direction and homogenous Neumann boundary conditions in the other two directions. The two sub-domains are symmetric with respect to the separation between them and the operator discretization and the right hand side are symmetric as well. It would be nice to have the same solution in both the sub-domains. > Setting the null space to the constant, the solver converges to a solution having the same gradients in both sub-domains but different values. > Am I doing some wrong with the null space? I?m not setting a block matrix (one block for each sub-domain), should I? > I tested the null space against the matrix using MatNullSpaceTest and the answer is true. Can I do something more to have a symmetric solution as outcome of the solver? > Thank you in advance for any comments and hints. > > Best regards, > > Marco Cisternino -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthikeyan.chockalingam at stfc.ac.uk Wed Sep 29 09:18:53 2021 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Wed, 29 Sep 2021 14:18:53 +0000 Subject: [petsc-users] %T (percent time in this phase) In-Reply-To: References: <20E5B029-43D3-493C-873E-EB8F8CD92E08@stfc.ac.uk> <00A59A5B-7093-4FF1-9712-D0E6296E61D6@petsc.dev> <64B8653D-6E4C-4F6D-AA7F-C1A6A7693B75@stfc.ac.uk> <9123E727-A05A-4614-B90B-852EE0088895@petsc.dev> <4588B16F-528E-4869-BF87-FF5716D0A1FE@stfc.ac.uk> Message-ID: <8CD0BC94-1C5A-48B7-93B3-F5C467CAC1E0@stfc.ac.uk> Thank you! Just to summarize KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%) ~ 100 % You didn?t happen to mention how MatCUSPARSSolAnl is accounted for? Am I right in accounting for it as above? MatCUSPARSSolAnl 2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 Finally, I believe the vector events, VecNorn, VecTDot, VecAXPY, and VecAYPX are mutually exclusive? Best, Karthik. From: Matthew Knepley Date: Wednesday, 29 September 2021 at 11:58 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" Cc: Barry Smith , "petsc-users at mcs.anl.gov" Subject: Re: [petsc-users] %T (percent time in this phase) On Wed, Sep 29, 2021 at 6:24 AM Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Mathew. Now, it is all making sense to me. 
From data file ksp_ex45_N511_gpu_2.txt KSPSolve (53%) + KSPSetup (0%) = PCSetup (16%) + PCApply (37%). However, you said ?So an iteration would mostly consist of MatMult + PCApply, with some vector work? 1) You do one solve, but 2 KSPSetUp()s. You must be running on more than one process and using Block-Jacobi . Half the time is spent in the solve (53%) KSPSetUp 2 1.0 5.3149e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 1 0 0 0 0 1 0 0 0 0.00e+00 0 0.00e+00 0 KSPSolve 1 1.0 1.5837e+02 1.1 8.63e+11 1.0 6.8e+02 2.1e+06 4.4e+03 53100100100 95 53100100100 96 10881 11730 1022 6.40e+03 1021 8.17e-03 100 2) The preconditioner look like BJacobi-ILU. The setup time is 16%, which is all setup of the individual blocks, and this is all used by the numerical ILU factorization. PCSetUp 2 1.0 4.9623e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 0 0 0 0 16 0 0 0 0 58 0 2 6.93e+03 0 0.00e+00 0 PCSetUpOnBlocks 1 1.0 4.9274e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 0 0 0 0 15 0 0 0 0 59 0 2 6.93e+03 0 0.00e+00 0 MatLUFactorNum 1 1.0 4.6126e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 14 0 0 0 0 63 0 2 6.93e+03 0 0.00e+00 0 MatILUFactorSym 1 1.0 2.5110e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 3) The preconditioner application takes 37% of the time, which is all solving the factors and recorded in MatSolve(). Matrix multiplication takes 4%. PCApply 341 1.0 1.3068e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 37 34 0 0 0 37 34 0 0 0 4516 4523 1 5.34e+02 0 0.00e+00 100 MatSolve 341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 36 34 0 0 0 36 34 0 0 0 4536 4538 1 5.34e+02 0 0.00e+00 100 MatMult 341 1.0 1.0774e+01 1.1 2.96e+11 1.0 6.9e+02 2.1e+06 2.0e+00 4 34100100 0 4 34100100 0 54801 66441 2 5.86e+03 0 0.00e+00 100 4) The significant vector time is all in norms (11%) since they are really slow on the GPU. VecNorm 342 1.0 6.2261e+01129.9 4.57e+10 1.0 0.0e+00 0.0e+00 6.8e+02 11 5 0 0 15 11 5 0 0 15 1466 196884 0 0.00e+00 342 2.74e-03 100 VecTDot 680 1.0 1.7107e+00 1.3 9.09e+10 1.0 0.0e+00 0.0e+00 1.4e+03 1 10 0 0 29 1 10 0 0 29 106079 133922 0 0.00e+00 680 5.44e-03 100 VecAXPY 681 1.0 3.2036e+00 1.7 9.10e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 56728 58367 682 5.34e+02 0 0.00e+00 100 VecAYPX 339 1.0 2.6502e+00 1.8 4.53e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 34136 34153 339 2.71e-03 0 0.00e+00 100 So the solve time is: 53% ~ 37% + 4% + 11% and the setup time is about 16%. I was wrong about the SetUp time being included, as it is outside the event: https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/ksp/interface/itfunc.c#L852 It looks like the remainder of the time (23%) is spent preallocating the matrix. Thanks, Matt The MalMult event is 4 %. How does this event figure into the above equation; if preconditioning (MatMult + PCApply) is included in KSPSolve? Best, Karthik. From: Matthew Knepley > Date: Wednesday, 29 September 2021 at 10:58 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: Barry Smith >, "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] %T (percent time in this phase) On Wed, Sep 29, 2021 at 5:52 AM Karthikeyan Chockalingam - STFC UKRI > wrote: That was helpful. I would like to provide some additional details of my run on cpus and gpus. Please find the following attachments: 1. graph.pdf a plot showing overall time and various petsc events. 2. ksp_ex45_N511_cpu_6.txt data file of the log_summary 3. 
ksp_ex45_N511_gpu_2.txt data file of the log_summary I used the following petsc options for cpu mpirun -n 6 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 -dm_mat_type mpiaij -dm_vec_type mpi -ksp_type cg -pc_type bjacobi -ksp_monitor and for gpus mpirun -n 1 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type bjacobi -ksp_monitor to run the following problem https://petsc.org/release/src/ksp/ksp/tutorials/ex45.c.html From the above code, I see is there no individual function called KSPSetUp(), so I gather KSPSetDM, KSPSetComputeInitialGuess, KSPSetComputeRHS, kSPSetComputeOperators all are timed together as KSPSetUp. For this example, is KSPSetUp time and KSPSolve time mutually exclusive? No, KSPSetUp() will be contained in KSPSolve() if it is called automatically. In your response you said that ?PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends on how much of the preconditioner construction can take place early, so depends exactly on the preconditioner used.? I don?t see a explicit call to PCSetUp() or PCApply() in ex45; so for this particular preconditioner (bjacobi) how can I tell how they are timed? They are all inside KSPSolve(). If you have a preconditioned linear solve, the oreconditioning happens during the iteration. So an iteration would mostly consist of MatMult + PCApply, with some vector work. I am hoping to time KSP solving and preconditioning mutually exclusively. I am not sure that concept makes sense here. See above. Thanks, Matt Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 28 September 2021 at 19:19 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] %T (percent time in this phase) On Sep 28, 2021, at 12:11 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thanks for Barry for your response. I was just benchmarking the problem with various preconditioner on cpu and gpu. I understand, it is not possible to get mutually exclusive timing. However, can you tell if KSPSolve time includes both PCSetup and PCApply? And if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp and PCApply. If you do not call KSPSetUp() separately from KSPSolve() then its time is included with KSPSolve(). PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends on how much of the preconditioner construction can take place early, so depends exactly on the preconditioner used. So yes the answer is not totally satisfying. The one thing I would recommend is to not call KSPSetUp() directly and then KSPSolve() will always include the total time of the solve plus all setup time. PCApply will contain all the time to apply the preconditioner but may also include some setup time. Barry Best, Karthik. From: Barry Smith > Date: Tuesday, 28 September 2021 at 16:56 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] %T (percent time in this phase) On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Hello, I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson problem. I noticed from the output from using the flag -log_summary that for various events their respective %T (percent time in this phase) do not add up to 100 but rather exceeds 100. So, I gather there is some overlap among these events. I am primarily looking at the events KSPSetUp, KSPSolve, PCSetUp and PCSolve. 
Is it possible to get a mutually exclusive %T or Time for these individual events? I have attached the log_summary output file from my run for your reference. For nested solvers it is tricky to get the times to be mutually exclusive because some parts of the building of the preconditioner is for some preconditioners delayed until the solve has started. It looks like you are using the default preconditioner options which for this example are taking more or less no time since so many iterations are needed. It is best to use -pc_type mg to use geometric multigrid on this problem. Barry Thanks! Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Sep 29 10:29:05 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 29 Sep 2021 11:29:05 -0400 Subject: [petsc-users] %T (percent time in this phase) In-Reply-To: <8CD0BC94-1C5A-48B7-93B3-F5C467CAC1E0@stfc.ac.uk> References: <20E5B029-43D3-493C-873E-EB8F8CD92E08@stfc.ac.uk> <00A59A5B-7093-4FF1-9712-D0E6296E61D6@petsc.dev> <64B8653D-6E4C-4F6D-AA7F-C1A6A7693B75@stfc.ac.uk> <9123E727-A05A-4614-B90B-852EE0088895@petsc.dev> <4588B16F-528E-4869-BF87-FF5716D0A1FE@stfc.ac.uk> <8CD0BC94-1C5A-48B7-93B3-F5C467CAC1E0@stfc.ac.uk> Message-ID: On Wed, Sep 29, 2021 at 10:18 AM Karthikeyan Chockalingam - STFC UKRI < karthikeyan.chockalingam at stfc.ac.uk> wrote: > Thank you! > > > > Just to summarize > > > > KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl > (9%) ~ 100 % > > > > You didn?t happen to mention how MatCUSPARSSolAnl is accounted for? Am I > right in accounting for it as above? > I am not sure.I thought it might be the GPU part of MatSolve(). I will have to look in the code. I am not as familiar with the GPU part. > MatCUSPARSSolAnl 2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > > > > Finally, I believe the vector events, VecNorn, VecTDot, VecAXPY, and > VecAYPX are mutually exclusive? > Yes. Thanks, Matt > Best, > > > > Karthik. 
> > > > *From: *Matthew Knepley > *Date: *Wednesday, 29 September 2021 at 11:58 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *Barry Smith , "petsc-users at mcs.anl.gov" < > petsc-users at mcs.anl.gov> > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > On Wed, Sep 29, 2021 at 6:24 AM Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > Thank you Mathew. Now, it is all making sense to me. > > > > From data file ksp_ex45_N511_gpu_2.txt > > > > KSPSolve (53%) + KSPSetup (0%) = PCSetup (16%) + PCApply (37%). > > > > However, you said ?So an iteration would mostly consist of MatMult + > PCApply, with some vector work? > > > > 1) You do one solve, but 2 KSPSetUp()s. You must be running on more than > one process and using Block-Jacobi . Half the time is spent in the solve > (53%) > > > > KSPSetUp 2 1.0 5.3149e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 1 0 0 0 0 1 0 0 0 0.00e+00 0 0.00e+00 0 > > KSPSolve 1 1.0 1.5837e+02 1.1 8.63e+11 1.0 6.8e+02 2.1e+06 4.4e+03 53100100100 95 53100100100 96 10881 11730 1022 6.40e+03 1021 8.17e-03 100 > > > > 2) The preconditioner look like BJacobi-ILU. The setup time is 16%, which > is all setup of the individual blocks, and this is all used by the > numerical ILU factorization. > > > > PCSetUp 2 1.0 4.9623e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 0 0 0 > 0 16 0 0 0 0 58 0 2 6.93e+03 0 0.00e+00 0 PCSetUpOnBlocks 1 1.0 4.9274e+01 > 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 0 0 0 0 15 0 0 0 0 59 0 2 > 6.93e+03 0 0.00e+00 0 > > MatLUFactorNum 1 1.0 4.6126e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 14 0 0 0 0 63 0 2 6.93e+03 0 0.00e+00 0 > > MatILUFactorSym 1 1.0 2.5110e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 > > > > 3) The preconditioner application takes 37% of the time, which is all > solving the factors and recorded in MatSolve(). Matrix multiplication takes > 4%. > > > > PCApply 341 1.0 1.3068e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 37 34 > 0 0 0 37 34 0 0 0 4516 4523 1 5.34e+02 0 0.00e+00 100 > > MatSolve 341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 36 34 0 0 0 36 34 0 0 0 4536 4538 1 5.34e+02 0 0.00e+00 100 > > MatMult 341 1.0 1.0774e+01 1.1 2.96e+11 1.0 6.9e+02 2.1e+06 2.0e+00 4 34100100 0 4 34100100 0 54801 66441 2 5.86e+03 0 0.00e+00 100 > > > > 4) The significant vector time is all in norms (11%) since they are really > slow on the GPU. > > > > VecNorm 342 1.0 6.2261e+01129.9 4.57e+10 1.0 0.0e+00 0.0e+00 6.8e+02 11 5 0 0 15 11 5 0 0 15 1466 196884 0 0.00e+00 342 2.74e-03 100 > > VecTDot 680 1.0 1.7107e+00 1.3 9.09e+10 1.0 0.0e+00 0.0e+00 1.4e+03 1 10 0 0 29 1 10 0 0 29 106079 133922 0 0.00e+00 680 5.44e-03 100 > > VecAXPY 681 1.0 3.2036e+00 1.7 9.10e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 56728 58367 682 5.34e+02 0 0.00e+00 100 > > VecAYPX 339 1.0 2.6502e+00 1.8 4.53e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 34136 34153 339 2.71e-03 0 0.00e+00 100 > > > > So the solve time is: > > > > 53% ~ 37% + 4% + 11% > > > > and the setup time is about 16%. I was wrong about the SetUp time being > included, as it is outside the event: > > > > > https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/ksp/interface/itfunc.c#L852 > > > > It looks like the remainder of the time (23%) is spent preallocating the > matrix. > > > > Thanks, > > > > Matt > > > > The MalMult event is 4 %. 
How does this event figure into the above > equation; if preconditioning (MatMult + PCApply) is included in KSPSolve? > > > > Best, > > Karthik. > > > > *From: *Matthew Knepley > *Date: *Wednesday, 29 September 2021 at 10:58 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *Barry Smith , "petsc-users at mcs.anl.gov" < > petsc-users at mcs.anl.gov> > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > On Wed, Sep 29, 2021 at 5:52 AM Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > That was helpful. I would like to provide some additional details of my > run on cpus and gpus. Please find the following attachments: > > > > 1. graph.pdf a plot showing overall time and various petsc events. > 2. ksp_ex45_N511_cpu_6.txt data file of the log_summary > 3. ksp_ex45_N511_gpu_2.txt data file of the log_summary > > > > I used the following petsc options for cpu > > > > mpirun -n 6 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z > 511 -dm_mat_type mpiaij -dm_vec_type mpi -ksp_type cg -pc_type bjacobi > -ksp_monitor > > > > and for gpus > > > > mpirun -n 1 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z > 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type > bjacobi -ksp_monitor > > > > to run the following problem > > > > https://petsc.org/release/src/ksp/ksp/tutorials/ex45.c.html > > > > From the above code, I see is there no individual function called KSPSetUp(), > so I gather KSPSetDM, KSPSetComputeInitialGuess, KSPSetComputeRHS, > kSPSetComputeOperators all are timed together as KSPSetUp. For this > example, is KSPSetUp time and KSPSolve time mutually exclusive? > > > > No, KSPSetUp() will be contained in KSPSolve() if it is called > automatically. > > > > In your response you said that > > > > ?PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it > depends on how much of the preconditioner construction can take place > early, so depends exactly on the preconditioner used.? > > > > I don?t see a explicit call to PCSetUp() or PCApply() in ex45; so for > this particular preconditioner (bjacobi) how can I tell how they are timed? > > > > They are all inside KSPSolve(). If you have a preconditioned linear solve, > the oreconditioning happens during the iteration. So an iteration would > mostly > > consist of MatMult + PCApply, with some vector work. > > > > I am hoping to time KSP solving and preconditioning mutually exclusively. > > > > I am not sure that concept makes sense here. See above. > > > > Thanks, > > > > Matt > > > > > > Kind regards, > > Karthik. > > > > > > *From: *Barry Smith > *Date: *Tuesday, 28 September 2021 at 19:19 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *"petsc-users at mcs.anl.gov" > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > > > > > On Sep 28, 2021, at 12:11 PM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Thanks for Barry for your response. > > > > I was just benchmarking the problem with various preconditioner on cpu and > gpu. I understand, it is not possible to get mutually exclusive timing. > > However, can you tell if KSPSolve time includes both PCSetup and PCApply? > And if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp > and PCApply. > > > > If you do not call KSPSetUp() separately from KSPSolve() then its time > is included with KSPSolve(). 
> > > > PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends > on how much of the preconditioner construction can take place early, so > depends exactly on the preconditioner used. > > > > So yes the answer is not totally satisfying. The one thing I would > recommend is to not call KSPSetUp() directly and then KSPSolve() will > always include the total time of the solve plus all setup time. PCApply > will contain all the time to apply the preconditioner but may also include > some setup time. > > > > Barry > > > > > > Best, > > Karthik. > > > > > > > > > > *From: *Barry Smith > *Date: *Tuesday, 28 September 2021 at 16:56 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *"petsc-users at mcs.anl.gov" > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > > > > > On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Hello, > > > > I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson > problem. I noticed from the output from using the flag -log_summary that > for various events their respective %T (percent time in this phase) do not > add up to 100 but rather exceeds 100. So, I gather there is some overlap > among these events. I am primarily looking at the events KSPSetUp, > KSPSolve, PCSetUp and PCSolve. Is it possible to get a mutually exclusive > %T or Time for these individual events? I have attached the log_summary > output file from my run for your reference. > > > > > > For nested solvers it is tricky to get the times to be mutually > exclusive because some parts of the building of the preconditioner is for > some preconditioners delayed until the solve has started. > > > > It looks like you are using the default preconditioner options which for > this example are taking more or less no time since so many iterations are > needed. It is best to use -pc_type mg to use geometric multigrid on this > problem. > > > > Barry > > > > > > > > Thanks! > > Karthik. > > > > This email and any attachments are intended solely for the use of the > named recipients. If you are not the intended recipient you must not use, > disclose, copy or distribute this email or any of its attachments and > should notify the sender immediately and delete this email from your > system. UK Research and Innovation (UKRI) has taken every reasonable > precaution to minimise risk of this email or any attachments containing > viruses or malware but the recipient should carry out its own virus and > malware checks before opening the attachments. UKRI does not accept any > liability for any losses or damages which the recipient may sustain due to > presence of any viruses. > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From marco.cisternino at optimad.it Wed Sep 29 10:53:37 2021 From: marco.cisternino at optimad.it (Marco Cisternino) Date: Wed, 29 Sep 2021 15:53:37 +0000 Subject: [petsc-users] Disconnected domains and Poisson equation In-Reply-To: <448CEBF7-5B16-4E1C-8D1D-9CC067BD38BB@petsc.dev> References: <448CEBF7-5B16-4E1C-8D1D-9CC067BD38BB@petsc.dev> Message-ID: Thank you Barry for the quick reply. About the null space: I already tried what you suggest, building 2 Vec (constants) with 0 and 1 chosen by sub-domain, normalizing them and setting the null space like this MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_FALSE,nconstants,constants,&nullspace); The solution is slightly different in values but it is still different in the two sub-domains. About the solver: I tried BCGS, GMRES and FGMRES. The linear system is a pressure system in a navier-stokes solver and only solving with FGMRES makes the CFD stable, with BCGS and GMRES the CFD solution diverges. Moreover, in the same case but with a single domain, CFD solution is stable using all the solvers, but FGMRES converges in much less iterations than the others. Marco Cisternino From: Barry Smith Sent: mercoled? 29 settembre 2021 15:59 To: Marco Cisternino Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Disconnected domains and Poisson equation The problem actually has a two dimensional null space; constant on each domain but possibly different constants. I think you need to build the MatNullSpace by explicitly constructing two vectors, one with 0 on one domain and constant value on the other and one with 0 on the other domain and constant on the first. Separate note: why use FGMRES instead of just GMRES? If the problem is linear and the preconditioner is linear (no GMRES inside the smoother) then you can just use GMRES and it will save a little space/work and be conceptually clearer. Barry On Sep 29, 2021, at 8:46 AM, Marco Cisternino > wrote: Good morning, I want to solve the Poisson equation on a 3D domain with 2 non-connected sub-domains. I am using FGMRES+GAMG and I have no problem if the two sub-domains see a Dirichlet boundary condition each. On the same domain I would like to solve the Poisson equation imposing periodic boundary condition in one direction and homogenous Neumann boundary conditions in the other two directions. The two sub-domains are symmetric with respect to the separation between them and the operator discretization and the right hand side are symmetric as well. It would be nice to have the same solution in both the sub-domains. Setting the null space to the constant, the solver converges to a solution having the same gradients in both sub-domains but different values. Am I doing some wrong with the null space? I?m not setting a block matrix (one block for each sub-domain), should I? I tested the null space against the matrix using MatNullSpaceTest and the answer is true. Can I do something more to have a symmetric solution as outcome of the solver? Thank you in advance for any comments and hints. Best regards, Marco Cisternino -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From marco.cisternino at optimad.it Wed Sep 29 10:59:00 2021 From: marco.cisternino at optimad.it (Marco Cisternino) Date: Wed, 29 Sep 2021 15:59:00 +0000 Subject: [petsc-users] Disconnected domains and Poisson equation In-Reply-To: References: <448CEBF7-5B16-4E1C-8D1D-9CC067BD38BB@petsc.dev> Message-ID: For sake of completeness, explicitly building the null space using a vector per sub-domain makes the CFD runs using BCGS and GMRES more stable, but still slower than FGMRES. I had divergence using BCGS and GMRES setting the null space with only one constant. Thanks Marco Cisternino From: Marco Cisternino Sent: mercoledì 29 settembre 2021 17:54 To: Barry Smith Cc: petsc-users at mcs.anl.gov Subject: RE: [petsc-users] Disconnected domains and Poisson equation Thank you Barry for the quick reply. About the null space: I already tried what you suggest, building 2 Vec (constants) with 0 and 1 chosen by sub-domain, normalizing them and setting the null space like this MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_FALSE,nconstants,constants,&nullspace); The solution is slightly different in values but it is still different in the two sub-domains. About the solver: I tried BCGS, GMRES and FGMRES. The linear system is a pressure system in a Navier-Stokes solver and only solving with FGMRES makes the CFD stable; with BCGS and GMRES the CFD solution diverges. Moreover, in the same case but with a single domain, the CFD solution is stable using all the solvers, but FGMRES converges in many fewer iterations than the others. Marco Cisternino From: Barry Smith > Sent: mercoledì 29 settembre 2021 15:59 To: Marco Cisternino > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Disconnected domains and Poisson equation The problem actually has a two-dimensional null space; constant on each domain but possibly different constants. I think you need to build the MatNullSpace by explicitly constructing two vectors, one with 0 on one domain and constant value on the other and one with 0 on the other domain and constant on the first. Separate note: why use FGMRES instead of just GMRES? If the problem is linear and the preconditioner is linear (no GMRES inside the smoother) then you can just use GMRES and it will save a little space/work and be conceptually clearer. Barry On Sep 29, 2021, at 8:46 AM, Marco Cisternino > wrote: Good morning, I want to solve the Poisson equation on a 3D domain with 2 non-connected sub-domains. I am using FGMRES+GAMG and I have no problem if the two sub-domains see a Dirichlet boundary condition each. On the same domain I would like to solve the Poisson equation imposing periodic boundary conditions in one direction and homogeneous Neumann boundary conditions in the other two directions. The two sub-domains are symmetric with respect to the separation between them and the operator discretization and the right hand side are symmetric as well. It would be nice to have the same solution in both the sub-domains. Setting the null space to the constant, the solver converges to a solution having the same gradients in both sub-domains but different values. Am I doing something wrong with the null space? I'm not setting a block matrix (one block for each sub-domain), should I? I tested the null space against the matrix using MatNullSpaceTest and the answer is true. Can I do something more to have a symmetric solution as outcome of the solver? Thank you in advance for any comments and hints. Best regards, Marco Cisternino -------------- next part -------------- An HTML attachment was scrubbed...
URL: From bsmith at petsc.dev Wed Sep 29 11:33:40 2021 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 29 Sep 2021 12:33:40 -0400 Subject: [petsc-users] Disconnected domains and Poisson equation In-Reply-To: References: <448CEBF7-5B16-4E1C-8D1D-9CC067BD38BB@petsc.dev> Message-ID: <10EA28EF-AD98-4F59-A78D-7DE3D4B585DE@petsc.dev> > On Sep 29, 2021, at 11:59 AM, Marco Cisternino wrote: > > For sake of completeness, explicitly building the null space using a vector per sub-domain make s the CFD runs using BCGS and GMRES more stable, but still slower than FGMRES. Something is strange. Please run with -ksp_view and send the output on the solver details. > I had divergence using BCGS and GMRES setting the null space with only one constant. > Thanks > > Marco Cisternino > > From: Marco Cisternino > Sent: mercoled? 29 settembre 2021 17:54 > To: Barry Smith > Cc: petsc-users at mcs.anl.gov > Subject: RE: [petsc-users] Disconnected domains and Poisson equation > > Thank you Barry for the quick reply. > About the null space: I already tried what you suggest, building 2 Vec (constants) with 0 and 1 chosen by sub-domain, normalizing them and setting the null space like this > MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_FALSE,nconstants,constants,&nullspace); > The solution is slightly different in values but it is still different in the two sub-domains. > About the solver: I tried BCGS, GMRES and FGMRES. The linear system is a pressure system in a navier-stokes solver and only solving with FGMRES makes the CFD stable, with BCGS and GMRES the CFD solution diverges. Moreover, in the same case but with a single domain, CFD solution is stable using all the solvers, but FGMRES converges in much less iterations than the others. > > Marco Cisternino > > From: Barry Smith > > Sent: mercoled? 29 settembre 2021 15:59 > To: Marco Cisternino > > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Disconnected domains and Poisson equation > > > The problem actually has a two dimensional null space; constant on each domain but possibly different constants. I think you need to build the MatNullSpace by explicitly constructing two vectors, one with 0 on one domain and constant value on the other and one with 0 on the other domain and constant on the first. > > Separate note: why use FGMRES instead of just GMRES? If the problem is linear and the preconditioner is linear (no GMRES inside the smoother) then you can just use GMRES and it will save a little space/work and be conceptually clearer. > > Barry > > > On Sep 29, 2021, at 8:46 AM, Marco Cisternino > wrote: > > Good morning, > I want to solve the Poisson equation on a 3D domain with 2 non-connected sub-domains. > I am using FGMRES+GAMG and I have no problem if the two sub-domains see a Dirichlet boundary condition each. > On the same domain I would like to solve the Poisson equation imposing periodic boundary condition in one direction and homogenous Neumann boundary conditions in the other two directions. The two sub-domains are symmetric with respect to the separation between them and the operator discretization and the right hand side are symmetric as well. It would be nice to have the same solution in both the sub-domains. > Setting the null space to the constant, the solver converges to a solution having the same gradients in both sub-domains but different values. > Am I doing some wrong with the null space? I?m not setting a block matrix (one block for each sub-domain), should I? 
> I tested the null space against the matrix using MatNullSpaceTest and the answer is true. Can I do something more to have a symmetric solution as outcome of the solver? > Thank you in advance for any comments and hints. > > Best regards, > > Marco Cisternino > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Sep 29 14:09:44 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 29 Sep 2021 15:09:44 -0400 Subject: [petsc-users] Disconnected domains and Poisson equation In-Reply-To: References: <448CEBF7-5B16-4E1C-8D1D-9CC067BD38BB@petsc.dev> Message-ID: On Wed, Sep 29, 2021 at 11:53 AM Marco Cisternino < marco.cisternino at optimad.it> wrote: > Thank you Barry for the quick reply. > > About the null space: I already tried what you suggest, building 2 Vec > (constants) with 0 and 1 chosen by sub-domain, normalizing them and setting > the null space like this > > > MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_FALSE,nconstants,constants,&nullspace); > > The solution is slightly different in values but it is still different in > the two sub-domains. > > About the solver: I tried BCGS, GMRES and FGMRES. The linear system is a > pressure system in a navier-stokes solver and only solving with FGMRES > makes the CFD stable, with BCGS and GMRES the CFD solution diverges. > Moreover, in the same case but with a single domain, CFD solution is stable > using all the solvers, but FGMRES converges in much less iterations than > the others. > I think this means something is wrong with the implementation. FGMRES is the same as GMRES _if_ the preconditioner is a linear operator. The fact that they are different means that your preconditioner is nonlinear. Is this what you expect? Thanks, Matt > Marco Cisternino > > > > *From:* Barry Smith > *Sent:* mercoled? 29 settembre 2021 15:59 > *To:* Marco Cisternino > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Disconnected domains and Poisson equation > > > > > > The problem actually has a two dimensional null space; constant on each > domain but possibly different constants. I think you need to build the > MatNullSpace by explicitly constructing two vectors, one with 0 on one > domain and constant value on the other and one with 0 on the other domain > and constant on the first. > > > > Separate note: why use FGMRES instead of just GMRES? If the problem is > linear and the preconditioner is linear (no GMRES inside the smoother) then > you can just use GMRES and it will save a little space/work and be > conceptually clearer. > > > > Barry > > > > On Sep 29, 2021, at 8:46 AM, Marco Cisternino > wrote: > > > > Good morning, > > I want to solve the Poisson equation on a 3D domain with 2 non-connected > sub-domains. > > I am using FGMRES+GAMG and I have no problem if the two sub-domains see a > Dirichlet boundary condition each. > > On the same domain I would like to solve the Poisson equation imposing > periodic boundary condition in one direction and homogenous Neumann > boundary conditions in the other two directions. The two sub-domains are > symmetric with respect to the separation between them and the operator > discretization and the right hand side are symmetric as well. It would be > nice to have the same solution in both the sub-domains. > > Setting the null space to the constant, the solver converges to a solution > having the same gradients in both sub-domains but different values. > > Am I doing some wrong with the null space? 
I'm not setting a block matrix > (one block for each sub-domain), should I? > > I tested the null space against the matrix using MatNullSpaceTest and the > answer is true. Can I do something more to have a symmetric solution as > outcome of the solver? > > Thank you in advance for any comments and hints. > > > > Best regards, > > > > Marco Cisternino > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Eric.Chamberland at giref.ulaval.ca Wed Sep 29 16:18:46 2021 From: Eric.Chamberland at giref.ulaval.ca (Eric Chamberland) Date: Wed, 29 Sep 2021 17:18:46 -0400 Subject: [petsc-users] Is it possible to keep track of original elements # after a call to DMPlexDistribute ? In-Reply-To: <7236c736-6066-1ba3-55b1-60782d8e754f@giref.ulaval.ca> References: <7236c736-6066-1ba3-55b1-60782d8e754f@giref.ulaval.ca> Message-ID: Hi, I come back with _almost_ the original question: I would like to add an integer (*our* original element number, not the PETSc one) to each element of the DMPlex I create with DMPlexBuildFromCellListParallel. I would like this integer to be distributed in the same way DMPlexDistribute distributes the mesh. Is it possible to do this? Thanks, Eric On 2021-07-14 1:18 p.m., Eric Chamberland wrote: > Hi, > > I want to use DMPlexDistribute from PETSc for computing the overlap > and playing with the different partitioners supported. > > However, after calling DMPlexDistribute, I noticed the elements are > renumbered and then the original number is lost. > > What would be the best way to keep track of the element renumbering? > > a) Adding an optional parameter to let the user retrieve a vector or > "IS" giving the old number? > > b) Adding a DMLabel (seems like a false good idea) > > c) Other idea? > > Of course, I don't want to lose performance because of this > "mapping"... > > Thanks, > > Eric > -- Eric Chamberland, ing., M. Ing Professionnel de recherche GIREF/Université Laval (418) 656-2131 poste 41 22 42 From rthirumalaisam1857 at sdsu.edu Wed Sep 29 16:37:12 2021 From: rthirumalaisam1857 at sdsu.edu (Ramakrishnan Thirumalaisamy) Date: Wed, 29 Sep 2021 14:37:12 -0700 Subject: [petsc-users] Convergence rate for spatially varying Helmholtz system Message-ID: Hi all, I am trying to solve the Helmholtz equation for temperature T: (C I + Div D grad) T = f in IBAMR, in which C gives the spatially varying diagonal entries and D is the spatially varying diffusion coefficient. I use a matrix-free solver with a matrix-based PETSc preconditioner. For the matrix-free solver, I use GMRES, and for the matrix-based preconditioner, I use a Richardson KSP with Jacobi as the preconditioner. As the simulation progresses, the iterations start to increase. To understand the cause, I set D to be zero, which results in a diagonal system: C T = f. This should result in convergence within a single iteration, but I get convergence in 3 iterations. Residual norms for temperature_ solve.
0 KSP preconditioned resid norm 4.590811647875e-02 true resid norm 2.406067589273e+09 ||r(i)||/||b|| 4.455533946945e-05 1 KSP preconditioned resid norm 2.347767895880e-06 true resid norm 1.210763896685e+05 ||r(i)||/||b|| 2.242081505717e-09 2 KSP preconditioned resid norm 1.245406571896e-10 true resid norm 6.328828824310e+00 ||r(i)||/||b|| 1.171966730978e-13 Linear temperature_ solve converged due to CONVERGED_RTOL iterations 2 To verify that I am indeed solving a diagonal system I printed the PETSc matrix from the preconditioner and viewed it in Matlab. It indeed shows it to be a diagonal system. Attached is the plot of the spy command on the printed matrix. The matrix in binary form is also attached. My understanding is that because the C coefficient is varying in 4 orders of magnitude, i.e., Max(C)/Min(C) ~ 10^4, the matrix is poorly scaled. When I rescale my matrix by 1/C then the system converges in 1 iteration as expected. Is my understanding correct, and that scaling 1/C should be done even for a diagonal system? When D is non-zero, then scaling by 1/C seems to be very inconvenient as D is stored as side-centered data for the matrix free solver. In the case that I do not scale my equations by 1/C, is there some solver setting that improves the convergence rate? (With D as non-zero, I have also tried gmres as the ksp solver in the matrix-based preconditioner to get better performance, but it didn't matter much.) Thanks, Ramakrishnan Thirumalaisamy San Diego State University. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Temperature_fill.pdf Type: application/pdf Size: 56939 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matrix_temperature Type: application/octet-stream Size: 262160 bytes Desc: not available URL: From jed at jedbrown.org Wed Sep 29 17:28:28 2021 From: jed at jedbrown.org (Jed Brown) Date: Wed, 29 Sep 2021 16:28:28 -0600 Subject: [petsc-users] FGMRES and BCGS In-Reply-To: References: Message-ID: <87a6jvc8z7.fsf@jedbrown.org> It is not surprising. BCGS uses less memory for the Krylov vectors, but that might be a small fraction of the total memory used (considering your matrix and GAMG). FGMRES(30) needs 60 work vectors (2 per iteration). If you're using a linear (non-iterative) preconditioner, then you don't need a flexible method -- plain GMRES should be fine. FGMRES uses the unpreconditioned norm, which you can also get via -ksp_type gmres -ksp_norm_type unpreconditioned. This classic paper shows that for any class of nonsymmetric Krylov method, there are matrices in which that method outperforms every other method by at least sqrt(N). https://epubs.siam.org/doi/10.1137/0613049 Marco Cisternino writes: > Good Morning, > I usually solve a non-symmetric discretization of the Poisson equation using GAMG+FGMRES. > In the last days I tried to use BCGS in place of FGMRES, still using GAMG as preconditioner. > No problem in finding the solution but I'm experiencing something I didn't expect. > The test case is a 25 millions cells domain with Dirichlet and Neumann boundary conditions. 
> Both the solvers are able to solve the problem with an increasing number of MPI processes, but: > > * FGMRES is about 25% faster than BCGS for all process counts > * Both solvers have the same scalability from 48 to 384 processes > * Both solvers use almost the same amount of memory (FGMRES uses restart=30) > Am I wrong to expect less memory consumption and more performance from BCGS with respect to FGMRES? > Thank you in advance for any help. > > Best regards, > Marco Cisternino From hzhang at mcs.anl.gov Wed Sep 29 17:39:19 2021 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Wed, 29 Sep 2021 22:39:19 +0000 Subject: [petsc-users] FGMRES and BCGS In-Reply-To: <87a6jvc8z7.fsf@jedbrown.org> References: <87a6jvc8z7.fsf@jedbrown.org> Message-ID: See https://doc.comsol.com/5.5/doc/com.comsol.help.comsol/comsol_ref_solver.27.123.html The Iterative Solvers - COMSOL Multiphysics The relaxation factor ω to some extent controls the stability and convergence properties of a numerical solver by shifting its eigenvalue spectrum. The optimal value for the relaxation factor can improve convergence significantly - for example, for SOR when used as a solver. However, the optimal choice is typically a subtle task with arbitrary complexity. doc.comsol.com We use BCGS for constant memory, while GMRES without restart requires increasing memory but has more predictable convergence than BCGS. Hong ________________________________ From: petsc-users on behalf of Jed Brown Sent: Wednesday, September 29, 2021 5:28 PM To: Marco Cisternino ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] FGMRES and BCGS It is not surprising. BCGS uses less memory for the Krylov vectors, but that might be a small fraction of the total memory used (considering your matrix and GAMG). FGMRES(30) needs 60 work vectors (2 per iteration). If you're using a linear (non-iterative) preconditioner, then you don't need a flexible method -- plain GMRES should be fine. FGMRES uses the unpreconditioned norm, which you can also get via -ksp_type gmres -ksp_norm_type unpreconditioned. This classic paper shows that for any class of nonsymmetric Krylov method, there are matrices in which that method outperforms every other method by at least sqrt(N). https://epubs.siam.org/doi/10.1137/0613049 Marco Cisternino writes: > Good Morning, > I usually solve a non-symmetric discretization of the Poisson equation using GAMG+FGMRES. > In the last few days I tried to use BCGS in place of FGMRES, still using GAMG as the preconditioner. > No problem in finding the solution but I'm experiencing something I didn't expect. > The test case is a 25 million cell domain with Dirichlet and Neumann boundary conditions. > Both the solvers are able to solve the problem with an increasing number of MPI processes, but: > > * FGMRES is about 25% faster than BCGS for all process counts > * Both solvers have the same scalability from 48 to 384 processes > * Both solvers use almost the same amount of memory (FGMRES uses restart=30) > Am I wrong to expect less memory consumption and more performance from BCGS with respect to FGMRES? > Thank you in advance for any help. > > Best regards, > Marco Cisternino -------------- next part -------------- An HTML attachment was scrubbed...
URL: From knepley at gmail.com Wed Sep 29 17:58:46 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 29 Sep 2021 18:58:46 -0400 Subject: [petsc-users] Convergence rate for spatially varying Helmholtz system In-Reply-To: References: Message-ID: On Wed, Sep 29, 2021 at 6:03 PM Ramakrishnan Thirumalaisamy < rthirumalaisam1857 at sdsu.edu> wrote: > Hi all, > > I am trying to solve the Helmholtz equation for temperature T: > > (C I + Div D grad) T = f > > in IBAMR, in which C is the spatially varying diagonal entries, and D is > the spatially varying diffusion coefficient. I use a matrix-free solver > with matrix-based PETSc preconditioner. For the matrix-free solver, I use > gmres solver and for the matrix based preconditioner, I use Richardson ksp > + Jacobi as a preconditioner. As the simulation progresses, the iterations > start to increase. To understand the cause, I set D to be zero, which > results in a diagonal system: > > C T = f. > > This should result in convergence within a single iteration, but I get > convergence in 3 iterations. > > Residual norms for temperature_ solve. > > 0 KSP preconditioned resid norm 4.590811647875e-02 true resid norm > 2.406067589273e+09 ||r(i)||/||b|| 4.455533946945e-05 > > 1 KSP preconditioned resid norm 2.347767895880e-06 true resid norm > 1.210763896685e+05 ||r(i)||/||b|| 2.242081505717e-09 > > 2 KSP preconditioned resid norm 1.245406571896e-10 true resid norm > 6.328828824310e+00 ||r(i)||/||b|| 1.171966730978e-13 > > Linear temperature_ solve converged due to CONVERGED_RTOL iterations 2 > Several things look off here: 1) Your true residual norm is 2.4e9, but r_0/b is 4.4e-5. That seems to indicate that ||b|| is 1e14. Is this true? 2) Your preconditioned residual is 11 orders of magnitude less than the true residual. This usually indicates that the system is near singular. 3) The disparity above does not seem possible if C only has elements ~ 1e4. The preconditioner consistently has norm around 1e-11. 4) Using numbers that large can be a problem. You lose precision, so that you really only have 3-4 correct digits each time, as you see above. It appears to be doing iterative refinement with a very low precision solve. Thanks, Matt > To verify that I am indeed solving a diagonal system I printed the PETSc > matrix from the preconditioner and viewed it in Matlab. It indeed shows it > to be a diagonal system. Attached is the plot of the spy command on the > printed matrix. The matrix in binary form is also attached. > > My understanding is that because the C coefficient is varying in 4 orders > of magnitude, i.e., Max(C)/Min(C) ~ 10^4, the matrix is poorly scaled. When > I rescale my matrix by 1/C then the system converges in 1 iteration as > expected. Is my understanding correct, and that scaling 1/C should be done > even for a diagonal system? > > When D is non-zero, then scaling by 1/C seems to be very inconvenient as D > is stored as side-centered data for the matrix free solver. > > In the case that I do not scale my equations by 1/C, is there some solver > setting that improves the convergence rate? (With D as non-zero, I have > also tried gmres as the ksp solver in the matrix-based preconditioner to > get better performance, but it didn't matter much.) > > > Thanks, > Ramakrishnan Thirumalaisamy > San Diego State University. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Sep 29 18:39:08 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 29 Sep 2021 19:39:08 -0400 Subject: [petsc-users] Is it possible to keep track of original elements # after a call to DMPlexDistribute ? In-Reply-To: References: <7236c736-6066-1ba3-55b1-60782d8e754f@giref.ulaval.ca> Message-ID: On Wed, Sep 29, 2021 at 5:18 PM Eric Chamberland < Eric.Chamberland at giref.ulaval.ca> wrote: > Hi, > > I come back with _almost_ the original question: > > I would like to add an integer information (*our* original element > number, not petsc one) on each element of the DMPlex I create with > DMPlexBuildFromCellListParallel. > > I would like this interger to be distribruted by or the same way > DMPlexDistribute distribute the mesh. > > Is it possible to do this? > I think we already have support for what you want. If you call https://petsc.org/main/docs/manualpages/DM/DMSetUseNatural.html before DMPlexDistribute(), it will compute a PetscSF encoding the global to natural map. You can get it with https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexGetGlobalToNaturalSF.html and use it with https://petsc.org/main/docs/manualpages/DMPLEX/DMPlexGlobalToNaturalBegin.html Is this sufficient? Thanks, Matt > Thanks, > > Eric > > On 2021-07-14 1:18 p.m., Eric Chamberland wrote: > > Hi, > > > > I want to use DMPlexDistribute from PETSc for computing overlapping > > and play with the different partitioners supported. > > > > However, after calling DMPlexDistribute, I noticed the elements are > > renumbered and then the original number is lost. > > > > What would be the best way to keep track of the element renumbering? > > > > a) Adding an optional parameter to let the user retrieve a vector or > > "IS" giving the old number? > > > > b) Adding a DMLabel (seems a wrong good solution) > > > > c) Other idea? > > > > Of course, I don't want to loose performances with the need of this > > "mapping"... > > > > Thanks, > > > > Eric > > > -- > Eric Chamberland, ing., M. Ing > Professionnel de recherche > GIREF/Universit? Laval > (418) 656-2131 poste 41 22 42 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rthirumalaisam1857 at sdsu.edu Wed Sep 29 19:31:05 2021 From: rthirumalaisam1857 at sdsu.edu (Ramakrishnan Thirumalaisamy) Date: Wed, 29 Sep 2021 17:31:05 -0700 Subject: [petsc-users] Convergence rate for spatially varying Helmholtz system In-Reply-To: References: Message-ID: > > Several things look off here: > > 1) Your true residual norm is 2.4e9, but r_0/b is 4.4e-5. That seems to > indicate that ||b|| is 1e14. Is this true? > Yes. ||b|| is 1e14. We have verified that. > > 2) Your preconditioned residual is 11 orders of magnitude less than the > true residual. This usually indicates that the system is near singular. > The system is diagonal, as shown in Temperature_fill.pdf. So we don't think it is singular unless we are missing something very obvious. Diagonal elements range from 8.8e6 to 5.62896e+10, while off-diagonal terms are 0, as shown in the spy plot. > > 3) The disparity above does not seem possible if C only has elements ~ > 1e4. 
The preconditioner consistently has norm around 1e-11. > The value of C in the Helmholtz system is computed as : *C = rho*specific_heat/dt* in which dt = 5e-5, specific_heat ~10^3 and rho ranges from 0.4 to 2700. Hence, C ranges from 8.8e6 to 5.62896e10. Max_diagonal(C)/Min_diagonal(C) ~ 10^4. > > 4) Using numbers that large can be a problem. You lose precision, so that > you really only have 3-4 correct digits each time, as you see above. It > appears to be > doing iterative refinement with a very low precision solve. > Indeed the numbers are large because C = rho*specific_heat/dt. On Wed, Sep 29, 2021 at 3:58 PM Matthew Knepley wrote: > On Wed, Sep 29, 2021 at 6:03 PM Ramakrishnan Thirumalaisamy < > rthirumalaisam1857 at sdsu.edu> wrote: > >> Hi all, >> >> I am trying to solve the Helmholtz equation for temperature T: >> >> (C I + Div D grad) T = f >> >> in IBAMR, in which C is the spatially varying diagonal entries, and D is >> the spatially varying diffusion coefficient. I use a matrix-free solver >> with matrix-based PETSc preconditioner. For the matrix-free solver, I use >> gmres solver and for the matrix based preconditioner, I use Richardson ksp >> + Jacobi as a preconditioner. As the simulation progresses, the iterations >> start to increase. To understand the cause, I set D to be zero, which >> results in a diagonal system: >> >> C T = f. >> >> This should result in convergence within a single iteration, but I get >> convergence in 3 iterations. >> >> Residual norms for temperature_ solve. >> >> 0 KSP preconditioned resid norm 4.590811647875e-02 true resid norm >> 2.406067589273e+09 ||r(i)||/||b|| 4.455533946945e-05 >> >> 1 KSP preconditioned resid norm 2.347767895880e-06 true resid norm >> 1.210763896685e+05 ||r(i)||/||b|| 2.242081505717e-09 >> >> 2 KSP preconditioned resid norm 1.245406571896e-10 true resid norm >> 6.328828824310e+00 ||r(i)||/||b|| 1.171966730978e-13 >> >> Linear temperature_ solve converged due to CONVERGED_RTOL iterations 2 >> > > Several things look off here: > > 1) Your true residual norm is 2.4e9, but r_0/b is 4.4e-5. That seems to > indicate that ||b|| is 1e14. Is this true? > > 2) Your preconditioned residual is 11 orders of magnitude less than the > true residual. This usually indicates that the system is near singular. > > 3) The disparity above does not seem possible if C only has elements ~ > 1e4. The preconditioner consistently has norm around 1e-11. > > 4) Using numbers that large can be a problem. You lose precision, so that > you really only have 3-4 correct digits each time, as you see above. It > appears to be > doing iterative refinement with a very low precision solve. > > Thanks, > > Matt > > >> To verify that I am indeed solving a diagonal system I printed the PETSc >> matrix from the preconditioner and viewed it in Matlab. It indeed shows it >> to be a diagonal system. Attached is the plot of the spy command on the >> printed matrix. The matrix in binary form is also attached. >> >> My understanding is that because the C coefficient is varying in 4 orders >> of magnitude, i.e., Max(C)/Min(C) ~ 10^4, the matrix is poorly scaled. When >> I rescale my matrix by 1/C then the system converges in 1 iteration as >> expected. Is my understanding correct, and that scaling 1/C should be done >> even for a diagonal system? >> >> When D is non-zero, then scaling by 1/C seems to be very inconvenient as >> D is stored as side-centered data for the matrix free solver. 
>> >> In the case that I do not scale my equations by 1/C, is there some solver >> setting that improves the convergence rate? (With D as non-zero, I have >> also tried gmres as the ksp solver in the matrix-based preconditioner to >> get better performance, but it didn't matter much.) >> >> >> Thanks, >> Ramakrishnan Thirumalaisamy >> San Diego State University. >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marco.cisternino at optimad.it Thu Sep 30 03:16:01 2021 From: marco.cisternino at optimad.it (Marco Cisternino) Date: Thu, 30 Sep 2021 08:16:01 +0000 Subject: [petsc-users] Disconnected domains and Poisson equation In-Reply-To: <10EA28EF-AD98-4F59-A78D-7DE3D4B585DE@petsc.dev> References: <448CEBF7-5B16-4E1C-8D1D-9CC067BD38BB@petsc.dev> <10EA28EF-AD98-4F59-A78D-7DE3D4B585DE@petsc.dev> Message-ID: Hello Barry. This is the output of ksp_view using fgmres and gamg. It has to be said that the solution of the linear system should be a zero values field. As you can see both unpreconditioned residual and r/b converge at this iteration of the CFD solver. During the time integration of the CFD, I can observe pressure linear solver residuals behaving in a different way: unpreconditioned residual stil converges but r/b stalls. After the output of ksp_view I add the output of ksp_monitor_true_residual for one of these iteration where r/b stalls. Thanks, KSP Object: 1 MPI processes type: fgmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=100, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: gamg type is MULTIPLICATIVE, levels=4 cycles=v Cycles per PCApply=1 Using externally compute Galerkin coarse grid matrices GAMG specific options Threshold for dropping small values in graph on each level = 0.02 0.02 Threshold scaling factor for each level not specified = 1. AGG specific options Symmetric graph true Number of levels to square graph 1 Number smoothing steps 0 Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 1 MPI processes type: bjacobi number of blocks = 1 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using DEFAULT norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu PC has not been set up so information may be incomplete out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=18, cols=18 total: nonzeros=104, allocated nonzeros=104 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=18, cols=18 total: nonzeros=104, allocated nonzeros=104 total number of mallocs used during MatSetValues calls =0 not using I-node routines Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 1 MPI processes type: chebyshev eigenvalue estimates used: min = 0., max = 0. eigenvalues estimate via gmres min 0., max 0. eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_1_esteig_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using DEFAULT norm type for convergence test estimating eigenvalues using noisy right hand side maximum iterations=2, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_1_) 1 MPI processes type: sor type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=67, cols=67 total: nonzeros=675, allocated nonzeros=675 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 1 MPI processes type: chebyshev eigenvalue estimates used: min = 0., max = 0. eigenvalues estimate via gmres min 0., max 0. eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_2_esteig_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using DEFAULT norm type for convergence test estimating eigenvalues using noisy right hand side maximum iterations=2, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_2_) 1 MPI processes type: sor type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=348, cols=348 total: nonzeros=3928, allocated nonzeros=3928 total number of mallocs used during MatSetValues calls =0 not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 1 MPI processes type: chebyshev eigenvalue estimates used: min = 0., max = 0. 
eigenvalues estimate via gmres min 0., max 0. eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_3_esteig_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using DEFAULT norm type for convergence test estimating eigenvalues using noisy right hand side maximum iterations=2, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_3_) 1 MPI processes type: sor type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=3584, cols=3584 total: nonzeros=23616, allocated nonzeros=23616 total number of mallocs used during MatSetValues calls =0 has attached null space not using I-node routines Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=3584, cols=3584 total: nonzeros=23616, allocated nonzeros=23616 total number of mallocs used during MatSetValues calls =0 has attached null space not using I-node routines Pressure system has reached convergence in 0 iterations with reason 3. 0 KSP unpreconditioned resid norm 4.798763170703e-16 true resid norm 4.798763170703e-16 ||r(i)||/||b|| 1.000000000000e+00 0 KSP Residual norm 4.798763170703e-16 % max 1.000000000000e+00 min 1.000000000000e+00 max/min 1.000000000000e+00 1 KSP unpreconditioned resid norm 1.648749109132e-17 true resid norm 1.648749109132e-17 ||r(i)||/||b|| 3.435779284125e-02 1 KSP Residual norm 1.648749109132e-17 % max 9.561792537103e-01 min 9.561792537103e-01 max/min 1.000000000000e+00 2 KSP unpreconditioned resid norm 4.737880600040e-19 true resid norm 4.737880600040e-19 ||r(i)||/||b|| 9.873128619820e-04 2 KSP Residual norm 4.737880600040e-19 % max 9.828636644296e-01 min 9.293131521763e-01 max/min 1.057623753767e+00 3 KSP unpreconditioned resid norm 2.542212716830e-20 true resid norm 2.542212716830e-20 ||r(i)||/||b|| 5.297641551371e-05 3 KSP Residual norm 2.542212716830e-20 % max 9.933572357920e-01 min 9.158303248850e-01 max/min 1.084652046127e+00 4 KSP unpreconditioned resid norm 6.614510286263e-21 true resid norm 6.614510286269e-21 ||r(i)||/||b|| 1.378378146822e-05 4 KSP Residual norm 6.614510286263e-21 % max 9.950912550705e-01 min 6.296575800237e-01 max/min 1.580368896747e+00 5 KSP unpreconditioned resid norm 1.981505525281e-22 true resid norm 1.981505525272e-22 ||r(i)||/||b|| 4.129200493513e-07 5 KSP Residual norm 1.981505525281e-22 % max 9.984097962703e-01 min 5.316259535293e-01 max/min 1.878030577029e+00 Linear solve converged due to CONVERGED_RTOL iterations 5 Ksp_monitor_true_residual output for stalling r/b CFD iteration 0 KSP unpreconditioned resid norm 9.010260489109e-14 true resid norm 9.010260489109e-14 ||r(i)||/||b|| 2.021559024868e+00 0 KSP Residual norm 9.010260489109e-14 % max 1.000000000000e+00 min 1.000000000000e+00 max/min 1.000000000000e+00 1 KSP unpreconditioned resid norm 4.918108339808e-15 true resid norm 4.918171792537e-15 ||r(i)||/||b|| 1.103450292594e-01 1 KSP Residual norm 4.918108339808e-15 % max 9.566256813737e-01 min 9.566256813737e-01 max/min 1.000000000000e+00 2 KSP unpreconditioned resid norm 1.443599554690e-15 true resid norm 
1.444867143493e-15 ||r(i)||/||b|| 3.241731154382e-02 2 KSP Residual norm 1.443599554690e-15 % max 9.614019380614e-01 min 7.360950481750e-01 max/min 1.306083963538e+00 3 KSP unpreconditioned resid norm 6.623206616803e-16 true resid norm 6.654132553541e-16 ||r(i)||/||b|| 1.492933720678e-02 3 KSP Residual norm 6.623206616803e-16 % max 9.764112945239e-01 min 4.911485418014e-01 max/min 1.988016274960e+00 4 KSP unpreconditioned resid norm 6.551896936698e-16 true resid norm 6.646157296305e-16 ||r(i)||/||b|| 1.491144376933e-02 4 KSP Residual norm 6.551896936698e-16 % max 9.883425885532e-01 min 1.461270778833e-01 max/min 6.763582786091e+00 5 KSP unpreconditioned resid norm 6.222297644887e-16 true resid norm 1.720560536914e-15 ||r(i)||/||b|| 3.860282047823e-02 5 KSP Residual norm 6.222297644887e-16 % max 1.000409371755e+00 min 4.989767363560e-03 max/min 2.004921870829e+02 6 KSP unpreconditioned resid norm 6.496945794974e-17 true resid norm 2.031914800253e-14 ||r(i)||/||b|| 4.558842341106e-01 6 KSP Residual norm 6.496945794974e-17 % max 1.004914985753e+00 min 1.459258738706e-03 max/min 6.886475709192e+02 7 KSP unpreconditioned resid norm 1.965237342540e-17 true resid norm 1.684522207337e-14 ||r(i)||/||b|| 3.779425772373e-01 7 KSP Residual norm 1.965237342540e-17 % max 1.005737762541e+00 min 1.452603803766e-03 max/min 6.923689446035e+02 8 KSP unpreconditioned resid norm 1.627718951285e-17 true resid norm 1.958642967520e-14 ||r(i)||/||b|| 4.394448276241e-01 8 KSP Residual norm 1.627718951285e-17 % max 1.006364278765e+00 min 1.452081813014e-03 max/min 6.930492963590e+02 9 KSP unpreconditioned resid norm 1.616577677764e-17 true resid norm 2.019110946644e-14 ||r(i)||/||b|| 4.530115373837e-01 9 KSP Residual norm 1.616577677764e-17 % max 1.006648747131e+00 min 1.452031376577e-03 max/min 6.932692801059e+02 10 KSP unpreconditioned resid norm 1.285788988203e-17 true resid norm 2.065082694477e-14 ||r(i)||/||b|| 4.633258453698e-01 10 KSP Residual norm 1.285788988203e-17 % max 1.007469033514e+00 min 1.433291867068e-03 max/min 7.029057072477e+02 11 KSP unpreconditioned resid norm 5.490854431580e-19 true resid norm 1.798071628891e-14 ||r(i)||/||b|| 4.034187394623e-01 11 KSP Residual norm 5.490854431580e-19 % max 1.008058905554e+00 min 1.369401685301e-03 max/min 7.361309076612e+02 12 KSP unpreconditioned resid norm 1.371754802104e-20 true resid norm 1.965688920064e-14 ||r(i)||/||b|| 4.410256708163e-01 12 KSP Residual norm 1.371754802104e-20 % max 1.008409402214e+00 min 1.369243011779e-03 max/min 7.364721919624e+02 Linear solve converged due to CONVERGED_RTOL iterations 12 Marco Cisternino From: Barry Smith Sent: mercoled? 29 settembre 2021 18:34 To: Marco Cisternino Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Disconnected domains and Poisson equation On Sep 29, 2021, at 11:59 AM, Marco Cisternino > wrote: For sake of completeness, explicitly building the null space using a vector per sub-domain make s the CFD runs using BCGS and GMRES more stable, but still slower than FGMRES. Something is strange. Please run with -ksp_view and send the output on the solver details. I had divergence using BCGS and GMRES setting the null space with only one constant. Thanks Marco Cisternino From: Marco Cisternino Sent: mercoled? 29 settembre 2021 17:54 To: Barry Smith > Cc: petsc-users at mcs.anl.gov Subject: RE: [petsc-users] Disconnected domains and Poisson equation Thank you Barry for the quick reply. 
About the null space: I already tried what you suggest, building 2 Vec (constants) with 0 and 1 chosen by sub-domain, normalizing them and setting the null space like this MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_FALSE,nconstants,constants,&nullspace); The solution is slightly different in values but it is still different in the two sub-domains. About the solver: I tried BCGS, GMRES and FGMRES. The linear system is a pressure system in a navier-stokes solver and only solving with FGMRES makes the CFD stable, with BCGS and GMRES the CFD solution diverges. Moreover, in the same case but with a single domain, CFD solution is stable using all the solvers, but FGMRES converges in much less iterations than the others. Marco Cisternino From: Barry Smith > Sent: mercoled? 29 settembre 2021 15:59 To: Marco Cisternino > Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Disconnected domains and Poisson equation The problem actually has a two dimensional null space; constant on each domain but possibly different constants. I think you need to build the MatNullSpace by explicitly constructing two vectors, one with 0 on one domain and constant value on the other and one with 0 on the other domain and constant on the first. Separate note: why use FGMRES instead of just GMRES? If the problem is linear and the preconditioner is linear (no GMRES inside the smoother) then you can just use GMRES and it will save a little space/work and be conceptually clearer. Barry On Sep 29, 2021, at 8:46 AM, Marco Cisternino > wrote: Good morning, I want to solve the Poisson equation on a 3D domain with 2 non-connected sub-domains. I am using FGMRES+GAMG and I have no problem if the two sub-domains see a Dirichlet boundary condition each. On the same domain I would like to solve the Poisson equation imposing periodic boundary condition in one direction and homogenous Neumann boundary conditions in the other two directions. The two sub-domains are symmetric with respect to the separation between them and the operator discretization and the right hand side are symmetric as well. It would be nice to have the same solution in both the sub-domains. Setting the null space to the constant, the solver converges to a solution having the same gradients in both sub-domains but different values. Am I doing some wrong with the null space? I?m not setting a block matrix (one block for each sub-domain), should I? I tested the null space against the matrix using MatNullSpaceTest and the answer is true. Can I do something more to have a symmetric solution as outcome of the solver? Thank you in advance for any comments and hints. Best regards, Marco Cisternino -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.wick.1980 at gmail.com Thu Sep 30 04:55:50 2021 From: michael.wick.1980 at gmail.com (Michael Wick) Date: Thu, 30 Sep 2021 17:55:50 +0800 Subject: [petsc-users] pass a member function to MatShellSetOperation Message-ID: Hi: I want to have the shell matrix-vector multiplication written as a class member function and pass it to the shell matrix via MatShellSetOperation. MatShellSetOperation(A, MATOP_MULT, (void (*)(void))(&Global_Assem::MyMatMult)); Perhaps I have a wrong understanding of function pointers, and I am constantly getting warnings that say I cannot convert a member function to a void type. The warning indeed makes sense to me, as the function pointer passed in the above manner is independent of an instance. 
Perhaps there are other ways of passing a member function that I don't know of. If you know how to address this, I would appreciate it a lot! Thanks, Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From praveen at gmx.net Thu Sep 30 05:10:57 2021 From: praveen at gmx.net (Praveen C) Date: Thu, 30 Sep 2021 15:40:57 +0530 Subject: [petsc-users] pass a member function to MatShellSetOperation In-Reply-To: References: Message-ID: <63546E7A-36E3-440C-80EF-7B38A2B27071@gmx.net> I have used something like this in similar situation auto MatMult = [this](?args?) { this->MyMatMult(?args?); }; Then pass MatMult to petsc. this refers to the class Global_Assem and we are assuming you are inside this class when doing the above. best praveen > On 30-Sep-2021, at 3:25 PM, Michael Wick wrote: > > Hi: > > I want to have the shell matrix-vector multiplication written as a class member function and pass it to the shell matrix via MatShellSetOperation. > > MatShellSetOperation(A, MATOP_MULT, (void (*)(void))(&Global_Assem::MyMatMult)); > > Perhaps I have a wrong understanding of function pointers, and I am constantly getting warnings that say I cannot convert a member function to a void type. The warning indeed makes sense to me, as the function pointer passed in the above manner is independent of an instance. Perhaps there are other ways of passing a member function that I don't know of. If you know how to address this, I would appreciate it a lot! > > Thanks, > > Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Sep 30 05:52:17 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 30 Sep 2021 06:52:17 -0400 Subject: [petsc-users] Convergence rate for spatially varying Helmholtz system In-Reply-To: References: Message-ID: On Wed, Sep 29, 2021 at 8:31 PM Ramakrishnan Thirumalaisamy < rthirumalaisam1857 at sdsu.edu> wrote: > Several things look off here: >> >> 1) Your true residual norm is 2.4e9, but r_0/b is 4.4e-5. That seems to >> indicate that ||b|| is 1e14. Is this true? >> > Yes. ||b|| is 1e14. We have verified that. > >> >> 2) Your preconditioned residual is 11 orders of magnitude less than the >> true residual. This usually indicates that the system is near singular. >> > The system is diagonal, as shown in Temperature_fill.pdf. So we don't > think it is singular unless we are missing something very obvious. > Diagonal elements range from 8.8e6 to 5.62896e+10, while off-diagonal terms > are 0, as shown in the spy plot. > >> >> 3) The disparity above does not seem possible if C only has elements ~ >> 1e4. The preconditioner consistently has norm around 1e-11. >> > The value of C in the Helmholtz system is computed as : *C = > rho*specific_heat/dt* in which dt = 5e-5, specific_heat ~10^3 and rho > ranges from 0.4 to 2700. Hence, C ranges from 8.8e6 to 5.62896e10. > Max_diagonal(C)/Min_diagonal(C) ~ 10^4. > >> >> 4) Using numbers that large can be a problem. You lose precision, so that >> you really only have 3-4 correct digits each time, as you see above. It >> appears to be >> doing iterative refinement with a very low precision solve. >> > Indeed the numbers are large because C = rho*specific_heat/dt. > If you want to solve systems accurately, you should non-dimensionalize the system prior to discretization. This would mean that your C and b have elements in the [1, D] range, where D is the dynamic range of your problem, say 1e4, rather than these huge numbers you have now. 
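For reference, here is a minimal sketch (not taken from the thread; the variable names and setup are purely illustrative) of the 1/C row scaling discussed above, assuming an assembled preconditioner matrix A, a right-hand side Vec b, a Vec c that already holds the diagonal coefficients C = rho*specific_heat/dt, and a PetscErrorCode ierr in scope:

  Vec cinv;
  ierr = VecDuplicate(c, &cinv);CHKERRQ(ierr);
  ierr = VecCopy(c, cinv);CHKERRQ(ierr);
  ierr = VecReciprocal(cinv);CHKERRQ(ierr);             /* cinv_i = 1/C_i */
  ierr = MatDiagonalScale(A, cinv, NULL);CHKERRQ(ierr); /* scale the rows of A by 1/C */
  ierr = VecPointwiseMult(b, cinv, b);CHKERRQ(ierr);    /* scale the right-hand side the same way */
  ierr = VecDestroy(&cinv);CHKERRQ(ierr);

Non-dimensionalizing rho, specific_heat and dt before assembly, as suggested above, achieves the same effect without touching the assembled matrix, since the entries then already sit in a modest range.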
Thanks, Matt > On Wed, Sep 29, 2021 at 3:58 PM Matthew Knepley wrote: > >> On Wed, Sep 29, 2021 at 6:03 PM Ramakrishnan Thirumalaisamy < >> rthirumalaisam1857 at sdsu.edu> wrote: >> >>> Hi all, >>> >>> I am trying to solve the Helmholtz equation for temperature T: >>> >>> (C I + Div D grad) T = f >>> >>> in IBAMR, in which C is the spatially varying diagonal entries, and D is >>> the spatially varying diffusion coefficient. I use a matrix-free solver >>> with matrix-based PETSc preconditioner. For the matrix-free solver, I use >>> gmres solver and for the matrix based preconditioner, I use Richardson ksp >>> + Jacobi as a preconditioner. As the simulation progresses, the iterations >>> start to increase. To understand the cause, I set D to be zero, which >>> results in a diagonal system: >>> >>> C T = f. >>> >>> This should result in convergence within a single iteration, but I get >>> convergence in 3 iterations. >>> >>> Residual norms for temperature_ solve. >>> >>> 0 KSP preconditioned resid norm 4.590811647875e-02 true resid norm >>> 2.406067589273e+09 ||r(i)||/||b|| 4.455533946945e-05 >>> >>> 1 KSP preconditioned resid norm 2.347767895880e-06 true resid norm >>> 1.210763896685e+05 ||r(i)||/||b|| 2.242081505717e-09 >>> >>> 2 KSP preconditioned resid norm 1.245406571896e-10 true resid norm >>> 6.328828824310e+00 ||r(i)||/||b|| 1.171966730978e-13 >>> >>> Linear temperature_ solve converged due to CONVERGED_RTOL iterations 2 >>> >> >> Several things look off here: >> >> 1) Your true residual norm is 2.4e9, but r_0/b is 4.4e-5. That seems to >> indicate that ||b|| is 1e14. Is this true? >> > >> 2) Your preconditioned residual is 11 orders of magnitude less than the >> true residual. This usually indicates that the system is near singular. >> >> 3) The disparity above does not seem possible if C only has elements ~ >> 1e4. The preconditioner consistently has norm around 1e-11. >> >> 4) Using numbers that large can be a problem. You lose precision, so that >> you really only have 3-4 correct digits each time, as you see above. It >> appears to be >> doing iterative refinement with a very low precision solve. >> >> Thanks, >> >> Matt >> >> >>> To verify that I am indeed solving a diagonal system I printed the PETSc >>> matrix from the preconditioner and viewed it in Matlab. It indeed shows it >>> to be a diagonal system. Attached is the plot of the spy command on the >>> printed matrix. The matrix in binary form is also attached. >>> >>> My understanding is that because the C coefficient is varying in 4 >>> orders of magnitude, i.e., Max(C)/Min(C) ~ 10^4, the matrix is poorly >>> scaled. When I rescale my matrix by 1/C then the system converges in 1 >>> iteration as expected. Is my understanding correct, and that scaling 1/C >>> should be done even for a diagonal system? >>> >>> When D is non-zero, then scaling by 1/C seems to be very inconvenient as >>> D is stored as side-centered data for the matrix free solver. >>> >>> In the case that I do not scale my equations by 1/C, is there some >>> solver setting that improves the convergence rate? (With D as non-zero, I >>> have also tried gmres as the ksp solver in the matrix-based preconditioner >>> to get better performance, but it didn't matter much.) >>> >>> >>> Thanks, >>> Ramakrishnan Thirumalaisamy >>> San Diego State University. >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From berend.vanwachem at ovgu.de Thu Sep 30 06:02:13 2021 From: berend.vanwachem at ovgu.de (Berend van Wachem) Date: Thu, 30 Sep 2021 13:02:13 +0200 Subject: [petsc-users] DMView and DMLoad In-Reply-To: <056E066F-D596-4254-A44A-60BFFD30FE82@erdw.ethz.ch> References: <56ce2135-9757-4292-e33b-c7eea8fb7b2e@ovgu.de> <056E066F-D596-4254-A44A-60BFFD30FE82@erdw.ethz.ch> Message-ID: <45d209e2-ecab-ead7-7229-a819736b91df@ovgu.de> Dear Vaclav, Lawrence, following your example, we have managed to save the DM with a wrapped Vector in h5 format (PETSC_VIEWER_HDF5_PETSC) with: DMPlexTopologyView(dm, viewer); DMClone(dm, &sdm); ... DMPlexSectionView(dm, viewer, sdm); DMGetLocalVector(sdm, &vec); ... DMPlexLocalVectorView(dm, viewer, sdm, vec); The problem comes with the loading of the "DM+Vec.h5" with: DMCreate(PETSC_COMM_WORLD, &dm); DMSetType(dm, DMPLEX); ... DMPlexTopologyLoad(dm, viewer, &sfO); ... PetscSFCompose(sfO, sfDist, &sf); ... DMClone(dm, &sdm); DMPlexSectionLoad(dm, viewer, sdm, sf, &globalDataSF, &localDataSF); DMGetLocalVector(sdm, &vec); ... DMPlexLocalVectorLoad(dm, H5Viewer, sdm, localDataSF, vec); The loaded DM is different to the one created with DMPlexCreateFromfile (for instance, no "coordinates" are recovered with the use of DMGetCoordinatesLocal). This conflicts with our code, which relies on features of the DM as delivered by the DMPlexCreateFromfile function. We have also noticed that the "DM+Vec.h5" can not be loaded directly with DMPlexCreateFromfile because it contains only the groups "topology" and "topologies" while the groups "geometry" and "labels" are missing (and probably other conflicts). Is this something which can be changed? We would need to reload a DM similar to the one created with DMPlexCreateFromfile. Best regards, Berend. On 9/22/21 8:59 PM, Hapla Vaclav wrote: > To avoid confusions here, Berend seems to be specifically demanding XDMF > (PETSC_VIEWER_HDF5_XDMF). The stuff we are now working on is parallel > checkpointing in our own HDF5 format?(PETSC_VIEWER_HDF5_PETSC), I will > make a series of MRs on this topic in the following days. > > For XDMF, we are specifically missing the ability to write/load DMLabels > properly. XDMF uses specific cell-local numbering for faces for > specification of face sets, and face-local numbering for specification > of edge sets, which is not great wrt DMPlex design. And ParaView doesn't > show any of these properly so it's hard to debug. Matt, we should talk > about this soon. > > Berend, for now, could you just load the mesh initially from XDMF and > then use our PETSC_VIEWER_HDF5_PETSC format for subsequent saving/loading? > > Thanks, > > Vaclav > >> On 17 Sep 2021, at 15:46, Lawrence Mitchell > > wrote: >> >> Hi Berend, >> >>> On 14 Sep 2021, at 12:23, Matthew Knepley >> > wrote: >>> >>> On Tue, Sep 14, 2021 at 5:15 AM Berend van Wachem >>> > wrote: >>> Dear PETSc-team, >>> >>> We are trying to save and load distributed DMPlex and its associated >>> physical fields (created with DMCreateGlobalVector) ?(Uvelocity, >>> VVelocity, ?...) in HDF5_XDMF format. 
To achieve this, we do the >>> following: >>> >>> 1) save in the same xdmf.h5 file: >>> DMView( DM ????????, H5_XDMF_Viewer ); >>> VecView( UVelocity, H5_XDMF_Viewer ); >>> >>> 2) load the dm: >>> DMPlexCreateFromfile(PETSC_COMM_WORLD, Filename, PETSC_TRUE, DM); >>> >>> 3) load the physical field: >>> VecLoad( UVelocity, H5_XDMF_Viewer ); >>> >>> There are no errors in the execution, but the loaded DM is distributed >>> differently to the original one, which results in the incorrect >>> placement of the values of the physical fields (UVelocity etc.) in the >>> domain. >>> >>> This approach is used to restart the simulation with the last saved DM. >>> Is there something we are missing, or there exists alternative routes to >>> this goal? Can we somehow get the IS of the redistribution, so we can >>> re-distribute the vector data as well? >>> >>> Many thanks, best regards, >>> >>> Hi Berend, >>> >>> We are in the midst of rewriting this. We want to support saving >>> multiple meshes, with fields attached to each, >>> and preserving the discretization (section) information, and allowing >>> us to load up on a different number of >>> processes. We plan to be done by October. Vaclav and I are doing this >>> in collaboration with Koki Sagiyama, >>> David Ham, and Lawrence Mitchell from the Firedrake team. >> >> The core load/save cycle functionality is now in PETSc main. So if >> you're using main rather than a release, you can get access to it now. >> This section of the manual shows an example of how to do >> thingshttps://petsc.org/main/docs/manual/dmplex/#saving-and-loading-data-with-hdf5 >> >> >> Let us know if things aren't clear! >> >> Thanks, >> >> Lawrence > From mfadams at lbl.gov Thu Sep 30 06:06:06 2021 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 30 Sep 2021 07:06:06 -0400 Subject: [petsc-users] Disconnected domains and Poisson equation In-Reply-To: References: <448CEBF7-5B16-4E1C-8D1D-9CC067BD38BB@petsc.dev> <10EA28EF-AD98-4F59-A78D-7DE3D4B585DE@petsc.dev> Message-ID: * Do we understand: type: chebyshev eigenvalue estimates used: *min = 0., max = 0.* eigenvalues estimate via gmres *min 0., max 0.* * Is this Poisson solver unsymmetric? * Does this problem start off converging and then evolve and then start stagnating? The eigen estimates may need to be recalculated. Also Chebyshev is problematic for unsymmetric matrices. Hypre does better with unsymmetric matrices. Mark On Thu, Sep 30, 2021 at 4:16 AM Marco Cisternino < marco.cisternino at optimad.it> wrote: > Hello Barry. > > This is the output of ksp_view using fgmres and gamg. It has to be said > that the solution of the linear system should be a zero values field. As > you can see both unpreconditioned residual and r/b converge at this > iteration of the CFD solver. During the time integration of the CFD, I can > observe pressure linear solver residuals behaving in a different way: > unpreconditioned residual stil converges but r/b stalls. After the output > of ksp_view I add the output of ksp_monitor_true_residual for one of these > iteration where r/b stalls. > Thanks, > > > > KSP Object: 1 MPI processes > > type: fgmres > > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=100, nonzero initial guess > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> > right preconditioning > > using UNPRECONDITIONED norm type for convergence test > > PC Object: 1 MPI processes > > type: gamg > > type is MULTIPLICATIVE, levels=4 cycles=v > > Cycles per PCApply=1 > > Using externally compute Galerkin coarse grid matrices > > GAMG specific options > > Threshold for dropping small values in graph on each level = > 0.02 0.02 > > Threshold scaling factor for each level not specified = 1. > > AGG specific options > > Symmetric graph true > > Number of levels to square graph 1 > > Number smoothing steps 0 > > Coarse grid solver -- level ------------------------------- > > KSP Object: (mg_coarse_) 1 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (mg_coarse_) 1 MPI processes > > type: bjacobi > > number of blocks = 1 > > Local solve is same for all blocks, in the following KSP and PC > objects: > > KSP Object: (mg_coarse_sub_) 1 MPI processes > > type: preonly > > maximum iterations=1, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using DEFAULT norm type for convergence test > > PC Object: (mg_coarse_sub_) 1 MPI processes > > type: lu > > PC has not been set up so information may be incomplete > > out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > > matrix ordering: nd > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=18, cols=18 > > total: nonzeros=104, allocated nonzeros=104 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=18, cols=18 > > total: nonzeros=104, allocated nonzeros=104 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > Down solver (pre-smoother) on level 1 ------------------------------- > > KSP Object: (mg_levels_1_) 1 MPI processes > > type: chebyshev > > eigenvalue estimates used: min = 0., max = 0. > > eigenvalues estimate via gmres min 0., max 0. > > eigenvalues estimated using gmres with translations [0. 0.1; 0. > 1.1] > > KSP Object: (mg_levels_1_esteig_) 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=10, initial guess is zero > > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > > left preconditioning > > using DEFAULT norm type for convergence test > > estimating eigenvalues using noisy right hand side > > maximum iterations=2, nonzero initial guess > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (mg_levels_1_) 1 MPI processes > > type: sor > > type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. 
> > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=67, cols=67 > > total: nonzeros=675, allocated nonzeros=675 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > Up solver (post-smoother) same as down solver (pre-smoother) > > Down solver (pre-smoother) on level 2 ------------------------------- > > KSP Object: (mg_levels_2_) 1 MPI processes > > type: chebyshev > > eigenvalue estimates used: min = 0., max = 0. > > eigenvalues estimate via gmres min 0., max 0. > > eigenvalues estimated using gmres with translations [0. 0.1; 0. > 1.1] > > KSP Object: (mg_levels_2_esteig_) 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=10, initial guess is zero > > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > > left preconditioning > > using DEFAULT norm type for convergence test > > estimating eigenvalues using noisy right hand side > > maximum iterations=2, nonzero initial guess > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (mg_levels_2_) 1 MPI processes > > type: sor > > type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=348, cols=348 > > total: nonzeros=3928, allocated nonzeros=3928 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > Up solver (post-smoother) same as down solver (pre-smoother) > > Down solver (pre-smoother) on level 3 ------------------------------- > > KSP Object: (mg_levels_3_) 1 MPI processes > > type: chebyshev > > eigenvalue estimates used: min = 0., max = 0. > > eigenvalues estimate via gmres min 0., max 0. > > eigenvalues estimated using gmres with translations [0. 0.1; 0. > 1.1] > > KSP Object: (mg_levels_3_esteig_) 1 MPI processes > > type: gmres > > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > happy breakdown tolerance 1e-30 > > maximum iterations=10, initial guess is zero > > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > > left preconditioning > > using DEFAULT norm type for convergence test > > estimating eigenvalues using noisy right hand side > > maximum iterations=2, nonzero initial guess > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (mg_levels_3_) 1 MPI processes > > type: sor > > type = local_symmetric, iterations = 1, local iterations = 1, > omega = 1. > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=3584, cols=3584 > > total: nonzeros=23616, allocated nonzeros=23616 > > total number of mallocs used during MatSetValues calls =0 > > has attached null space > > not using I-node routines > > Up solver (post-smoother) same as down solver (pre-smoother) > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=3584, cols=3584 > > total: nonzeros=23616, allocated nonzeros=23616 > > total number of mallocs used during MatSetValues calls =0 > > has attached null space > > not using I-node routines > > Pressure system has reached convergence in 0 iterations with reason 3. 
> > 0 KSP unpreconditioned resid norm 4.798763170703e-16 true resid norm > 4.798763170703e-16 ||r(i)||/||b|| 1.000000000000e+00 > > 0 KSP Residual norm 4.798763170703e-16 % max 1.000000000000e+00 min > 1.000000000000e+00 max/min 1.000000000000e+00 > > 1 KSP unpreconditioned resid norm 1.648749109132e-17 true resid norm > 1.648749109132e-17 ||r(i)||/||b|| 3.435779284125e-02 > > 1 KSP Residual norm 1.648749109132e-17 % max 9.561792537103e-01 min > 9.561792537103e-01 max/min 1.000000000000e+00 > > 2 KSP unpreconditioned resid norm 4.737880600040e-19 true resid norm > 4.737880600040e-19 ||r(i)||/||b|| 9.873128619820e-04 > > 2 KSP Residual norm 4.737880600040e-19 % max 9.828636644296e-01 min > 9.293131521763e-01 max/min 1.057623753767e+00 > > 3 KSP unpreconditioned resid norm 2.542212716830e-20 true resid norm > 2.542212716830e-20 ||r(i)||/||b|| 5.297641551371e-05 > > 3 KSP Residual norm 2.542212716830e-20 % max 9.933572357920e-01 min > 9.158303248850e-01 max/min 1.084652046127e+00 > > 4 KSP unpreconditioned resid norm 6.614510286263e-21 true resid norm > 6.614510286269e-21 ||r(i)||/||b|| 1.378378146822e-05 > > 4 KSP Residual norm 6.614510286263e-21 % max 9.950912550705e-01 min > 6.296575800237e-01 max/min 1.580368896747e+00 > > 5 KSP unpreconditioned resid norm 1.981505525281e-22 true resid norm > 1.981505525272e-22 ||r(i)||/||b|| 4.129200493513e-07 > > 5 KSP Residual norm 1.981505525281e-22 % max 9.984097962703e-01 min > 5.316259535293e-01 max/min 1.878030577029e+00 > > Linear solve converged due to CONVERGED_RTOL iterations 5 > > > > Ksp_monitor_true_residual output for stalling r/b CFD iteration > 0 KSP unpreconditioned resid norm 9.010260489109e-14 true resid norm > 9.010260489109e-14 ||r(i)||/||b|| 2.021559024868e+00 > > 0 KSP Residual norm 9.010260489109e-14 % max 1.000000000000e+00 min > 1.000000000000e+00 max/min 1.000000000000e+00 > > 1 KSP unpreconditioned resid norm 4.918108339808e-15 true resid norm > 4.918171792537e-15 ||r(i)||/||b|| 1.103450292594e-01 > > 1 KSP Residual norm 4.918108339808e-15 % max 9.566256813737e-01 min > 9.566256813737e-01 max/min 1.000000000000e+00 > > 2 KSP unpreconditioned resid norm 1.443599554690e-15 true resid norm > 1.444867143493e-15 ||r(i)||/||b|| 3.241731154382e-02 > > 2 KSP Residual norm 1.443599554690e-15 % max 9.614019380614e-01 min > 7.360950481750e-01 max/min 1.306083963538e+00 > > 3 KSP unpreconditioned resid norm 6.623206616803e-16 true resid norm > 6.654132553541e-16 ||r(i)||/||b|| 1.492933720678e-02 > > 3 KSP Residual norm 6.623206616803e-16 % max 9.764112945239e-01 min > 4.911485418014e-01 max/min 1.988016274960e+00 > > 4 KSP unpreconditioned resid norm 6.551896936698e-16 true resid norm > 6.646157296305e-16 ||r(i)||/||b|| 1.491144376933e-02 > > 4 KSP Residual norm 6.551896936698e-16 % max 9.883425885532e-01 min > 1.461270778833e-01 max/min 6.763582786091e+00 > > 5 KSP unpreconditioned resid norm 6.222297644887e-16 true resid norm > 1.720560536914e-15 ||r(i)||/||b|| 3.860282047823e-02 > > 5 KSP Residual norm 6.222297644887e-16 % max 1.000409371755e+00 min > 4.989767363560e-03 max/min 2.004921870829e+02 > > 6 KSP unpreconditioned resid norm 6.496945794974e-17 true resid norm > 2.031914800253e-14 ||r(i)||/||b|| 4.558842341106e-01 > > 6 KSP Residual norm 6.496945794974e-17 % max 1.004914985753e+00 min > 1.459258738706e-03 max/min 6.886475709192e+02 > > 7 KSP unpreconditioned resid norm 1.965237342540e-17 true resid norm > 1.684522207337e-14 ||r(i)||/||b|| 3.779425772373e-01 > > 7 KSP Residual norm 1.965237342540e-17 % max 1.005737762541e+00 
min > 1.452603803766e-03 max/min 6.923689446035e+02 > > 8 KSP unpreconditioned resid norm 1.627718951285e-17 true resid norm > 1.958642967520e-14 ||r(i)||/||b|| 4.394448276241e-01 > > 8 KSP Residual norm 1.627718951285e-17 % max 1.006364278765e+00 min > 1.452081813014e-03 max/min 6.930492963590e+02 > > 9 KSP unpreconditioned resid norm 1.616577677764e-17 true resid norm > 2.019110946644e-14 ||r(i)||/||b|| 4.530115373837e-01 > > 9 KSP Residual norm 1.616577677764e-17 % max 1.006648747131e+00 min > 1.452031376577e-03 max/min 6.932692801059e+02 > > 10 KSP unpreconditioned resid norm 1.285788988203e-17 true resid norm > 2.065082694477e-14 ||r(i)||/||b|| 4.633258453698e-01 > > 10 KSP Residual norm 1.285788988203e-17 % max 1.007469033514e+00 min > 1.433291867068e-03 max/min 7.029057072477e+02 > > 11 KSP unpreconditioned resid norm 5.490854431580e-19 true resid norm > 1.798071628891e-14 ||r(i)||/||b|| 4.034187394623e-01 > > 11 KSP Residual norm 5.490854431580e-19 % max 1.008058905554e+00 min > 1.369401685301e-03 max/min 7.361309076612e+02 > > 12 KSP unpreconditioned resid norm 1.371754802104e-20 true resid norm > 1.965688920064e-14 ||r(i)||/||b|| 4.410256708163e-01 > > 12 KSP Residual norm 1.371754802104e-20 % max 1.008409402214e+00 min > 1.369243011779e-03 max/min 7.364721919624e+02 > > Linear solve converged due to CONVERGED_RTOL iterations 12 > > > > > > > > Marco Cisternino > > > > *From:* Barry Smith > *Sent:* mercoled? 29 settembre 2021 18:34 > *To:* Marco Cisternino > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Disconnected domains and Poisson equation > > > > > > > > On Sep 29, 2021, at 11:59 AM, Marco Cisternino < > marco.cisternino at optimad.it> wrote: > > > > For sake of completeness, explicitly building the null space using a > vector per sub-domain make s the CFD runs using BCGS and GMRES more stable, > but still slower than FGMRES. > > > > Something is strange. Please run with -ksp_view and send the output on > the solver details. > > > > I had divergence using BCGS and GMRES setting the null space with only one > constant. > > Thanks > > > > Marco Cisternino > > > > *From:* Marco Cisternino > *Sent:* mercoled? 29 settembre 2021 17:54 > *To:* Barry Smith > *Cc:* petsc-users at mcs.anl.gov > *Subject:* RE: [petsc-users] Disconnected domains and Poisson equation > > > > Thank you Barry for the quick reply. > > About the null space: I already tried what you suggest, building 2 Vec > (constants) with 0 and 1 chosen by sub-domain, normalizing them and setting > the null space like this > > > MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_FALSE,nconstants,constants,&nullspace); > > The solution is slightly different in values but it is still different in > the two sub-domains. > > About the solver: I tried BCGS, GMRES and FGMRES. The linear system is a > pressure system in a navier-stokes solver and only solving with FGMRES > makes the CFD stable, with BCGS and GMRES the CFD solution diverges. > Moreover, in the same case but with a single domain, CFD solution is stable > using all the solvers, but FGMRES converges in much less iterations than > the others. > > > > Marco Cisternino > > > > *From:* Barry Smith > *Sent:* mercoled? 29 settembre 2021 15:59 > *To:* Marco Cisternino > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Disconnected domains and Poisson equation > > > > > > The problem actually has a two dimensional null space; constant on each > domain but possibly different constants. 
I think you need to build the > MatNullSpace by explicitly constructing two vectors, one with 0 on one > domain and constant value on the other and one with 0 on the other domain > and constant on the first. > > > > Separate note: why use FGMRES instead of just GMRES? If the problem is > linear and the preconditioner is linear (no GMRES inside the smoother) then > you can just use GMRES and it will save a little space/work and be > conceptually clearer. > > > > Barry > > > > On Sep 29, 2021, at 8:46 AM, Marco Cisternino > wrote: > > > > Good morning, > > I want to solve the Poisson equation on a 3D domain with 2 non-connected > sub-domains. > > I am using FGMRES+GAMG and I have no problem if the two sub-domains see a > Dirichlet boundary condition each. > > On the same domain I would like to solve the Poisson equation imposing > periodic boundary condition in one direction and homogenous Neumann > boundary conditions in the other two directions. The two sub-domains are > symmetric with respect to the separation between them and the operator > discretization and the right hand side are symmetric as well. It would be > nice to have the same solution in both the sub-domains. > > Setting the null space to the constant, the solver converges to a solution > having the same gradients in both sub-domains but different values. > > Am I doing some wrong with the null space? I?m not setting a block matrix > (one block for each sub-domain), should I? > > I tested the null space against the matrix using MatNullSpaceTest and the > answer is true. Can I do something more to have a symmetric solution as > outcome of the solver? > > Thank you in advance for any comments and hints. > > > > Best regards, > > > > Marco Cisternino > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Sep 30 06:22:20 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 30 Sep 2021 07:22:20 -0400 Subject: [petsc-users] pass a member function to MatShellSetOperation In-Reply-To: <63546E7A-36E3-440C-80EF-7B38A2B27071@gmx.net> References: <63546E7A-36E3-440C-80EF-7B38A2B27071@gmx.net> Message-ID: That is the new way to do it. The other way to do it is to have it be a static member function, so that it does not take "this". Thanks, Matt On Thu, Sep 30, 2021 at 6:11 AM Praveen C wrote: > I have used something like this in similar situation > > auto MatMult = [this](?args?) > { > this->MyMatMult(?args?); > }; > > > Then pass MatMult to petsc. > > *this* refers to the class Global_Assem and we are assuming you are > inside this class when doing the above. > > best > praveen > > On 30-Sep-2021, at 3:25 PM, Michael Wick > wrote: > > Hi: > > I want to have the shell matrix-vector multiplication written as a class > member function and pass it to the shell matrix via MatShellSetOperation. > > MatShellSetOperation(A, MATOP_MULT, (void > (*)(void))(&Global_Assem::MyMatMult)); > > Perhaps I have a wrong understanding of function pointers, and I am > constantly getting warnings that say I cannot convert a member function to > a void type. The warning indeed makes sense to me, as the function pointer > passed in the above manner is independent of an instance. Perhaps there are > other ways of passing a member function that I don't know of. If you know > how to address this, I would appreciate it a lot! 
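For reference, a minimal sketch of the context/static-member approach suggested above: the assembler object is stored as the shell matrix context, and a static member function with the plain C callback signature forwards to the member function. The class and member names (Global_Assem, MyMatMult) follow the thread; the helper CreateShellMat and the sizes m, n, M, N are illustrative only, not part of the original code.

#include <petscmat.h>

class Global_Assem {
public:
  PetscErrorCode MyMatMult(Vec x, Vec y);   /* does the real work, uses member data */

  /* trampoline with the signature MATOP_MULT expects */
  static PetscErrorCode MatMultShell(Mat A, Vec x, Vec y)
  {
    Global_Assem  *self;
    PetscErrorCode ierr = MatShellGetContext(A, &self);CHKERRQ(ierr); /* recover "this" */
    return self->MyMatMult(x, y);
  }
};

PetscErrorCode CreateShellMat(Global_Assem *assem, PetscInt m, PetscInt n,
                              PetscInt M, PetscInt N, Mat *A)
{
  PetscErrorCode ierr;
  ierr = MatCreateShell(PETSC_COMM_WORLD, m, n, M, N, assem, A);CHKERRQ(ierr);
  ierr = MatShellSetOperation(*A, MATOP_MULT,
                              (void (*)(void))&Global_Assem::MatMultShell);CHKERRQ(ierr);
  return 0;
}

Note that a lambda can be passed this way only if it captures nothing; a lambda that captures this, like the one quoted earlier in this thread, cannot decay to the plain function pointer that MatShellSetOperation() takes, which is why the context plus static (or free) function route is the usual workaround.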
> > Thanks, > > Mike > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthikeyan.chockalingam at stfc.ac.uk Thu Sep 30 07:50:08 2021 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Thu, 30 Sep 2021 12:50:08 +0000 Subject: [petsc-users] (percent time in this phase) Message-ID: When comparing the MatSolve data for GPU MatSolve 341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 36 34 0 0 0 36 34 0 0 0 4536 4538 1 5.34e+02 0 0.00e+00 100 MatCUSPARSSolAnl 2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 and CPU MatSolve 352 1.0 1.3553e+02 1.0 1.02e+11 1.0 0.0e+00 0.0e+00 0.0e+00 35 34 0 0 0 35 34 0 0 0 4489 the time spent is almost the same for this preconditioner. Look like MatCUSPARSSolAnl is called only twice (since I am running on two cores) mpirun -n 2 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type bjacobi -ksp_monitor So would it be fair to assume MatCUSPARSSolAnl is not accounted for in MatSolve and it is an exclusive event? KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%) ~ 100 % Best, Karthik. From: Matthew Knepley Date: Wednesday, 29 September 2021 at 16:29 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" Cc: Barry Smith , "petsc-users at mcs.anl.gov" Subject: Re: [petsc-users] %T (percent time in this phase) On Wed, Sep 29, 2021 at 10:18 AM Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you! Just to summarize KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%) ~ 100 % You didn?t happen to mention how MatCUSPARSSolAnl is accounted for? Am I right in accounting for it as above? I am not sure.I thought it might be the GPU part of MatSolve(). I will have to look in the code. I am not as familiar with the GPU part. MatCUSPARSSolAnl 2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 Finally, I believe the vector events, VecNorn, VecTDot, VecAXPY, and VecAYPX are mutually exclusive? Yes. Thanks, Matt Best, Karthik. From: Matthew Knepley > Date: Wednesday, 29 September 2021 at 11:58 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: Barry Smith >, "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] %T (percent time in this phase) On Wed, Sep 29, 2021 at 6:24 AM Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Mathew. Now, it is all making sense to me. From data file ksp_ex45_N511_gpu_2.txt KSPSolve (53%) + KSPSetup (0%) = PCSetup (16%) + PCApply (37%). However, you said ?So an iteration would mostly consist of MatMult + PCApply, with some vector work? 1) You do one solve, but 2 KSPSetUp()s. You must be running on more than one process and using Block-Jacobi . Half the time is spent in the solve (53%) KSPSetUp 2 1.0 5.3149e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 1 0 0 0 0 1 0 0 0 0.00e+00 0 0.00e+00 0 KSPSolve 1 1.0 1.5837e+02 1.1 8.63e+11 1.0 6.8e+02 2.1e+06 4.4e+03 53100100100 95 53100100100 96 10881 11730 1022 6.40e+03 1021 8.17e-03 100 2) The preconditioner look like BJacobi-ILU. The setup time is 16%, which is all setup of the individual blocks, and this is all used by the numerical ILU factorization. 
PCSetUp 2 1.0 4.9623e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 0 0 0 0 16 0 0 0 0 58 0 2 6.93e+03 0 0.00e+00 0 PCSetUpOnBlocks 1 1.0 4.9274e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 0 0 0 0 15 0 0 0 0 59 0 2 6.93e+03 0 0.00e+00 0 MatLUFactorNum 1 1.0 4.6126e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 14 0 0 0 0 63 0 2 6.93e+03 0 0.00e+00 0 MatILUFactorSym 1 1.0 2.5110e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 3) The preconditioner application takes 37% of the time, which is all solving the factors and recorded in MatSolve(). Matrix multiplication takes 4%. PCApply 341 1.0 1.3068e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 37 34 0 0 0 37 34 0 0 0 4516 4523 1 5.34e+02 0 0.00e+00 100 MatSolve 341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 36 34 0 0 0 36 34 0 0 0 4536 4538 1 5.34e+02 0 0.00e+00 100 MatMult 341 1.0 1.0774e+01 1.1 2.96e+11 1.0 6.9e+02 2.1e+06 2.0e+00 4 34100100 0 4 34100100 0 54801 66441 2 5.86e+03 0 0.00e+00 100 4) The significant vector time is all in norms (11%) since they are really slow on the GPU. VecNorm 342 1.0 6.2261e+01129.9 4.57e+10 1.0 0.0e+00 0.0e+00 6.8e+02 11 5 0 0 15 11 5 0 0 15 1466 196884 0 0.00e+00 342 2.74e-03 100 VecTDot 680 1.0 1.7107e+00 1.3 9.09e+10 1.0 0.0e+00 0.0e+00 1.4e+03 1 10 0 0 29 1 10 0 0 29 106079 133922 0 0.00e+00 680 5.44e-03 100 VecAXPY 681 1.0 3.2036e+00 1.7 9.10e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 56728 58367 682 5.34e+02 0 0.00e+00 100 VecAYPX 339 1.0 2.6502e+00 1.8 4.53e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 34136 34153 339 2.71e-03 0 0.00e+00 100 So the solve time is: 53% ~ 37% + 4% + 11% and the setup time is about 16%. I was wrong about the SetUp time being included, as it is outside the event: https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/ksp/interface/itfunc.c#L852 It looks like the remainder of the time (23%) is spent preallocating the matrix. Thanks, Matt The MalMult event is 4 %. How does this event figure into the above equation; if preconditioning (MatMult + PCApply) is included in KSPSolve? Best, Karthik. From: Matthew Knepley > Date: Wednesday, 29 September 2021 at 10:58 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: Barry Smith >, "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] %T (percent time in this phase) On Wed, Sep 29, 2021 at 5:52 AM Karthikeyan Chockalingam - STFC UKRI > wrote: That was helpful. I would like to provide some additional details of my run on cpus and gpus. Please find the following attachments: 1. graph.pdf a plot showing overall time and various petsc events. 2. ksp_ex45_N511_cpu_6.txt data file of the log_summary 3. ksp_ex45_N511_gpu_2.txt data file of the log_summary I used the following petsc options for cpu mpirun -n 6 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 -dm_mat_type mpiaij -dm_vec_type mpi -ksp_type cg -pc_type bjacobi -ksp_monitor and for gpus mpirun -n 1 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type bjacobi -ksp_monitor to run the following problem https://petsc.org/release/src/ksp/ksp/tutorials/ex45.c.html From the above code, I see is there no individual function called KSPSetUp(), so I gather KSPSetDM, KSPSetComputeInitialGuess, KSPSetComputeRHS, kSPSetComputeOperators all are timed together as KSPSetUp. For this example, is KSPSetUp time and KSPSolve time mutually exclusive? 
No, KSPSetUp() will be contained in KSPSolve() if it is called automatically. In your response you said that ?PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends on how much of the preconditioner construction can take place early, so depends exactly on the preconditioner used.? I don?t see a explicit call to PCSetUp() or PCApply() in ex45; so for this particular preconditioner (bjacobi) how can I tell how they are timed? They are all inside KSPSolve(). If you have a preconditioned linear solve, the oreconditioning happens during the iteration. So an iteration would mostly consist of MatMult + PCApply, with some vector work. I am hoping to time KSP solving and preconditioning mutually exclusively. I am not sure that concept makes sense here. See above. Thanks, Matt Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 28 September 2021 at 19:19 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] %T (percent time in this phase) On Sep 28, 2021, at 12:11 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thanks for Barry for your response. I was just benchmarking the problem with various preconditioner on cpu and gpu. I understand, it is not possible to get mutually exclusive timing. However, can you tell if KSPSolve time includes both PCSetup and PCApply? And if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp and PCApply. If you do not call KSPSetUp() separately from KSPSolve() then its time is included with KSPSolve(). PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends on how much of the preconditioner construction can take place early, so depends exactly on the preconditioner used. So yes the answer is not totally satisfying. The one thing I would recommend is to not call KSPSetUp() directly and then KSPSolve() will always include the total time of the solve plus all setup time. PCApply will contain all the time to apply the preconditioner but may also include some setup time. Barry Best, Karthik. From: Barry Smith > Date: Tuesday, 28 September 2021 at 16:56 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] %T (percent time in this phase) On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Hello, I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson problem. I noticed from the output from using the flag -log_summary that for various events their respective %T (percent time in this phase) do not add up to 100 but rather exceeds 100. So, I gather there is some overlap among these events. I am primarily looking at the events KSPSetUp, KSPSolve, PCSetUp and PCSolve. Is it possible to get a mutually exclusive %T or Time for these individual events? I have attached the log_summary output file from my run for your reference. For nested solvers it is tricky to get the times to be mutually exclusive because some parts of the building of the preconditioner is for some preconditioners delayed until the solve has started. It looks like you are using the default preconditioner options which for this example are taking more or less no time since so many iterations are needed. It is best to use -pc_type mg to use geometric multigrid on this problem. Barry Thanks! Karthik. This email and any attachments are intended solely for the use of the named recipients. 
If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Sep 30 07:52:03 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 30 Sep 2021 08:52:03 -0400 Subject: [petsc-users] (percent time in this phase) In-Reply-To: References: Message-ID: On Thu, Sep 30, 2021 at 8:50 AM Karthikeyan Chockalingam - STFC UKRI < karthikeyan.chockalingam at stfc.ac.uk> wrote: > When comparing the MatSolve data for > > > > GPU > > > > MatSolve 341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 > 0.0e+00 36 34 0 0 0 36 34 0 0 0 4536 4538 1 5.34e+02 0 > 0.00e+00 100 > > MatCUSPARSSolAnl 2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > > > > and CPU > > > > MatSolve 352 1.0 1.3553e+02 1.0 1.02e+11 1.0 0.0e+00 0.0e+00 > 0.0e+00 35 34 0 0 0 35 34 0 0 0 4489 > > > > the time spent is almost the same for this preconditioner. Look like > MatCUSPARSSolAnl is called only *twice* (since I am running on two cores) > > > > mpirun -n 2 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z > 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type > bjacobi -ksp_monitor > > > > So would it be fair to assume MatCUSPARSSolAnl is *not *accounted for in > MatSolve and it is an exclusive event? > Looks like that. Thanks Matt > KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%) > ~ 100 % > > > > Best, > > Karthik. > > > > > > *From: *Matthew Knepley > *Date: *Wednesday, 29 September 2021 at 16:29 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *Barry Smith , "petsc-users at mcs.anl.gov" < > petsc-users at mcs.anl.gov> > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > On Wed, Sep 29, 2021 at 10:18 AM Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > Thank you! > > > > Just to summarize > > > > KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%) > ~ 100 % > > > > You didn?t happen to mention how MatCUSPARSSolAnl is accounted for? Am I > right in accounting for it as above? > > > > I am not sure.I thought it might be the GPU part of MatSolve(). I will > have to look in the code. I am not as familiar with the GPU part. 
> > > > MatCUSPARSSolAnl 2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > > > > Finally, I believe the vector events, VecNorn, VecTDot, VecAXPY, and > VecAYPX are mutually exclusive? > > > > Yes. > > > > Thanks, > > > > Matt > > > > Best, > > > > Karthik. > > > > *From: *Matthew Knepley > *Date: *Wednesday, 29 September 2021 at 11:58 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *Barry Smith , "petsc-users at mcs.anl.gov" < > petsc-users at mcs.anl.gov> > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > On Wed, Sep 29, 2021 at 6:24 AM Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > Thank you Mathew. Now, it is all making sense to me. > > > > From data file ksp_ex45_N511_gpu_2.txt > > > > KSPSolve (53%) + KSPSetup (0%) = PCSetup (16%) + PCApply (37%). > > > > However, you said ?So an iteration would mostly consist of MatMult + > PCApply, with some vector work? > > > > 1) You do one solve, but 2 KSPSetUp()s. You must be running on more than > one process and using Block-Jacobi . Half the time is spent in the solve > (53%) > > > > KSPSetUp 2 1.0 5.3149e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 1 0 0 0 0 1 0 0 0 0.00e+00 0 0.00e+00 0 > > KSPSolve 1 1.0 1.5837e+02 1.1 8.63e+11 1.0 6.8e+02 2.1e+06 4.4e+03 53100100100 95 53100100100 96 10881 11730 1022 6.40e+03 1021 8.17e-03 100 > > > > 2) The preconditioner look like BJacobi-ILU. The setup time is 16%, which > is all setup of the individual blocks, and this is all used by the > numerical ILU factorization. > > > > PCSetUp 2 1.0 4.9623e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 0 0 0 > 0 16 0 0 0 0 58 0 2 6.93e+03 0 0.00e+00 0 PCSetUpOnBlocks 1 1.0 4.9274e+01 > 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 0 0 0 0 15 0 0 0 0 59 0 2 > 6.93e+03 0 0.00e+00 0 > > MatLUFactorNum 1 1.0 4.6126e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 14 0 0 0 0 63 0 2 6.93e+03 0 0.00e+00 0 > > MatILUFactorSym 1 1.0 2.5110e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 > > > > 3) The preconditioner application takes 37% of the time, which is all > solving the factors and recorded in MatSolve(). Matrix multiplication takes > 4%. > > > > PCApply 341 1.0 1.3068e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 37 34 > 0 0 0 37 34 0 0 0 4516 4523 1 5.34e+02 0 0.00e+00 100 > > MatSolve 341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 36 34 0 0 0 36 34 0 0 0 4536 4538 1 5.34e+02 0 0.00e+00 100 > > MatMult 341 1.0 1.0774e+01 1.1 2.96e+11 1.0 6.9e+02 2.1e+06 2.0e+00 4 34100100 0 4 34100100 0 54801 66441 2 5.86e+03 0 0.00e+00 100 > > > > 4) The significant vector time is all in norms (11%) since they are really > slow on the GPU. > > > > VecNorm 342 1.0 6.2261e+01129.9 4.57e+10 1.0 0.0e+00 0.0e+00 6.8e+02 11 5 0 0 15 11 5 0 0 15 1466 196884 0 0.00e+00 342 2.74e-03 100 > > VecTDot 680 1.0 1.7107e+00 1.3 9.09e+10 1.0 0.0e+00 0.0e+00 1.4e+03 1 10 0 0 29 1 10 0 0 29 106079 133922 0 0.00e+00 680 5.44e-03 100 > > VecAXPY 681 1.0 3.2036e+00 1.7 9.10e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 56728 58367 682 5.34e+02 0 0.00e+00 100 > > VecAYPX 339 1.0 2.6502e+00 1.8 4.53e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 34136 34153 339 2.71e-03 0 0.00e+00 100 > > > > So the solve time is: > > > > 53% ~ 37% + 4% + 11% > > > > and the setup time is about 16%. 
I was wrong about the SetUp time being > included, as it is outside the event: > > > > > https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/ksp/interface/itfunc.c#L852 > > > > It looks like the remainder of the time (23%) is spent preallocating the > matrix. > > > > Thanks, > > > > Matt > > > > The MalMult event is 4 %. How does this event figure into the above > equation; if preconditioning (MatMult + PCApply) is included in KSPSolve? > > > > Best, > > Karthik. > > > > *From: *Matthew Knepley > *Date: *Wednesday, 29 September 2021 at 10:58 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *Barry Smith , "petsc-users at mcs.anl.gov" < > petsc-users at mcs.anl.gov> > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > On Wed, Sep 29, 2021 at 5:52 AM Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > That was helpful. I would like to provide some additional details of my > run on cpus and gpus. Please find the following attachments: > > > > 1. graph.pdf a plot showing overall time and various petsc events. > 2. ksp_ex45_N511_cpu_6.txt data file of the log_summary > 3. ksp_ex45_N511_gpu_2.txt data file of the log_summary > > > > I used the following petsc options for cpu > > > > mpirun -n 6 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z > 511 -dm_mat_type mpiaij -dm_vec_type mpi -ksp_type cg -pc_type bjacobi > -ksp_monitor > > > > and for gpus > > > > mpirun -n 1 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z > 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type > bjacobi -ksp_monitor > > > > to run the following problem > > > > https://petsc.org/release/src/ksp/ksp/tutorials/ex45.c.html > > > > From the above code, I see is there no individual function called KSPSetUp(), > so I gather KSPSetDM, KSPSetComputeInitialGuess, KSPSetComputeRHS, > kSPSetComputeOperators all are timed together as KSPSetUp. For this > example, is KSPSetUp time and KSPSolve time mutually exclusive? > > > > No, KSPSetUp() will be contained in KSPSolve() if it is called > automatically. > > > > In your response you said that > > > > ?PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it > depends on how much of the preconditioner construction can take place > early, so depends exactly on the preconditioner used.? > > > > I don?t see a explicit call to PCSetUp() or PCApply() in ex45; so for > this particular preconditioner (bjacobi) how can I tell how they are timed? > > > > They are all inside KSPSolve(). If you have a preconditioned linear solve, > the oreconditioning happens during the iteration. So an iteration would > mostly > > consist of MatMult + PCApply, with some vector work. > > > > I am hoping to time KSP solving and preconditioning mutually exclusively. > > > > I am not sure that concept makes sense here. See above. > > > > Thanks, > > > > Matt > > > > > > Kind regards, > > Karthik. > > > > > > *From: *Barry Smith > *Date: *Tuesday, 28 September 2021 at 19:19 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *"petsc-users at mcs.anl.gov" > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > > > > > On Sep 28, 2021, at 12:11 PM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Thanks for Barry for your response. > > > > I was just benchmarking the problem with various preconditioner on cpu and > gpu. 
I understand, it is not possible to get mutually exclusive timing. > > However, can you tell if KSPSolve time includes both PCSetup and PCApply? > And if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp > and PCApply. > > > > If you do not call KSPSetUp() separately from KSPSolve() then its time > is included with KSPSolve(). > > > > PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends > on how much of the preconditioner construction can take place early, so > depends exactly on the preconditioner used. > > > > So yes the answer is not totally satisfying. The one thing I would > recommend is to not call KSPSetUp() directly and then KSPSolve() will > always include the total time of the solve plus all setup time. PCApply > will contain all the time to apply the preconditioner but may also include > some setup time. > > > > Barry > > > > > > Best, > > Karthik. > > > > > > > > > > *From: *Barry Smith > *Date: *Tuesday, 28 September 2021 at 16:56 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > karthikeyan.chockalingam at stfc.ac.uk> > *Cc: *"petsc-users at mcs.anl.gov" > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > > > > > On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI < > karthikeyan.chockalingam at stfc.ac.uk> wrote: > > > > Hello, > > > > I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson > problem. I noticed from the output from using the flag -log_summary that > for various events their respective %T (percent time in this phase) do not > add up to 100 but rather exceeds 100. So, I gather there is some overlap > among these events. I am primarily looking at the events KSPSetUp, > KSPSolve, PCSetUp and PCSolve. Is it possible to get a mutually exclusive > %T or Time for these individual events? I have attached the log_summary > output file from my run for your reference. > > > > > > For nested solvers it is tricky to get the times to be mutually > exclusive because some parts of the building of the preconditioner is for > some preconditioners delayed until the solve has started. > > > > It looks like you are using the default preconditioner options which for > this example are taking more or less no time since so many iterations are > needed. It is best to use -pc_type mg to use geometric multigrid on this > problem. > > > > Barry > > > > > > > > Thanks! > > Karthik. > > > > This email and any attachments are intended solely for the use of the > named recipients. If you are not the intended recipient you must not use, > disclose, copy or distribute this email or any of its attachments and > should notify the sender immediately and delete this email from your > system. UK Research and Innovation (UKRI) has taken every reasonable > precaution to minimise risk of this email or any attachments containing > viruses or malware but the recipient should carry out its own virus and > malware checks before opening the attachments. UKRI does not accept any > liability for any losses or damages which the recipient may sustain due to > presence of any viruses. > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From s6hsbran at uni-bonn.de Thu Sep 30 08:39:00 2021 From: s6hsbran at uni-bonn.de (Hannes Phil Niklas Brandt) Date: Thu, 30 Sep 2021 15:39:00 +0200 Subject: [petsc-users] Possibilities to VecScatter to a sparse Vector-Format Message-ID: Hello, I intend to compute a parallel Matrix-Vector-Product (via MPI) and therefore would like to scatter the entries of the input MPI-Vec v to a local vector containing all entries relevant to the current process. To achieve this I tried defining a VecScatter, which scatters from v to a sequential Vec v_seq (each process has it's own version of v_seq). However, storing v_seq (which has one entry for each global row, thus containing a large amount of zero-entries) may demand too much storage space (in comparison to my data-sparse Matrix-Storage-Format). I am interested in possibilities to scatter v to a sparse Vec-type to avoid storing unnecessary large amounts of zero-entries. Is there a sparse Vector format in Petsc compatible to the VecScatter procedure or is there another efficient way to compute Matrix-Vector-Products without usinglarge amounts of storage space on each process? Best Regards Hannes Brandt -------------- next part -------------- An HTML attachment was scrubbed... URL: From karthikeyan.chockalingam at stfc.ac.uk Thu Sep 30 08:41:29 2021 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Thu, 30 Sep 2021 13:41:29 +0000 Subject: [petsc-users] (percent time in this phase) In-Reply-To: References: Message-ID: <6295C9A3-0EC7-4D6A-8F62-88EC8651D207@stfc.ac.uk> Based on your feedback from yesterday. I was trying to breakdown KSPSolve. Please find the attached bar plot. The numbers are not adding up at least for GPUs. Your feedback from yesterday were based on T%. I plotted the time spend on each event, hoping that the cumulative sum would add up to KSPSolve time. Kind regards, Karthik. From: Matthew Knepley Date: Thursday, 30 September 2021 at 13:52 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" Cc: Barry Smith , "petsc-users at mcs.anl.gov" Subject: Re: [petsc-users] (percent time in this phase) On Thu, Sep 30, 2021 at 8:50 AM Karthikeyan Chockalingam - STFC UKRI > wrote: When comparing the MatSolve data for GPU MatSolve 341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 36 34 0 0 0 36 34 0 0 0 4536 4538 1 5.34e+02 0 0.00e+00 100 MatCUSPARSSolAnl 2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 and CPU MatSolve 352 1.0 1.3553e+02 1.0 1.02e+11 1.0 0.0e+00 0.0e+00 0.0e+00 35 34 0 0 0 35 34 0 0 0 4489 the time spent is almost the same for this preconditioner. 
Look like MatCUSPARSSolAnl is called only twice (since I am running on two cores) mpirun -n 2 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type bjacobi -ksp_monitor So would it be fair to assume MatCUSPARSSolAnl is not accounted for in MatSolve and it is an exclusive event? Looks like that. Thanks Matt KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%) ~ 100 % Best, Karthik. From: Matthew Knepley > Date: Wednesday, 29 September 2021 at 16:29 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: Barry Smith >, "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] %T (percent time in this phase) On Wed, Sep 29, 2021 at 10:18 AM Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you! Just to summarize KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%) ~ 100 % You didn?t happen to mention how MatCUSPARSSolAnl is accounted for? Am I right in accounting for it as above? I am not sure.I thought it might be the GPU part of MatSolve(). I will have to look in the code. I am not as familiar with the GPU part. MatCUSPARSSolAnl 2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 Finally, I believe the vector events, VecNorn, VecTDot, VecAXPY, and VecAYPX are mutually exclusive? Yes. Thanks, Matt Best, Karthik. From: Matthew Knepley > Date: Wednesday, 29 September 2021 at 11:58 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: Barry Smith >, "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] %T (percent time in this phase) On Wed, Sep 29, 2021 at 6:24 AM Karthikeyan Chockalingam - STFC UKRI > wrote: Thank you Mathew. Now, it is all making sense to me. From data file ksp_ex45_N511_gpu_2.txt KSPSolve (53%) + KSPSetup (0%) = PCSetup (16%) + PCApply (37%). However, you said ?So an iteration would mostly consist of MatMult + PCApply, with some vector work? 1) You do one solve, but 2 KSPSetUp()s. You must be running on more than one process and using Block-Jacobi . Half the time is spent in the solve (53%) KSPSetUp 2 1.0 5.3149e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 1 0 0 0 0 1 0 0 0 0.00e+00 0 0.00e+00 0 KSPSolve 1 1.0 1.5837e+02 1.1 8.63e+11 1.0 6.8e+02 2.1e+06 4.4e+03 53100100100 95 53100100100 96 10881 11730 1022 6.40e+03 1021 8.17e-03 100 2) The preconditioner look like BJacobi-ILU. The setup time is 16%, which is all setup of the individual blocks, and this is all used by the numerical ILU factorization. PCSetUp 2 1.0 4.9623e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 0 0 0 0 16 0 0 0 0 58 0 2 6.93e+03 0 0.00e+00 0 PCSetUpOnBlocks 1 1.0 4.9274e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 0 0 0 0 15 0 0 0 0 59 0 2 6.93e+03 0 0.00e+00 0 MatLUFactorNum 1 1.0 4.6126e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 14 0 0 0 0 63 0 2 6.93e+03 0 0.00e+00 0 MatILUFactorSym 1 1.0 2.5110e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 3) The preconditioner application takes 37% of the time, which is all solving the factors and recorded in MatSolve(). Matrix multiplication takes 4%. 
PCApply 341 1.0 1.3068e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 37 34 0 0 0 37 34 0 0 0 4516 4523 1 5.34e+02 0 0.00e+00 100 MatSolve 341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 36 34 0 0 0 36 34 0 0 0 4536 4538 1 5.34e+02 0 0.00e+00 100 MatMult 341 1.0 1.0774e+01 1.1 2.96e+11 1.0 6.9e+02 2.1e+06 2.0e+00 4 34100100 0 4 34100100 0 54801 66441 2 5.86e+03 0 0.00e+00 100 4) The significant vector time is all in norms (11%) since they are really slow on the GPU. VecNorm 342 1.0 6.2261e+01129.9 4.57e+10 1.0 0.0e+00 0.0e+00 6.8e+02 11 5 0 0 15 11 5 0 0 15 1466 196884 0 0.00e+00 342 2.74e-03 100 VecTDot 680 1.0 1.7107e+00 1.3 9.09e+10 1.0 0.0e+00 0.0e+00 1.4e+03 1 10 0 0 29 1 10 0 0 29 106079 133922 0 0.00e+00 680 5.44e-03 100 VecAXPY 681 1.0 3.2036e+00 1.7 9.10e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 56728 58367 682 5.34e+02 0 0.00e+00 100 VecAYPX 339 1.0 2.6502e+00 1.8 4.53e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 34136 34153 339 2.71e-03 0 0.00e+00 100 So the solve time is: 53% ~ 37% + 4% + 11% and the setup time is about 16%. I was wrong about the SetUp time being included, as it is outside the event: https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/ksp/interface/itfunc.c#L852 It looks like the remainder of the time (23%) is spent preallocating the matrix. Thanks, Matt The MalMult event is 4 %. How does this event figure into the above equation; if preconditioning (MatMult + PCApply) is included in KSPSolve? Best, Karthik. From: Matthew Knepley > Date: Wednesday, 29 September 2021 at 10:58 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: Barry Smith >, "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] %T (percent time in this phase) On Wed, Sep 29, 2021 at 5:52 AM Karthikeyan Chockalingam - STFC UKRI > wrote: That was helpful. I would like to provide some additional details of my run on cpus and gpus. Please find the following attachments: 1. graph.pdf a plot showing overall time and various petsc events. 2. ksp_ex45_N511_cpu_6.txt data file of the log_summary 3. ksp_ex45_N511_gpu_2.txt data file of the log_summary I used the following petsc options for cpu mpirun -n 6 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 -dm_mat_type mpiaij -dm_vec_type mpi -ksp_type cg -pc_type bjacobi -ksp_monitor and for gpus mpirun -n 1 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type bjacobi -ksp_monitor to run the following problem https://petsc.org/release/src/ksp/ksp/tutorials/ex45.c.html From the above code, I see is there no individual function called KSPSetUp(), so I gather KSPSetDM, KSPSetComputeInitialGuess, KSPSetComputeRHS, kSPSetComputeOperators all are timed together as KSPSetUp. For this example, is KSPSetUp time and KSPSolve time mutually exclusive? No, KSPSetUp() will be contained in KSPSolve() if it is called automatically. In your response you said that ?PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends on how much of the preconditioner construction can take place early, so depends exactly on the preconditioner used.? I don?t see a explicit call to PCSetUp() or PCApply() in ex45; so for this particular preconditioner (bjacobi) how can I tell how they are timed? They are all inside KSPSolve(). If you have a preconditioned linear solve, the oreconditioning happens during the iteration. So an iteration would mostly consist of MatMult + PCApply, with some vector work. 
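One way to get a coarse but non-overlapping breakdown, not used in this thread, is to bracket the setup and the solve in user-defined log stages; -log_view then reports each stage in its own section. A minimal sketch under that assumption follows (the stage names are illustrative, and, per the caveat above, preconditioners such as block Jacobi with ILU may still defer part of their setup into the solve stage):

#include <petscksp.h>

PetscErrorCode SolveWithStages(KSP ksp, Vec b, Vec x)
{
  PetscLogStage  setup_stage, solve_stage;
  PetscErrorCode ierr;

  ierr = PetscLogStageRegister("MySetUp", &setup_stage);CHKERRQ(ierr);
  ierr = PetscLogStageRegister("MySolve", &solve_stage);CHKERRQ(ierr);

  ierr = PetscLogStagePush(setup_stage);CHKERRQ(ierr);
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);        /* whatever setup the PC allows up front */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = PetscLogStagePush(solve_stage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);  /* any deferred PC setup lands in this stage */
  ierr = PetscLogStagePop();CHKERRQ(ierr);
  return 0;
}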
I am hoping to time KSP solving and preconditioning mutually exclusively. I am not sure that concept makes sense here. See above. Thanks, Matt Kind regards, Karthik. From: Barry Smith > Date: Tuesday, 28 September 2021 at 19:19 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] %T (percent time in this phase) On Sep 28, 2021, at 12:11 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: Thanks for Barry for your response. I was just benchmarking the problem with various preconditioner on cpu and gpu. I understand, it is not possible to get mutually exclusive timing. However, can you tell if KSPSolve time includes both PCSetup and PCApply? And if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp and PCApply. If you do not call KSPSetUp() separately from KSPSolve() then its time is included with KSPSolve(). PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends on how much of the preconditioner construction can take place early, so depends exactly on the preconditioner used. So yes the answer is not totally satisfying. The one thing I would recommend is to not call KSPSetUp() directly and then KSPSolve() will always include the total time of the solve plus all setup time. PCApply will contain all the time to apply the preconditioner but may also include some setup time. Barry Best, Karthik. From: Barry Smith > Date: Tuesday, 28 September 2021 at 16:56 To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] %T (percent time in this phase) On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: Hello, I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson problem. I noticed from the output from using the flag -log_summary that for various events their respective %T (percent time in this phase) do not add up to 100 but rather exceeds 100. So, I gather there is some overlap among these events. I am primarily looking at the events KSPSetUp, KSPSolve, PCSetUp and PCSolve. Is it possible to get a mutually exclusive %T or Time for these individual events? I have attached the log_summary output file from my run for your reference. For nested solvers it is tricky to get the times to be mutually exclusive because some parts of the building of the preconditioner is for some preconditioners delayed until the solve has started. It looks like you are using the default preconditioner options which for this example are taking more or less no time since so many iterations are needed. It is best to use -pc_type mg to use geometric multigrid on this problem. Barry Thanks! Karthik. This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: KSPSolve.pdf Type: application/pdf Size: 175716 bytes Desc: KSPSolve.pdf URL: From knepley at gmail.com Thu Sep 30 08:44:01 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 30 Sep 2021 09:44:01 -0400 Subject: [petsc-users] Possibilities to VecScatter to a sparse Vector-Format In-Reply-To: References: Message-ID: On Thu, Sep 30, 2021 at 9:39 AM Hannes Phil Niklas Brandt < s6hsbran at uni-bonn.de> wrote: > Hello, > > > > I intend to compute a parallel Matrix-Vector-Product (via MPI) and > therefore would like to scatter the entries of the input MPI-Vec v to a > local vector containing all entries relevant to the current process. > > > > To achieve this I tried defining a VecScatter, which scatters from v to a > sequential Vec v_seq (each process has it's own version of v_seq). However, > storing v_seq (which has one entry for each global row, thus containing a > large amount of zero-entries) may demand too much storage space (in > comparison to my data-sparse Matrix-Storage-Format). > > > > I am interested in possibilities to scatter v to a sparse Vec-type to > avoid storing unnecessary large amounts of zero-entries. Is there a sparse > Vector format in Petsc compatible to the VecScatter procedure or is there > another efficient way to compute Matrix-Vector-Products without usinglarge > amounts of storage space on each process? > I think you misunderstand VecScatter. Parallel to sequential is one possibility, but also parallel-parallel, seq-parallel, etc. Second, you can give whatever indices you want into it. Thus you can index only a few places in a large array, or compact a sparse array into a contiguous one. I am not sure what other possibilities may exist. Thanks, Matt > Best Regards > > Hannes Brandt > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Sep 30 09:27:01 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 30 Sep 2021 09:27:01 -0500 Subject: [petsc-users] Possibilities to VecScatter to a sparse Vector-Format In-Reply-To: References: Message-ID: On Thu, Sep 30, 2021 at 8:39 AM Hannes Phil Niklas Brandt < s6hsbran at uni-bonn.de> wrote: > Hello, > > > > > > I intend to compute a parallel Matrix-Vector-Product (via MPI) and > therefore would like to scatter the entries of the input MPI-Vec v to a > local vector containing all entries relevant to the current process. 
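To illustrate the point above that a VecScatter can pull an arbitrary subset of entries of a parallel Vec into a compact sequential Vec (rather than a full-length copy), a rough sketch follows; the index list is invented and would in practice come from the sparsity pattern of the locally owned matrix rows, and v and ierr are assumed to exist already:

PetscInt    needed[]  = {0, 17, 42};   /* invented global indices of the entries this rank needs */
PetscInt    nneeded   = 3;
Vec         v_compact;                 /* one slot per needed entry, no padding with zeros */
IS          is_from, is_to;
VecScatter  scatter;

ierr = VecCreateSeq(PETSC_COMM_SELF, nneeded, &v_compact);CHKERRQ(ierr);
ierr = ISCreateGeneral(PETSC_COMM_SELF, nneeded, needed, PETSC_COPY_VALUES, &is_from);CHKERRQ(ierr);
ierr = ISCreateStride(PETSC_COMM_SELF, nneeded, 0, 1, &is_to);CHKERRQ(ierr);
ierr = VecScatterCreate(v, is_from, v_compact, is_to, &scatter);CHKERRQ(ierr);
ierr = VecScatterBegin(scatter, v, v_compact, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
ierr = VecScatterEnd(scatter, v, v_compact, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);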
> > > > To achieve this I tried defining a VecScatter, which scatters from v to a > sequential Vec v_seq (each process has it's own version of v_seq). However, > storing v_seq (which has one entry for each global row, thus containing a > large amount of zero-entries) may demand too much storage space (in > comparison to my data-sparse Matrix-Storage-Format). > What you said is exactly what petsc's MatMult does. It builds a VecScatter object (aij->Mvctx), and has a local vector (aij->lvec). It does not communicate or store unneeded remote entries. The code is at https://gitlab.com/petsc/petsc/-/blob/main/src/mat/impls/aij/mpi/mmaij.c#L9 > I am interested in possibilities to scatter v to a sparse Vec-type to > avoid storing unnecessary large amounts of zero-entries. Is there a sparse > Vector format in Petsc compatible to the VecScatter procedure or is there > another efficient way to compute Matrix-Vector-Products without usinglarge > amounts of storage space on each process? > > > > > > Best Regards > > Hannes Brandt > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 30 09:32:30 2021 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 30 Sep 2021 10:32:30 -0400 Subject: [petsc-users] Convergence rate for spatially varying Helmholtz system In-Reply-To: References: Message-ID: > On Sep 29, 2021, at 5:37 PM, Ramakrishnan Thirumalaisamy wrote: > > Hi all, > > I am trying to solve the Helmholtz equation for temperature T: > > (C I + Div D grad) T = f > > in IBAMR, in which C is the spatially varying diagonal entries, and D is the spatially varying diffusion coefficient. I use a matrix-free solver with matrix-based PETSc preconditioner. For the matrix-free solver, I use gmres solver and for the matrix based preconditioner, I use Richardson ksp + Jacobi as a preconditioner. As the simulation progresses, the iterations start to increase. To understand the cause, I set D to be zero, which results in a diagonal system: > > C T = f. > > This should result in convergence within a single iteration, but I get convergence in 3 iterations. > > Residual norms for temperature_ solve. > 0 KSP preconditioned resid norm 4.590811647875e-02 true resid norm 2.406067589273e+09 ||r(i)||/||b|| 4.455533946945e-05 > 1 KSP preconditioned resid norm 2.347767895880e-06 true resid norm 1.210763896685e+05 ||r(i)||/||b|| 2.242081505717e-09 > 2 KSP preconditioned resid norm 1.245406571896e-10 true resid norm 6.328828824310e+00 ||r(i)||/||b|| 1.171966730978e-13 > Linear temperature_ solve converged due to CONVERGED_RTOL iterations 2 > What is the result of -ksp_view on the solve? The way you describe your implementation it does not sound like standard PETSc practice. With PETSc using a matrix-free operation mA and a matrix from which KSP will build the preconditioner A one uses KSPSetOperator(ksp,mA,A); and then just selects the preconditioner with -pc_type xxx For example to use Jacobi preconditioning one uses -pc_type jacobi (note that this only uses the diagonal of A, the rest of A is never used). If you wish to precondition mA by fully solving with the matrix A one can use -ksp_monitor_true_residual -pc_type ksp -ksp_ksp_type yyy -ksp_pc_type xxx -ksp_ksp_monitor_true_residual with, for example, yyy of richardson and xxx of jacobi Barry > To verify that I am indeed solving a diagonal system I printed the PETSc matrix from the preconditioner and viewed it in Matlab. It indeed shows it to be a diagonal system. 
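As a rough sketch of the arrangement Barry describes above (a shell operator for the Krylov method plus an assembled matrix from which the preconditioner is built), assuming ksp, b, x, the local/global sizes, a user context and a user MyMatMult routine already exist; note that the call is spelled KSPSetOperators() in current PETSc:

Mat mA, A;   /* mA: matrix-free (shell) operator, A: assembled matrix the PC is built from */

ierr = MatCreateShell(PETSC_COMM_WORLD, nlocal, nlocal, N, N, user_ctx, &mA);CHKERRQ(ierr);
ierr = MatShellSetOperation(mA, MATOP_MULT, (void (*)(void))MyMatMult);CHKERRQ(ierr);

/* A is assembled elsewhere; with -pc_type jacobi only its diagonal is ever used */
ierr = KSPSetOperators(ksp, mA, A);CHKERRQ(ierr);
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);   /* picks up -pc_type jacobi, -pc_type ksp, ... */
ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);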
Attached is the plot of the spy command on the printed matrix. The matrix in binary form is also attached. > > My understanding is that because the C coefficient is varying in 4 orders of magnitude, i.e., Max(C)/Min(C) ~ 10^4, the matrix is poorly scaled. When I rescale my matrix by 1/C then the system converges in 1 iteration as expected. Is my understanding correct, and that scaling 1/C should be done even for a diagonal system? > > When D is non-zero, then scaling by 1/C seems to be very inconvenient as D is stored as side-centered data for the matrix free solver. > > In the case that I do not scale my equations by 1/C, is there some solver setting that improves the convergence rate? (With D as non-zero, I have also tried gmres as the ksp solver in the matrix-based preconditioner to get better performance, but it didn't matter much.) > > > Thanks, > Ramakrishnan Thirumalaisamy > San Diego State University. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 30 09:39:25 2021 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 30 Sep 2021 10:39:25 -0400 Subject: [petsc-users] Disconnected domains and Poisson equation In-Reply-To: References: <448CEBF7-5B16-4E1C-8D1D-9CC067BD38BB@petsc.dev> <10EA28EF-AD98-4F59-A78D-7DE3D4B585DE@petsc.dev> Message-ID: <3A2F7686-44AA-47A5-B996-461E057F4EC3@petsc.dev> It looks like the initial solution (guess) is to round-off the solution to the linear system 9.010260489109e-14 0 KSP unpreconditioned resid norm 9.010260489109e-14 true resid norm 9.010260489109e-14 ||r(i)||/||b|| 2.021559024868e+00 0 KSP Residual norm 9.010260489109e-14 % max 1.000000000000e+00 min 1.000000000000e+00 max/min 1.000000000000e+00 1 KSP unpreconditioned resid norm 4.918108339808e-15 true resid norm 4.918171792537e-15 ||r(i)||/||b|| 1.103450292594e-01 1 KSP Residual norm 4.918108339808e-15 % max 9.566256813737e-01 min 9.566256813737e-01 max/min 1.000000000000e+00 2 KSP unpreconditioned resid norm 1.443599554690e-15 true resid norm 1.444867143493e-15 ||r(i)||/||b|| 3.241731154382e-02 2 KSP Residual norm 1.443599554690e-15 % max 9.614019380614e-01 min 7.360950481750e-01 max/min 1.306083963538e+00 Thus the Krylov solver will not be able to improve the solution, it then gets stuck trying to improve the solution but cannot because of round off. In other words the algorithm has converged (even at the initial solution (guess) and should stop immediately. You can use -ksp_atol 1.e-12 to get it to stop immediately without iterating if the initial residual is less than 1e-12. Barry > On Sep 30, 2021, at 4:16 AM, Marco Cisternino wrote: > > Hello Barry. > This is the output of ksp_view using fgmres and gamg. It has to be said that the solution of the linear system should be a zero values field. As you can see both unpreconditioned residual and r/b converge at this iteration of the CFD solver. During the time integration of the CFD, I can observe pressure linear solver residuals behaving in a different way: unpreconditioned residual stil converges but r/b stalls. After the output of ksp_view I add the output of ksp_monitor_true_residual for one of these iteration where r/b stalls. > Thanks, > > KSP Object: 1 MPI processes > type: fgmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=100, nonzero initial guess > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> right preconditioning > using UNPRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: gamg > type is MULTIPLICATIVE, levels=4 cycles=v > Cycles per PCApply=1 > Using externally compute Galerkin coarse grid matrices > GAMG specific options > Threshold for dropping small values in graph on each level = 0.02 0.02 > Threshold scaling factor for each level not specified = 1. > AGG specific options > Symmetric graph true > Number of levels to square graph 1 > Number smoothing steps 0 > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 1 MPI processes > type: bjacobi > number of blocks = 1 > Local solve is same for all blocks, in the following KSP and PC objects: > KSP Object: (mg_coarse_sub_) 1 MPI processes > type: preonly > maximum iterations=1, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using DEFAULT norm type for convergence test > PC Object: (mg_coarse_sub_) 1 MPI processes > type: lu > PC has not been set up so information may be incomplete > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > matrix ordering: nd > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=18, cols=18 > total: nonzeros=104, allocated nonzeros=104 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=18, cols=18 > total: nonzeros=104, allocated nonzeros=104 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 1 MPI processes > type: chebyshev > eigenvalue estimates used: min = 0., max = 0. > eigenvalues estimate via gmres min 0., max 0. > eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_1_esteig_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using DEFAULT norm type for convergence test > estimating eigenvalues using noisy right hand side > maximum iterations=2, nonzero initial guess > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_levels_1_) 1 MPI processes > type: sor > type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=67, cols=67 > total: nonzeros=675, allocated nonzeros=675 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 2 ------------------------------- > KSP Object: (mg_levels_2_) 1 MPI processes > type: chebyshev > eigenvalue estimates used: min = 0., max = 0. > eigenvalues estimate via gmres min 0., max 0. 
> eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_2_esteig_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using DEFAULT norm type for convergence test > estimating eigenvalues using noisy right hand side > maximum iterations=2, nonzero initial guess > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_levels_2_) 1 MPI processes > type: sor > type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=348, cols=348 > total: nonzeros=3928, allocated nonzeros=3928 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > Up solver (post-smoother) same as down solver (pre-smoother) > Down solver (pre-smoother) on level 3 ------------------------------- > KSP Object: (mg_levels_3_) 1 MPI processes > type: chebyshev > eigenvalue estimates used: min = 0., max = 0. > eigenvalues estimate via gmres min 0., max 0. > eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_3_esteig_) 1 MPI processes > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using DEFAULT norm type for convergence test > estimating eigenvalues using noisy right hand side > maximum iterations=2, nonzero initial guess > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_levels_3_) 1 MPI processes > type: sor > type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=3584, cols=3584 > total: nonzeros=23616, allocated nonzeros=23616 > total number of mallocs used during MatSetValues calls =0 > has attached null space > not using I-node routines > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=3584, cols=3584 > total: nonzeros=23616, allocated nonzeros=23616 > total number of mallocs used during MatSetValues calls =0 > has attached null space > not using I-node routines > Pressure system has reached convergence in 0 iterations with reason 3. 
> 0 KSP unpreconditioned resid norm 4.798763170703e-16 true resid norm 4.798763170703e-16 ||r(i)||/||b|| 1.000000000000e+00 > 0 KSP Residual norm 4.798763170703e-16 % max 1.000000000000e+00 min 1.000000000000e+00 max/min 1.000000000000e+00 > 1 KSP unpreconditioned resid norm 1.648749109132e-17 true resid norm 1.648749109132e-17 ||r(i)||/||b|| 3.435779284125e-02 > 1 KSP Residual norm 1.648749109132e-17 % max 9.561792537103e-01 min 9.561792537103e-01 max/min 1.000000000000e+00 > 2 KSP unpreconditioned resid norm 4.737880600040e-19 true resid norm 4.737880600040e-19 ||r(i)||/||b|| 9.873128619820e-04 > 2 KSP Residual norm 4.737880600040e-19 % max 9.828636644296e-01 min 9.293131521763e-01 max/min 1.057623753767e+00 > 3 KSP unpreconditioned resid norm 2.542212716830e-20 true resid norm 2.542212716830e-20 ||r(i)||/||b|| 5.297641551371e-05 > 3 KSP Residual norm 2.542212716830e-20 % max 9.933572357920e-01 min 9.158303248850e-01 max/min 1.084652046127e+00 > 4 KSP unpreconditioned resid norm 6.614510286263e-21 true resid norm 6.614510286269e-21 ||r(i)||/||b|| 1.378378146822e-05 > 4 KSP Residual norm 6.614510286263e-21 % max 9.950912550705e-01 min 6.296575800237e-01 max/min 1.580368896747e+00 > 5 KSP unpreconditioned resid norm 1.981505525281e-22 true resid norm 1.981505525272e-22 ||r(i)||/||b|| 4.129200493513e-07 > 5 KSP Residual norm 1.981505525281e-22 % max 9.984097962703e-01 min 5.316259535293e-01 max/min 1.878030577029e+00 > Linear solve converged due to CONVERGED_RTOL iterations 5 > > Ksp_monitor_true_residual output for stalling r/b CFD iteration > 0 KSP unpreconditioned resid norm 9.010260489109e-14 true resid norm 9.010260489109e-14 ||r(i)||/||b|| 2.021559024868e+00 > 0 KSP Residual norm 9.010260489109e-14 % max 1.000000000000e+00 min 1.000000000000e+00 max/min 1.000000000000e+00 > 1 KSP unpreconditioned resid norm 4.918108339808e-15 true resid norm 4.918171792537e-15 ||r(i)||/||b|| 1.103450292594e-01 > 1 KSP Residual norm 4.918108339808e-15 % max 9.566256813737e-01 min 9.566256813737e-01 max/min 1.000000000000e+00 > 2 KSP unpreconditioned resid norm 1.443599554690e-15 true resid norm 1.444867143493e-15 ||r(i)||/||b|| 3.241731154382e-02 > 2 KSP Residual norm 1.443599554690e-15 % max 9.614019380614e-01 min 7.360950481750e-01 max/min 1.306083963538e+00 > 3 KSP unpreconditioned resid norm 6.623206616803e-16 true resid norm 6.654132553541e-16 ||r(i)||/||b|| 1.492933720678e-02 > 3 KSP Residual norm 6.623206616803e-16 % max 9.764112945239e-01 min 4.911485418014e-01 max/min 1.988016274960e+00 > 4 KSP unpreconditioned resid norm 6.551896936698e-16 true resid norm 6.646157296305e-16 ||r(i)||/||b|| 1.491144376933e-02 > 4 KSP Residual norm 6.551896936698e-16 % max 9.883425885532e-01 min 1.461270778833e-01 max/min 6.763582786091e+00 > 5 KSP unpreconditioned resid norm 6.222297644887e-16 true resid norm 1.720560536914e-15 ||r(i)||/||b|| 3.860282047823e-02 > 5 KSP Residual norm 6.222297644887e-16 % max 1.000409371755e+00 min 4.989767363560e-03 max/min 2.004921870829e+02 > 6 KSP unpreconditioned resid norm 6.496945794974e-17 true resid norm 2.031914800253e-14 ||r(i)||/||b|| 4.558842341106e-01 > 6 KSP Residual norm 6.496945794974e-17 % max 1.004914985753e+00 min 1.459258738706e-03 max/min 6.886475709192e+02 > 7 KSP unpreconditioned resid norm 1.965237342540e-17 true resid norm 1.684522207337e-14 ||r(i)||/||b|| 3.779425772373e-01 > 7 KSP Residual norm 1.965237342540e-17 % max 1.005737762541e+00 min 1.452603803766e-03 max/min 6.923689446035e+02 > 8 KSP unpreconditioned resid norm 1.627718951285e-17 true resid 
norm 1.958642967520e-14 ||r(i)||/||b|| 4.394448276241e-01 > 8 KSP Residual norm 1.627718951285e-17 % max 1.006364278765e+00 min 1.452081813014e-03 max/min 6.930492963590e+02 > 9 KSP unpreconditioned resid norm 1.616577677764e-17 true resid norm 2.019110946644e-14 ||r(i)||/||b|| 4.530115373837e-01 > 9 KSP Residual norm 1.616577677764e-17 % max 1.006648747131e+00 min 1.452031376577e-03 max/min 6.932692801059e+02 > 10 KSP unpreconditioned resid norm 1.285788988203e-17 true resid norm 2.065082694477e-14 ||r(i)||/||b|| 4.633258453698e-01 > 10 KSP Residual norm 1.285788988203e-17 % max 1.007469033514e+00 min 1.433291867068e-03 max/min 7.029057072477e+02 > 11 KSP unpreconditioned resid norm 5.490854431580e-19 true resid norm 1.798071628891e-14 ||r(i)||/||b|| 4.034187394623e-01 > 11 KSP Residual norm 5.490854431580e-19 % max 1.008058905554e+00 min 1.369401685301e-03 max/min 7.361309076612e+02 > 12 KSP unpreconditioned resid norm 1.371754802104e-20 true resid norm 1.965688920064e-14 ||r(i)||/||b|| 4.410256708163e-01 > 12 KSP Residual norm 1.371754802104e-20 % max 1.008409402214e+00 min 1.369243011779e-03 max/min 7.364721919624e+02 > Linear solve converged due to CONVERGED_RTOL iterations 12 > > > > Marco Cisternino > > From: Barry Smith > > Sent: mercoled? 29 settembre 2021 18:34 > To: Marco Cisternino > > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Disconnected domains and Poisson equation > > > > > On Sep 29, 2021, at 11:59 AM, Marco Cisternino > wrote: > > For sake of completeness, explicitly building the null space using a vector per sub-domain make s the CFD runs using BCGS and GMRES more stable, but still slower than FGMRES. > > Something is strange. Please run with -ksp_view and send the output on the solver details. > > > I had divergence using BCGS and GMRES setting the null space with only one constant. > Thanks > > Marco Cisternino > > From: Marco Cisternino > Sent: mercoled? 29 settembre 2021 17:54 > To: Barry Smith > > Cc: petsc-users at mcs.anl.gov > Subject: RE: [petsc-users] Disconnected domains and Poisson equation > > Thank you Barry for the quick reply. > About the null space: I already tried what you suggest, building 2 Vec (constants) with 0 and 1 chosen by sub-domain, normalizing them and setting the null space like this > MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_FALSE,nconstants,constants,&nullspace); > The solution is slightly different in values but it is still different in the two sub-domains. > About the solver: I tried BCGS, GMRES and FGMRES. The linear system is a pressure system in a navier-stokes solver and only solving with FGMRES makes the CFD stable, with BCGS and GMRES the CFD solution diverges. Moreover, in the same case but with a single domain, CFD solution is stable using all the solvers, but FGMRES converges in much less iterations than the others. > > Marco Cisternino > > From: Barry Smith > > Sent: mercoled? 29 settembre 2021 15:59 > To: Marco Cisternino > > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Disconnected domains and Poisson equation > > > The problem actually has a two dimensional null space; constant on each domain but possibly different constants. I think you need to build the MatNullSpace by explicitly constructing two vectors, one with 0 on one domain and constant value on the other and one with 0 on the other domain and constant on the first. > > Separate note: why use FGMRES instead of just GMRES? 
If the problem is linear and the preconditioner is linear (no GMRES inside the smoother) then you can just use GMRES and it will save a little space/work and be conceptually clearer. > > Barry > > > On Sep 29, 2021, at 8:46 AM, Marco Cisternino > wrote: > > Good morning, > I want to solve the Poisson equation on a 3D domain with 2 non-connected sub-domains. > I am using FGMRES+GAMG and I have no problem if the two sub-domains see a Dirichlet boundary condition each. > On the same domain I would like to solve the Poisson equation imposing periodic boundary condition in one direction and homogenous Neumann boundary conditions in the other two directions. The two sub-domains are symmetric with respect to the separation between them and the operator discretization and the right hand side are symmetric as well. It would be nice to have the same solution in both the sub-domains. > Setting the null space to the constant, the solver converges to a solution having the same gradients in both sub-domains but different values. > Am I doing some wrong with the null space? I?m not setting a block matrix (one block for each sub-domain), should I? > I tested the null space against the matrix using MatNullSpaceTest and the answer is true. Can I do something more to have a symmetric solution as outcome of the solver? > Thank you in advance for any comments and hints. > > Best regards, > > Marco Cisternino -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 30 09:47:07 2021 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 30 Sep 2021 10:47:07 -0400 Subject: [petsc-users] (percent time in this phase) In-Reply-To: <6295C9A3-0EC7-4D6A-8F62-88EC8651D207@stfc.ac.uk> References: <6295C9A3-0EC7-4D6A-8F62-88EC8651D207@stfc.ac.uk> Message-ID: <3B13EDB4-A22B-421B-9B5C-F95BA9CF9705@petsc.dev> The MatSolve is no better on the GPUs then on the CPU; while other parts of the computation seem to speed up nicely. What is the result of -ksp_view ? Are you using ILU(0) as the preconditioner, this will not solve well on the GPU, its solve is essentially sequential. You won't want to use ILU(0) in this way on GPUs. Barry > On Sep 30, 2021, at 9:41 AM, Karthikeyan Chockalingam - STFC UKRI wrote: > > Based on your feedback from yesterday. I was trying to breakdown KSPSolve. > Please find the attached bar plot. The numbers are not adding up at least for GPUs. > Your feedback from yesterday were based on T%. > I plotted the time spend on each event, hoping that the cumulative sum would add up to KSPSolve time. > > Kind regards, > Karthik. > > From: Matthew Knepley > Date: Thursday, 30 September 2021 at 13:52 > To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > Cc: Barry Smith , "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] (percent time in this phase) > > On Thu, Sep 30, 2021 at 8:50 AM Karthikeyan Chockalingam - STFC UKRI > wrote: > When comparing the MatSolve data for > > GPU > > MatSolve 341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 36 34 0 0 0 36 34 0 0 0 4536 4538 1 5.34e+02 0 0.00e+00 100 > MatCUSPARSSolAnl 2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 > > and CPU > > MatSolve 352 1.0 1.3553e+02 1.0 1.02e+11 1.0 0.0e+00 0.0e+00 0.0e+00 35 34 0 0 0 35 34 0 0 0 4489 > > the time spent is almost the same for this preconditioner. 
Look like MatCUSPARSSolAnl is called only twice (since I am running on two cores) > > mpirun -n 2 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type bjacobi -ksp_monitor > > So would it be fair to assume MatCUSPARSSolAnl is not accounted for in MatSolve and it is an exclusive event? > > Looks like that. > > Thanks > > Matt > > KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%) ~ 100 % > > Best, > Karthik. > > > From: Matthew Knepley > > Date: Wednesday, 29 September 2021 at 16:29 > To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > > Cc: Barry Smith >, "petsc-users at mcs.anl.gov " > > Subject: Re: [petsc-users] %T (percent time in this phase) > > On Wed, Sep 29, 2021 at 10:18 AM Karthikeyan Chockalingam - STFC UKRI > wrote: > Thank you! > > Just to summarize > > KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%) ~ 100 % > > You didn?t happen to mention how MatCUSPARSSolAnl is accounted for? Am I right in accounting for it as above? > > I am not sure.I thought it might be the GPU part of MatSolve(). I will have to look in the code. I am not as familiar with the GPU part. > > MatCUSPARSSolAnl 2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 > > Finally, I believe the vector events, VecNorn, VecTDot, VecAXPY, and VecAYPX are mutually exclusive? > > Yes. > > Thanks, > > Matt > > Best, > > Karthik. > > From: Matthew Knepley > > Date: Wednesday, 29 September 2021 at 11:58 > To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > > Cc: Barry Smith >, "petsc-users at mcs.anl.gov " > > Subject: Re: [petsc-users] %T (percent time in this phase) > > On Wed, Sep 29, 2021 at 6:24 AM Karthikeyan Chockalingam - STFC UKRI > wrote: > Thank you Mathew. Now, it is all making sense to me. > > From data file ksp_ex45_N511_gpu_2.txt > > KSPSolve (53%) + KSPSetup (0%) = PCSetup (16%) + PCApply (37%). > > However, you said ?So an iteration would mostly consist of MatMult + PCApply, with some vector work? > > 1) You do one solve, but 2 KSPSetUp()s. You must be running on more than one process and using Block-Jacobi . Half the time is spent in the solve (53%) > > KSPSetUp 2 1.0 5.3149e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 1 0 0 0 0 1 0 0 0 0.00e+00 0 0.00e+00 0 > KSPSolve 1 1.0 1.5837e+02 1.1 8.63e+11 1.0 6.8e+02 2.1e+06 4.4e+03 53100100100 95 53100100100 96 10881 11730 1022 6.40e+03 1021 8.17e-03 100 > > > 2) The preconditioner look like BJacobi-ILU. The setup time is 16%, which is all setup of the individual blocks, and this is all used by the numerical ILU factorization. > > PCSetUp 2 1.0 4.9623e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 0 0 0 0 16 0 0 0 0 58 0 2 6.93e+03 0 0.00e+00 0 PCSetUpOnBlocks 1 1.0 4.9274e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 0 0 0 0 15 0 0 0 0 59 0 2 6.93e+03 0 0.00e+00 0 > MatLUFactorNum 1 1.0 4.6126e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 14 0 0 0 0 63 0 2 6.93e+03 0 0.00e+00 0 > MatILUFactorSym 1 1.0 2.5110e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 > > 3) The preconditioner application takes 37% of the time, which is all solving the factors and recorded in MatSolve(). Matrix multiplication takes 4%. 
> > PCApply 341 1.0 1.3068e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 37 34 0 0 0 37 34 0 0 0 4516 4523 1 5.34e+02 0 0.00e+00 100 > MatSolve 341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 36 34 0 0 0 36 34 0 0 0 4536 4538 1 5.34e+02 0 0.00e+00 100 > MatMult 341 1.0 1.0774e+01 1.1 2.96e+11 1.0 6.9e+02 2.1e+06 2.0e+00 4 34100100 0 4 34100100 0 54801 66441 2 5.86e+03 0 0.00e+00 100 > > 4) The significant vector time is all in norms (11%) since they are really slow on the GPU. > > > VecNorm 342 1.0 6.2261e+01129.9 4.57e+10 1.0 0.0e+00 0.0e+00 6.8e+02 11 5 0 0 15 11 5 0 0 15 1466 196884 0 0.00e+00 342 2.74e-03 100 > VecTDot 680 1.0 1.7107e+00 1.3 9.09e+10 1.0 0.0e+00 0.0e+00 1.4e+03 1 10 0 0 29 1 10 0 0 29 106079 133922 0 0.00e+00 680 5.44e-03 100 > VecAXPY 681 1.0 3.2036e+00 1.7 9.10e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 56728 58367 682 5.34e+02 0 0.00e+00 100 > VecAYPX 339 1.0 2.6502e+00 1.8 4.53e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 34136 34153 339 2.71e-03 0 0.00e+00 100 > > So the solve time is: > > 53% ~ 37% + 4% + 11% > > and the setup time is about 16%. I was wrong about the SetUp time being included, as it is outside the event: > > https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/ksp/interface/itfunc.c#L852 > > It looks like the remainder of the time (23%) is spent preallocating the matrix. > > Thanks, > > Matt > > The MalMult event is 4 %. How does this event figure into the above equation; if preconditioning (MatMult + PCApply) is included in KSPSolve? > > Best, > Karthik. > > From: Matthew Knepley > > Date: Wednesday, 29 September 2021 at 10:58 > To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > > Cc: Barry Smith >, "petsc-users at mcs.anl.gov " > > Subject: Re: [petsc-users] %T (percent time in this phase) > > On Wed, Sep 29, 2021 at 5:52 AM Karthikeyan Chockalingam - STFC UKRI > wrote: > That was helpful. I would like to provide some additional details of my run on cpus and gpus. Please find the following attachments: > > graph.pdf a plot showing overall time and various petsc events. > ksp_ex45_N511_cpu_6.txt data file of the log_summary > ksp_ex45_N511_gpu_2.txt data file of the log_summary > > I used the following petsc options for cpu > > mpirun -n 6 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 -dm_mat_type mpiaij -dm_vec_type mpi -ksp_type cg -pc_type bjacobi -ksp_monitor > > and for gpus > > mpirun -n 1 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type bjacobi -ksp_monitor > > to run the following problem > > https://petsc.org/release/src/ksp/ksp/tutorials/ex45.c.html > > From the above code, I see is there no individual function called KSPSetUp(), so I gather KSPSetDM, KSPSetComputeInitialGuess, KSPSetComputeRHS, kSPSetComputeOperators all are timed together as KSPSetUp. For this example, is KSPSetUp time and KSPSolve time mutually exclusive? > > No, KSPSetUp() will be contained in KSPSolve() if it is called automatically. > > In your response you said that > > ?PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends on how much of the preconditioner construction can take place early, so depends exactly on the preconditioner used.? > > I don?t see a explicit call to PCSetUp() or PCApply() in ex45; so for this particular preconditioner (bjacobi) how can I tell how they are timed? > > They are all inside KSPSolve(). If you have a preconditioned linear solve, the oreconditioning happens during the iteration. 
So an iteration would mostly > consist of MatMult + PCApply, with some vector work. > > I am hoping to time KSP solving and preconditioning mutually exclusively. > > I am not sure that concept makes sense here. See above. > > Thanks, > > Matt > > > Kind regards, > Karthik. > > > From: Barry Smith > > Date: Tuesday, 28 September 2021 at 19:19 > To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > > Cc: "petsc-users at mcs.anl.gov " > > Subject: Re: [petsc-users] %T (percent time in this phase) > > > > > On Sep 28, 2021, at 12:11 PM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Thanks for Barry for your response. > > I was just benchmarking the problem with various preconditioner on cpu and gpu. I understand, it is not possible to get mutually exclusive timing. > However, can you tell if KSPSolve time includes both PCSetup and PCApply? And if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp and PCApply. > > If you do not call KSPSetUp() separately from KSPSolve() then its time is included with KSPSolve(). > > PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends on how much of the preconditioner construction can take place early, so depends exactly on the preconditioner used. > > So yes the answer is not totally satisfying. The one thing I would recommend is to not call KSPSetUp() directly and then KSPSolve() will always include the total time of the solve plus all setup time. PCApply will contain all the time to apply the preconditioner but may also include some setup time. > > Barry > > > Best, > Karthik. > > > > > From: Barry Smith > > Date: Tuesday, 28 September 2021 at 16:56 > To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > > Cc: "petsc-users at mcs.anl.gov " > > Subject: Re: [petsc-users] %T (percent time in this phase) > > > > > On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI > wrote: > > Hello, > > I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson problem. I noticed from the output from using the flag -log_summary that for various events their respective %T (percent time in this phase) do not add up to 100 but rather exceeds 100. So, I gather there is some overlap among these events. I am primarily looking at the events KSPSetUp, KSPSolve, PCSetUp and PCSolve. Is it possible to get a mutually exclusive %T or Time for these individual events? I have attached the log_summary output file from my run for your reference. > > > For nested solvers it is tricky to get the times to be mutually exclusive because some parts of the building of the preconditioner is for some preconditioners delayed until the solve has started. > > It looks like you are using the default preconditioner options which for this example are taking more or less no time since so many iterations are needed. It is best to use -pc_type mg to use geometric multigrid on this problem. > > Barry > > > > > Thanks! > Karthik. > > This email and any attachments are intended solely for the use of the named recipients. If you are not the intended recipient you must not use, disclose, copy or distribute this email or any of its attachments and should notify the sender immediately and delete this email from your system. UK Research and Innovation (UKRI) has taken every reasonable precaution to minimise risk of this email or any attachments containing viruses or malware but the recipient should carry out its own virus and malware checks before opening the attachments. 
UKRI does not accept any liability for any losses or damages which the recipient may sustain due to presence of any viruses. > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 30 09:52:54 2021 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 30 Sep 2021 10:52:54 -0400 Subject: [petsc-users] PETSc 3.16 release Message-ID: <8C74EED7-7C05-4E27-A2BD-B0B76F71B86B@petsc.dev> We are pleased to announce the release of PETSc version 3.16 at https://petsc.org/release/download/ A list of the major changes and updates can be found at https://petsc.org/release/docs/changes/316 The final update to petsc-3.15 i.e petsc-3.15.5 is also available We recommend upgrading to PETSc 3.16 soon. As always, please report problems to petsc-maint at mcs.anl.gov and ask questions at petsc-users at mcs.anl.gov This release includes contributions from Albert Cowie Alexei Colin Barry Smith Blaise Bourdin Carsten Uphoff Connor Ward Daniel Finn Daniel Shapero Erik Schnetter Fande Kong Hong Zhang Jacob Faibussowitsch Jed Brown Jeremy Tillay Joe Wallwork Joseph Pusztay Jose Roman Junchao Zhang Koki Sagiyama Kyle Gerard Felker Lawrence Mitchell Leila Ghaffari Lisandro Dalcin Mark Adams Martin Diehl Matthew Knepley Matt McGurn Moritz Huck Mr. Hong Zhang nathawani olivecha Pablo Brubeck Patrick Sanan pbrubeck Pierre Jolivet Richard Tran Mills Rylee Sundermann Sajid Ali Sam Reynolds Satish Balay Scott Kruger Stefano Zampini Toby Isaac Todd Munson Vaclav Hapla Yang Zongze Zhao Gang and bug reports/patches/proposed improvements received from Adrian Croucher Alexandre Halbach Bret K. Stanford Chonglin Zhang (@zhangchonglin) cleaf "Constantinescu, Emil M." Damian Marek Daniel Stone Danyang Su David Salac dazza simplythebest Drew Parsons (@RizzerAtGitLab) edgar at openmail.cc Emily Jakobs Emmanuel Ayala Eric Chamberland (@eric.chamberland) Fande Kong Getnet Betrie Haplav hg Iman Datta "Isaac, Tobin G" Jacob Faibussowitsch Jeremy Kozdon Jin Chen Junchao Zhang Lars Corbijn Lawrence Mitchell Lisandro Dalcin "Lundvick, Nick" Mark Adams Martin Diehl Matthew Otten Milan Pelletier Mr. Hong Zhang Nan Ding Pierre Jolivet Pieter Ghysels Qi Tang @qiuchangkai Rezgar Shakeri Rory Johnston Sam Fagbemi Saransh Saxena Sergio Bengoechea Stephen Jardin TAY wee-beng Victor Eijkhout Xiaoye S. Li Yang Liu As always, thanks for your support, Barry -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aduarteg at utexas.edu Thu Sep 30 15:14:17 2021 From: aduarteg at utexas.edu (Alfredo J Duarte Gomez) Date: Thu, 30 Sep 2021 15:14:17 -0500 Subject: [petsc-users] PC shell destroy Message-ID: Good afternoon PETSC team, I am currently developing an application for PETSC in which I use my own preconditioner with a PCSHELL. I have successfully set all the functions and the performance of the preconditioner is good. I am using this PCSHELL within a TS object, and it is imperative that the objects in the PCSHELL context are freed every time since the memory requirements of those are large. I have set up the Preconditioner before the TS starts with the following block of code: ---------------------------------------------------------------------------------------------------- ierr = PCSetType(pc,PCSHELL);CHKERRQ(ierr); ierr = ShellPCCreate(&shell);CHKERRQ(ierr); ierr = PCShellSetApply(pc,MatrixFreePreconditioner);CHKERRQ(ierr); ierr = PCShellSetContext(pc,shell);CHKERRQ(ierr); ierr = PCShellSetDestroy(pc,ShellPCDestroy);CHKERRQ(ierr); ierr = PCShellSetName(pc,"MyPreconditioner");CHKERRQ(ierr); ierr = ShellPCSetUp(pc,da,0.0,dt,u,user);CHKERRQ(ierr); ierr = TSSetPreStep(ts,PreStep);CHKERRQ(ierr); ------------------------------------------------------------------------------------------------ The shell context is then updated by using the following code within the TSPreStep function: --------------------------------------------------------------------------- ierr = TSGetSNES(ts,&snes);CHKERRQ(ierr); ierr = SNESGetKSP(snes,&ksp);CHKERRQ(ierr); ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); // Get necessary objects from TS context TSGetTime(ts,&time); TSGetApplicationContext(ts,&user); TSGetSolution(ts,&X); TSGetTimeStep(ts,&dt); TSGetStepNumber(ts, &stepi); TSGetDM(ts,&da); tdt = time+dt; // Update preconditioner context with current values ierr = ShellPCSetUp(pc,da,tdt,dt,X,user);CHKERRQ(ierr); --------------------------------------------------------------------------- I have set up the necessary code in the function ShellPCDestroy to free the objects within this context, however I am unsure when/if this function is called automatically. Do I have to free the context myself after every step? How would I call the function myself? I am running out of memory after a few steps, and I think this shell context is the culprit. In addition to that, is it possible to get what is called the "ashift" dF/dU + a*dF/dU_t in this function from the TS object? https://petsc.org/release/docs/manualpages/TS/TSSetIJacobian.html I need it as an input for my preconditioner (currrently hardcoded for TSBEULER where ashift is always 1/dt). Thank you, -Alfredo -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Sep 30 16:32:57 2021 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 30 Sep 2021 17:32:57 -0400 Subject: [petsc-users] PC shell destroy In-Reply-To: References: Message-ID: You can use PETSc functions to allocate and free memory and then run with -malloc_debug and you will get a printout of memory used and any unfreed memory. Mark On Thu, Sep 30, 2021 at 4:14 PM Alfredo J Duarte Gomez wrote: > Good afternoon PETSC team, > > I am currently developing an application for PETSC in which I use my own > preconditioner with a PCSHELL. > > I have successfully set all the functions and the performance of the > preconditioner is good. 
I am using this PCSHELL within a TS object, and it > is imperative that the objects in the PCSHELL context are freed every time > since the memory requirements of those are large. > > I have set up the Preconditioner before the TS starts with the following > block of code: > > ---------------------------------------------------------------------------------------------------- > > ierr = PCSetType(pc,PCSHELL);CHKERRQ(ierr); > ierr = ShellPCCreate(&shell);CHKERRQ(ierr); > ierr = PCShellSetApply(pc,MatrixFreePreconditioner);CHKERRQ(ierr); > ierr = PCShellSetContext(pc,shell);CHKERRQ(ierr); > ierr = PCShellSetDestroy(pc,ShellPCDestroy);CHKERRQ(ierr); > ierr = PCShellSetName(pc,"MyPreconditioner");CHKERRQ(ierr); > ierr = ShellPCSetUp(pc,da,0.0,dt,u,user);CHKERRQ(ierr); > ierr = TSSetPreStep(ts,PreStep);CHKERRQ(ierr); > > > ------------------------------------------------------------------------------------------------ > > The shell context is then updated by using the following code within the > TSPreStep function: > > --------------------------------------------------------------------------- > > ierr = TSGetSNES(ts,&snes);CHKERRQ(ierr); > ierr = SNESGetKSP(snes,&ksp);CHKERRQ(ierr); > ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); > > // Get necessary objects from TS context > TSGetTime(ts,&time); > TSGetApplicationContext(ts,&user); > TSGetSolution(ts,&X); > TSGetTimeStep(ts,&dt); > TSGetStepNumber(ts, &stepi); > TSGetDM(ts,&da); > > tdt = time+dt; > // Update preconditioner context with current values > ierr = ShellPCSetUp(pc,da,tdt,dt,X,user);CHKERRQ(ierr); > --------------------------------------------------------------------------- > > I have set up the necessary code in the function ShellPCDestroy to free > the objects within this context, however I am unsure when/if this function > is called automatically. Do I have to free the context myself after every > step? How would I call the function myself? > > I am running out of memory after a few steps, and I think this shell > context is the culprit. > > In addition to that, is it possible to get what is called the "ashift" dF/dU > + a*dF/dU_t in this function from the TS object? > > https://petsc.org/release/docs/manualpages/TS/TSSetIJacobian.html > > I need it as an input for my preconditioner (currrently hardcoded for > TSBEULER where ashift is always 1/dt). > > Thank you, > > -Alfredo > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 30 17:08:56 2021 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 30 Sep 2021 18:08:56 -0400 Subject: [petsc-users] PC shell destroy In-Reply-To: References: Message-ID: <3F7565A5-E6BA-4C62-92ED-C5665BFC4B09@petsc.dev> Alfredo, I think the best approach for you to use is to have your own MATSHELL and your own PCSHELL. You will use your MATSHELL as the second matrix argument to TSSetIJacobian(). It should record the current x and the current shift. Your PCSHELL will then, in PCSetUp(), get access to the current x and the current shift from your MATSHELL and build itself. 
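In code, that pattern might look roughly like the sketch below (struct and routine names are invented, error handling trimmed, and mctx->X is assumed to have been created once with VecDuplicate()). TS hands the shift a and the current state to the IJacobian routine, which records them in the shell matrix context; the PCSHELL setup routine then reads them back through PCGetOperators():

typedef struct {
  PetscReal shift;   /* the "a" in dF/dU + a*dF/dU_t */
  Vec       X;       /* current state, created once with VecDuplicate() */
} ShellMatCtx;

PetscErrorCode MyIJacobian(TS ts, PetscReal t, Vec U, Vec Udot, PetscReal a, Mat J, Mat P, void *ctx)
{
  ShellMatCtx    *mctx;
  PetscErrorCode  ierr;

  PetscFunctionBeginUser;
  ierr = MatShellGetContext(P, &mctx);CHKERRQ(ierr);
  mctx->shift = a;                          /* no hardcoded 1/dt */
  ierr = VecCopy(U, mctx->X);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

PetscErrorCode MyPCSetUp(PC pc)
{
  Mat             A, P;
  ShellMatCtx    *mctx;
  PetscErrorCode  ierr;

  PetscFunctionBeginUser;
  ierr = PCGetOperators(pc, &A, &P);CHKERRQ(ierr);
  ierr = MatShellGetContext(P, &mctx);CHKERRQ(ierr);
  /* (re)build the preconditioner from mctx->shift and mctx->X here,
     destroying whatever was built on the previous call */
  PetscFunctionReturn(0);
}

/* once, during setup (here the same shell is used for both matrix arguments;
   the first could instead be a different, e.g. matrix-free, operator): */
ierr = MatCreateShell(PETSC_COMM_WORLD, nlocal, nlocal, N, N, &matctx, &P);CHKERRQ(ierr);
ierr = TSSetIJacobian(ts, P, P, MyIJacobian, NULL);CHKERRQ(ierr);
ierr = PCShellSetSetUp(pc, MyPCSetUp);CHKERRQ(ierr);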
In other words most of your > / Get necessary objects from TS context > TSGetTime(ts,&time); > TSGetApplicationContext(ts,&user); > TSGetSolution(ts,&X); > TSGetTimeStep(ts,&dt); > TSGetStepNumber(ts, &stepi); > TSGetDM(ts,&da); > > tdt = time+dt; > // Update preconditioner context with current values > ierr = ShellPCSetUp(pc,da,tdt,dt,X,user);CHKERRQ(ierr); code will disappear and you won't need to mess with the internals of the TS (getting current dt etc) at all. What you need is handed off to your TSSetIJacobian() function which will stick it into your MATSHELL. So nice and clean code. Regarding the PCDestroy() for your PCSHELL. It only gets called when the PC is finally destroyed which is when the TS is destroy. So if building your PC in PCSetUp() requires creating new objects you should destroy any previous ones when you create the new ones, hence "lost" objects won't persist in the code. Barry > On Sep 30, 2021, at 4:14 PM, Alfredo J Duarte Gomez wrote: > > Good afternoon PETSC team, > > I am currently developing an application for PETSC in which I use my own preconditioner with a PCSHELL. > > I have successfully set all the functions and the performance of the preconditioner is good. I am using this PCSHELL within a TS object, and it is imperative that the objects in the PCSHELL context are freed every time since the memory requirements of those are large. > > I have set up the Preconditioner before the TS starts with the following block of code: > ---------------------------------------------------------------------------------------------------- > > ierr = PCSetType(pc,PCSHELL);CHKERRQ(ierr); > ierr = ShellPCCreate(&shell);CHKERRQ(ierr); > ierr = PCShellSetApply(pc,MatrixFreePreconditioner);CHKERRQ(ierr); > ierr = PCShellSetContext(pc,shell);CHKERRQ(ierr); > ierr = PCShellSetDestroy(pc,ShellPCDestroy);CHKERRQ(ierr); > ierr = PCShellSetName(pc,"MyPreconditioner");CHKERRQ(ierr); > ierr = ShellPCSetUp(pc,da,0.0,dt,u,user);CHKERRQ(ierr); > ierr = TSSetPreStep(ts,PreStep);CHKERRQ(ierr); > > ------------------------------------------------------------------------------------------------ > > The shell context is then updated by using the following code within the TSPreStep function: > > --------------------------------------------------------------------------- > > ierr = TSGetSNES(ts,&snes);CHKERRQ(ierr); > ierr = SNESGetKSP(snes,&ksp);CHKERRQ(ierr); > ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); > > // Get necessary objects from TS context > TSGetTime(ts,&time); > TSGetApplicationContext(ts,&user); > TSGetSolution(ts,&X); > TSGetTimeStep(ts,&dt); > TSGetStepNumber(ts, &stepi); > TSGetDM(ts,&da); > > tdt = time+dt; > // Update preconditioner context with current values > ierr = ShellPCSetUp(pc,da,tdt,dt,X,user);CHKERRQ(ierr); > --------------------------------------------------------------------------- > > I have set up the necessary code in the function ShellPCDestroy to free the objects within this context, however I am unsure when/if this function is called automatically. Do I have to free the context myself after every step? How would I call the function myself? > > I am running out of memory after a few steps, and I think this shell context is the culprit. > > In addition to that, is it possible to get what is called the "ashift" dF/dU + a*dF/dU_t in this function from the TS object? > > https://petsc.org/release/docs/manualpages/TS/TSSetIJacobian.html > > I need it as an input for my preconditioner (currrently hardcoded for TSBEULER where ashift is always 1/dt). 
> > Thank you, > > -Alfredo > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail2amneet at gmail.com Thu Sep 30 17:16:38 2021 From: mail2amneet at gmail.com (Amneet Bhalla) Date: Thu, 30 Sep 2021 15:16:38 -0700 Subject: [petsc-users] Convergence rate for spatially varying Helmholtz system In-Reply-To: References: Message-ID: >> If you want to solve systems accurately, you should non-dimensionalize the system prior to discretization. This would mean that your C and b have elements in the [1, D] range, where D is the dynamic range of your problem, say 1e4, rather than these huge numbers you have now. @Matt: We have done non-dimensionalization and the diagonal matrix ranges from 1 to 1e4 now. Still it takes 4-5 iterations to converge for the non-dimensional diagonal matrix. The convergence trend is looking much better now, though: Residual norms for temperature_ solve. 0 KSP preconditioned resid norm 4.724547545716e-04 true resid norm 2.529423250889e+00 ||r(i)||/||b|| 4.397759655853e-05 1 KSP preconditioned resid norm 6.504853596318e-06 true resid norm 2.197130494439e-02 ||r(i)||/||b|| 3.820021755431e-07 2 KSP preconditioned resid norm 7.733420341215e-08 true resid norm 3.539290481432e-04 ||r(i)||/||b|| 6.153556501117e-09 3 KSP preconditioned resid norm 6.419092250844e-10 true resid norm 5.220398494466e-06 ||r(i)||/||b|| 9.076400273607e-11 4 KSP preconditioned resid norm 5.095955157158e-12 true resid norm 2.484163999489e-08 ||r(i)||/||b|| 4.319070053474e-13 5 KSP preconditioned resid norm 6.828200916501e-14 true resid norm 2.499229854610e-10 ||r(i)||/||b|| 4.345264170970e-15 Linear temperature_ solve converged due to CONVERGED_RTOL iterations 5 Only when all the equations are scaled individually the convergence is achieved in a single iteration. In the above, all equations are scaled using the same non-dimensional parameter. Do you think this is reasonable or do you expect the diagonal system to converge in a single iteration irrespective of the range of diagonal entries? @Barry: > > > What is the result of -ksp_view on the solve? > KSP Object: (temperature_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with one step of iterative refinement when needed happy breakdown tolerance 1e-30 maximum iterations=1000, nonzero initial guess tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (temperature_) 1 MPI processes type: shell IEPSemiImplicitHierarchyIntegrator::helmholtz_precond::Temperature linear system matrix = precond matrix: Mat Object: 1 MPI processes type: shell rows=1, cols=1 > > The way you describe your implementation it does not sound like > standard PETSc practice. > Yes, we do it differently in IBAMR. Succinctly, the main solver is a matrix-free one, whereas the preconditioner is a FAC multigrid solver with its bottom solver formed on the coarsest level of AMR grid using PETSc (matrix-based KSP). In the above -ksp_view temperature_ is the matrix-free KSP solver and IEPSemiImplicitHierarchyIntegrator::helmholtz_precond is the FAC preconditioner. 
> > With PETSc using a matrix-free operation mA and a matrix from which KSP > will build the preconditioner A one uses KSPSetOperator(ksp,mA,A); and > then just selects the preconditioner with -pc_type xxx For example to use > Jacobi preconditioning one uses -pc_type jacobi (note that this only uses > the diagonal of A, the rest of A is never used). > We run -pc_type jacobi on the bottom solver of the FAC preconditioner. > > If you wish to precondition mA by fully solving with the matrix A one can > use -ksp_monitor_true_residual -pc_type ksp -ksp_ksp_type yyy -ksp_pc_type > xxx -ksp_ksp_monitor_true_residual with, for example, yyy of richardson > and xxx of jacobi > Yes, this is what we do. > > Barry > > > > > To verify that I am indeed solving a diagonal system I printed the PETSc > matrix from the preconditioner and viewed it in Matlab. It indeed shows it > to be a diagonal system. Attached is the plot of the spy command on the > printed matrix. The matrix in binary form is also attached. > > My understanding is that because the C coefficient is varying in 4 orders > of magnitude, i.e., Max(C)/Min(C) ~ 10^4, the matrix is poorly scaled. When > I rescale my matrix by 1/C then the system converges in 1 iteration as > expected. Is my understanding correct, and that scaling 1/C should be done > even for a diagonal system? > > When D is non-zero, then scaling by 1/C seems to be very inconvenient as D > is stored as side-centered data for the matrix free solver. > > In the case that I do not scale my equations by 1/C, is there some solver > setting that improves the convergence rate? (With D as non-zero, I have > also tried gmres as the ksp solver in the matrix-based preconditioner to > get better performance, but it didn't matter much.) > > > Thanks, > Ramakrishnan Thirumalaisamy > San Diego State University. > > > > -- --Amneet -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Sep 30 17:34:50 2021 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 30 Sep 2021 18:34:50 -0400 Subject: [petsc-users] Convergence rate for spatially varying Helmholtz system In-Reply-To: References: Message-ID: <00A92945-C009-4A92-B7E2-909B1783CCF4@petsc.dev> > On Sep 30, 2021, at 6:16 PM, Amneet Bhalla wrote: > > > >> If you want to solve systems accurately, you should non-dimensionalize the system prior to discretization. This would mean that > your C and b have elements in the [1, D] range, where D is the dynamic range of your problem, say 1e4, rather than these huge > numbers you have now. > > @Matt: We have done non-dimensionalization and the diagonal matrix ranges from 1 to 1e4 now. Still it takes 4-5 iterations to converge for the non-dimensional diagonal matrix. The convergence trend is looking much better now, though: > > Residual norms for temperature_ solve. 
> 0 KSP preconditioned resid norm 4.724547545716e-04 true resid norm 2.529423250889e+00 ||r(i)||/||b|| 4.397759655853e-05
> 1 KSP preconditioned resid norm 6.504853596318e-06 true resid norm 2.197130494439e-02 ||r(i)||/||b|| 3.820021755431e-07
> 2 KSP preconditioned resid norm 7.733420341215e-08 true resid norm 3.539290481432e-04 ||r(i)||/||b|| 6.153556501117e-09
> 3 KSP preconditioned resid norm 6.419092250844e-10 true resid norm 5.220398494466e-06 ||r(i)||/||b|| 9.076400273607e-11
> 4 KSP preconditioned resid norm 5.095955157158e-12 true resid norm 2.484163999489e-08 ||r(i)||/||b|| 4.319070053474e-13
> 5 KSP preconditioned resid norm 6.828200916501e-14 true resid norm 2.499229854610e-10 ||r(i)||/||b|| 4.345264170970e-15
> Linear temperature_ solve converged due to CONVERGED_RTOL iterations 5
>
> Only when all the equations are scaled individually is convergence achieved in a single iteration. In the above, all equations are scaled using the same non-dimensional parameter. Do you think this is reasonable, or do you expect the diagonal system to converge in a single iteration irrespective of the range of the diagonal entries?

   For a diagonal system with this modest range of values Jacobi should converge in a single iteration.

   The output below is confusing: it is a system with 1 variable and should definitely converge in one iteration. I am concerned we may be talking apples and oranges here and your test may not be as simple as you think it is (with regard to the diagonal).

> @Barry:
>
>> What is the result of -ksp_view on the solve?
>
> KSP Object: (temperature_) 1 MPI processes
>   type: gmres
>     restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with one step of iterative refinement when needed
>     happy breakdown tolerance 1e-30
>   maximum iterations=1000, nonzero initial guess
>   tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
>   left preconditioning
>   using PRECONDITIONED norm type for convergence test
> PC Object: (temperature_) 1 MPI processes
>   type: shell
>     IEPSemiImplicitHierarchyIntegrator::helmholtz_precond::Temperature
>   linear system matrix = precond matrix:
>   Mat Object: 1 MPI processes
>     type: shell
>     rows=1, cols=1
>
>> The way you describe your implementation, it does not sound like standard PETSc practice.
>
> Yes, we do it differently in IBAMR. Succinctly, the main solver is a matrix-free one, whereas the preconditioner is a FAC multigrid solver with its bottom solver formed on the coarsest level of the AMR grid using PETSc (a matrix-based KSP).
>
> In the above -ksp_view output, temperature_ is the matrix-free KSP solver and IEPSemiImplicitHierarchyIntegrator::helmholtz_precond is the FAC preconditioner.
>
>> With PETSc using a matrix-free operation mA and a matrix A from which KSP will build the preconditioner, one uses KSPSetOperators(ksp,mA,A); and then just selects the preconditioner with -pc_type xxx. For example, to use Jacobi preconditioning one uses -pc_type jacobi (note that this only uses the diagonal of A; the rest of A is never used).
>
> We run -pc_type jacobi on the bottom solver of the FAC preconditioner.
>
>> If you wish to precondition mA by fully solving with the matrix A, one can use -ksp_monitor_true_residual -pc_type ksp -ksp_ksp_type yyy -ksp_pc_type xxx -ksp_ksp_monitor_true_residual with, for example, yyy of richardson and xxx of jacobi.
>
> Yes, this is what we do.
>
>> Barry
>
>> To verify that I am indeed solving a diagonal system I printed the PETSc matrix from the preconditioner and viewed it in Matlab.
>> It indeed shows it to be a diagonal system. Attached is the plot of the spy command on the printed matrix. The matrix in binary form is also attached.
>>
>> My understanding is that because the C coefficient varies over 4 orders of magnitude, i.e., Max(C)/Min(C) ~ 10^4, the matrix is poorly scaled. When I rescale my matrix by 1/C, the system converges in 1 iteration as expected. Is my understanding correct, and should the scaling by 1/C be done even for a diagonal system?
>>
>> When D is non-zero, scaling by 1/C seems to be very inconvenient, as D is stored as side-centered data for the matrix-free solver.
>>
>> In the case that I do not scale my equations by 1/C, is there some solver setting that improves the convergence rate? (With D non-zero, I have also tried gmres as the ksp solver in the matrix-based preconditioner to get better performance, but it didn't matter much.)
>>
>> Thanks,
>> Ramakrishnan Thirumalaisamy
>> San Diego State University.
>
> --
> --Amneet
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mail2amneet at gmail.com Thu Sep 30 17:58:02 2021
From: mail2amneet at gmail.com (Amneet Bhalla)
Date: Thu, 30 Sep 2021 15:58:02 -0700
Subject: [petsc-users] Convergence rate for spatially varying Helmholtz system
In-Reply-To: <00A92945-C009-4A92-B7E2-909B1783CCF4@petsc.dev>
References: <00A92945-C009-4A92-B7E2-909B1783CCF4@petsc.dev>
Message-ID: 

> For a diagonal system with this modest range of values Jacobi should converge in a single iteration.

This is what I wanted to confirm (and it matches my expectation). There could be a bug in the way we are setting up the linear operators in the preconditioner and the matrix-free solver. We need to do some debugging.

> (with regard to the diagonal).

We have printed the matrix and viewed it in Matlab. It is a diagonal matrix.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From knepley at gmail.com Thu Sep 30 19:48:43 2021
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 30 Sep 2021 20:48:43 -0400
Subject: [petsc-users] Convergence rate for spatially varying Helmholtz system
In-Reply-To: 
References: <00A92945-C009-4A92-B7E2-909B1783CCF4@petsc.dev>
Message-ID: 

On Thu, Sep 30, 2021 at 6:58 PM Amneet Bhalla wrote:

>> For a diagonal system with this modest range of values Jacobi should converge in a single iteration.
>
> This is what I wanted to confirm (and it matches my expectation). There could be a bug in the way we are setting up the linear operators in the preconditioner and the matrix-free solver. We need to do some debugging.
>
>> (with regard to the diagonal).
>
> We have printed the matrix and viewed it in Matlab. It is a diagonal matrix.

Can you send us the matrix? This should definitely converge in 1 iteration now, so something I do not understand is going on. I will take any format you've got :)

  Thanks,

     Matt

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
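For sharing the matrix, one convenient route is PETSc's binary format. Below is a minimal sketch (the helper name DumpMatrixBinary and the file name are made up for illustration) that writes a Mat through a binary viewer with MatView; the file can then be read back with MatLoad in another program.

/* Illustrative sketch: write a (preconditioner) matrix to PETSc binary format
   so it can be attached to an email and reloaded elsewhere with MatLoad. */
#include <petscmat.h>

PetscErrorCode DumpMatrixBinary(Mat A, const char filename[])
{
  PetscViewer    viewer;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscViewerBinaryOpen(PetscObjectComm((PetscObject)A), filename, FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = MatView(A, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Reading it back in another program:
 *
 *   Mat         A;
 *   PetscViewer viewer;
 *   MatCreate(PETSC_COMM_WORLD, &A);
 *   MatSetType(A, MATAIJ);
 *   PetscViewerBinaryOpen(PETSC_COMM_WORLD, "precond_mat.dat", FILE_MODE_READ, &viewer);
 *   MatLoad(A, viewer);
 *   PetscViewerDestroy(&viewer);
 */

If I remember correctly, the same binary file can also be inspected from MATLAB or Python with the PetscBinaryRead.m / PetscBinaryIO.py helpers that ship with PETSc, but please double-check their location against your installation.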